[Mondrian] Mondrian generating SQL that cannot be indexed

Luc Boudreau lucboudreau at gmail.com
Mon Jul 21 16:40:40 EDT 2014


BTW. My branch has one test failure, because there is an optimization check
when the slicer contains a range which triggers a full level scan. All it
needs is the distinct number of values to evaluate if the slicer can be
optimized, but I haven't found a way to get it via some other mean.

Luc


On Mon, Jul 21, 2014 at 4:38 PM, Luc Boudreau <lucboudreau at gmail.com> wrote:

> I've actually been working on a branch on my weekends to address ranges.
>
> It can be found here: https://github.com/pentaho/mondrian/tree/smr
>
> These changes would help quite a bit. Not sure about your particular
> query, since I believe this is a segment query, and I have not found a way
> to nativize ranges for these yet. You can grab it, build it and let us know
> if it helps.
>
> Luc
>
>
> On Mon, Jul 21, 2014 at 3:51 PM, Julian Hyde <julianhyde at gmail.com> wrote:
>
>> That’s pretty yucky SQL. I sympathize.
>>
>> Can you try refactoring the SQL to other queries that gives the same
>> answer?
>>
>> 1. Move `dim_rul_band`.`band` = 'RUL 39-30’, up as an AND clause,
>> because it is common in both branches of the OR.
>>
>> 2. Try removing the OR, and dealing with Jun and Jul in one clause:
>>
>>          (
>>             `dim_date`.`DAY_DATE`, `dim_rul_band`.`band`
>>          )
>>          in
>>          (
>>             (DATE '2014-06-01', 'RUL 39-30’),
>>
>> ...
>>
>>             (DATE '2014-07-08', 'RUL 39-30')
>>          )
>>
>>
>>
>> 3. Try ‘dim_date.DAY_DATE BETWEEN DATE ‘2014-06-01’ AND DATE
>> ‘2014-07-08’. (Mondrian is not currently capable of making that
>> optimization — see http://jira.pentaho.com/browse/MONDRIAN-1494 — but it
>> would be interesting to see the performance difference.)
>>
>> Post the performance numbers for the various queries and we will see
>> which one we’d like to generate. (In an ideal world… making Mondrian
>> generate those queries is another matter… I’m not promising anything…)
>>
>> Julian
>>
>>
>> On Jul 21, 2014, at 6:50 AM, Ricardo Fradinho <
>> ricardo.fradinho at webdetails.pt> wrote:
>>
>>  Hi,
>>
>> I have a performance problem caused by the queries that Mondrian
>> generates.
>> These queries always force full table scans because there's no indexing
>> that can be done do help these queries.
>>
>> The problem happens when I select my time range as (2014-06-01 :
>> 2014-07-08) and Mondrian rewrites this as (2014-06) + (2014-07-01 :
>> 2014-07-08) which is translated into this nasty SQL:
>>
>> select
>>   `dim_site`.`SITE` as `c0`,
>>   count(distinct `fact_tetrapak_500k`.`dim_function_key`) as `m0`
>> from
>>   `dim_site` as `dim_site`,
>>   `fact_tetrapak_500k` as `fact_tetrapak_500k`,
>>   `dim_date` as `dim_date`,
>>   `dim_rul_band` as `dim_rul_band`
>> where `fact_tetrapak_500k`.`dim_site_key` = `dim_site`.`dim_site_key`
>>  and `fact_tetrapak_500k`.`dateToProcess` = `dim_date`.`DATE_SK`
>>  and `fact_tetrapak_500k`.`dim_rul_band_key` =
>> `dim_rul_band`.`DIM_RUL_BAND_KEY`
>>  and
>>  (
>>    (
>>       `dim_date`.`MONTH_NAME` = 'June'
>>       and `dim_date`.`YEAR_NUMBER` = 2014
>>       and `dim_rul_band`.`band` = 'RUL 39-30'
>>    )
>>    or
>>    (
>>       (
>>          (
>>             `dim_date`.`DAY_DATE`, `dim_rul_band`.`band`
>>          )
>>          in
>>          (
>>             (DATE '2014-07-01', 'RUL 39-30'),
>>             (DATE '2014-07-02', 'RUL 39-30'),
>>             (DATE '2014-07-03', 'RUL 39-30'),
>>             (DATE '2014-07-04', 'RUL 39-30'),
>>             (DATE '2014-07-05', 'RUL 39-30'),
>>             (DATE '2014-07-06', 'RUL 39-30'),
>>             (DATE '2014-07-07', 'RUL 39-30'),
>>             (DATE '2014-07-08', 'RUL 39-30')
>>          )
>>       )
>>    )
>> )
>> group by `dim_site`.`SITE`
>>
>> There is no indexing (on MySQL) that can prevent a full table scan.
>> I implemented a workaround by changing my [Date] dimension to have only
>> the Day level, which is fine for the timerange selection we use. But this
>> voids the use of aggregation tables.
>>
>> I tried mondrian­.­rolap­.­aggregates­.­optimizePredicates=false (Boolean
>> property that determines whether Mondrian optimizes predicates­)
>> but as the docs say, "[...] If false­,­ Mondrian still optimizes queries
>> that involve all members of a dimension­".
>>
>> Also tried setting mondrian­.­rolap­.­EnableInMemoryRollup=false but
>> didn't help either.
>>
>> Do you know if there's any setting that can prevent this optimization ?
>>
>> Thanks,
>> Ricardo Fradinho.
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>
>>
>>
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20140721/69d5f467/attachment.html 


More information about the Mondrian mailing list