[Mondrian] Mondrian generating SQL that cannot be indexed

Mon Jul 21 15:51:03 EDT 2014

That’s pretty yucky SQL. I sympathize.

Can you try refactoring the SQL to other queries that gives the same answer? 

1. Move `dim_rul_band`.`band` = 'RUL 39-30’, up as an AND clause, because it is common in both branches of the OR.

2. Try removing the OR, and dealing with Jun and Jul in one clause:

>          (
>             `dim_date`.`DAY_DATE`, `dim_rul_band`.`band`
>          )
>          in
>          (
>             (DATE '2014-06-01', 'RUL 39-30’),
...
>             (DATE '2014-07-08', 'RUL 39-30')
>          )

3. Try ‘dim_date.DAY_DATE BETWEEN DATE ‘2014-06-01’ AND DATE ‘2014-07-08’. (Mondrian is not currently capable of making that optimization — see http://jira.pentaho.com/browse/MONDRIAN-1494 — but it would be interesting to see the performance difference.)

Post the performance numbers for the various queries and we will see which one we’d like to generate. (In an ideal world… making Mondrian generate those queries is another matter… I’m not promising anything…)

Julian

On Jul 21, 2014, at 6:50 AM, Ricardo Fradinho <ricardo.fradinho at webdetails.pt> wrote:

> Hi,
> 
> I have a performance problem caused by the queries that Mondrian generates.
> These queries always force full table scans because there's no indexing that can be done do help these queries.
> 
> The problem happens when I select my time range as (2014-06-01 : 2014-07-08) and Mondrian rewrites this as (2014-06) + (2014-07-01 : 2014-07-08) which is translated into this nasty SQL:
> 
> select
>   `dim_site`.`SITE` as `c0`,
>   count(distinct `fact_tetrapak_500k`.`dim_function_key`) as `m0`
> from
>   `dim_site` as `dim_site`,
>   `fact_tetrapak_500k` as `fact_tetrapak_500k`,
>   `dim_date` as `dim_date`,
>   `dim_rul_band` as `dim_rul_band`
> where `fact_tetrapak_500k`.`dim_site_key` = `dim_site`.`dim_site_key`
>  and `fact_tetrapak_500k`.`dateToProcess` = `dim_date`.`DATE_SK`
>  and `fact_tetrapak_500k`.`dim_rul_band_key` = `dim_rul_band`.`DIM_RUL_BAND_KEY`
>  and
>  (
>    (
>       `dim_date`.`MONTH_NAME` = 'June'
>       and `dim_date`.`YEAR_NUMBER` = 2014
>       and `dim_rul_band`.`band` = 'RUL 39-30'
>    )
>    or
>    (
>       (
>          (
>             `dim_date`.`DAY_DATE`, `dim_rul_band`.`band`
>          )
>          in
>          (
>             (DATE '2014-07-01', 'RUL 39-30'),
>             (DATE '2014-07-02', 'RUL 39-30'),
>             (DATE '2014-07-03', 'RUL 39-30'),
>             (DATE '2014-07-04', 'RUL 39-30'),
>             (DATE '2014-07-05', 'RUL 39-30'),
>             (DATE '2014-07-06', 'RUL 39-30'),
>             (DATE '2014-07-07', 'RUL 39-30'),
>             (DATE '2014-07-08', 'RUL 39-30')
>          )
>       )
>    )
> )
> group by `dim_site`.`SITE`
> 
> There is no indexing (on MySQL) that can prevent a full table scan.
> I implemented a workaround by changing my [Date] dimension to have only the Day level, which is fine for the timerange selection we use. But this voids the use of aggregation tables.
> 
> I tried mondrian.rolap.aggregates.optimizePredicates=false (Boolean property that determines whether Mondrian optimizes predicates)
> but as the docs say, "[...] If false, Mondrian still optimizes queries that involve all members of a dimension".
> 
> Also tried setting mondrian.rolap.EnableInMemoryRollup=false but didn't help either.
> 
> Do you know if there's any setting that can prevent this optimization ?
> 
> Thanks,
> Ricardo Fradinho.
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20140721/a79a5faf/attachment.html