[Mondrian] Re-implementing native TopCount

Andrey Khayrutdinov andrey.khayrutdinov at gmail.com
Thu Sep 10 14:09:50 EDT 2015


Julian,

I will highly appreciate to get your feedback.


King regards,
Andrey Khayrutdinov

2015-09-08 23:23 GMT+04:00 Andrey Khayrutdinov <
andrey.khayrutdinov at gmail.com>:

> Dear all,
>
> I've picked up MONDRIAN-1272
> <http://jira.pentaho.com/browse/MONDRIAN-1272>. It is about shrinking
> tuples having no corresponding data in the fact table.
> The approach to avoid such behaviour is to use OUTER JOIN.
>
> The general routine is the following:
>
> Let current SQL query be called SUBSELECT_Q. It is created by combining
> different levels and then linking them with a constraint through the fact
> table. Let's call these combined levels CROSSJOIN_Q.
> So we can execute this query:
>
> SELECT
> FROM SUBSELECT_Q
> RIGHT JOIN CROSSJOIN_Q
>
>
> And this is the general schema.
> Level should be combined not by linking via the fact table (that will
> change nothing), but with CROSS JOIN operator.
>
> As an example, here is an MDX query:
>
> SELECT [Measures].[Unit Sales] on columns,
> TopCount(Crossjoin([Product].[Product Category].Members,
> [Customers].[City].Members), 10, [Measures].[Unit Sales]) on rows
> FROM [Sales]
> WHERE [Time].[1997].[Q3]
>
> and here is the monstrous SQL query, produced via the schema above:
>
> select
>           crossj.`cj0` as `c0`,
>           crossj.`cj1` as `c1`,
>           crossj.`cj2` as `c2`,
>           crossj.`cj3` as `c3`,
>           crossj.`cj4` as `c4`,
>           crossj.`cj5` as `c5`,
>           subsel.`c6` as `c6`
> from (
>           select
>                    `product_class`.`product_family` as `c0`,
>                    `product_class`.`product_department` as `c1`,
>                    `product_class`.`product_category` as `c2`,
>                    `customer`.`country` as `c3`,
>                    `customer`.`state_province` as `c4`,
>                    `customer`.`city` as `c5`,
>                    sum(`sales_fact_1997`.`unit_sales`) as `c6`
>           from
>                    `product` as `product`,
>                    `product_class` as `product_class`,
>                    `sales_fact_1997` as `sales_fact_1997`,
>                    `customer` as `customer`,
>                    `time_by_day` as `time_by_day`
>           where
>                    `product`.`product_class_id` =
> `product_class`.`product_class_id` and
>                    `sales_fact_1997`.`product_id` = `product`.`product_id`
> and
>                    `sales_fact_1997`.`customer_id` =
> `customer`.`customer_id` and
>                    `sales_fact_1997`.`time_id` = `time_by_day`.`time_id`
> and
>                    `time_by_day`.`the_year` = 1997 and
>                    `time_by_day`.`quarter` = 'Q3'
>           group by
>                    `product_class`.`product_family`,
>                    `product_class`.`product_department`,
>                    `product_class`.`product_category`,
>                    `customer`.`country`,
>                    `customer`.`state_province`,
>                    `customer`.`city`
>           order by
>                    sum(`sales_fact_1997`.`unit_sales`) DESC,
>                    ISNULL(`product_class`.`product_family`) ASC,
> `product_class`.`product_family` ASC,
>                    ISNULL(`product_class`.`product_department`) ASC,
> `product_class`.`product_department` ASC,
>                    ISNULL(`product_class`.`product_category`) ASC,
> `product_class`.`product_category` ASC,
>                    ISNULL(`customer`.`country`) ASC, `customer`.`country`
> ASC,
>                    ISNULL(`customer`.`state_province`) ASC,
> `customer`.`state_province` ASC,
>                    ISNULL(`customer`.`city`) ASC, `customer`.`city` ASC
> ) as `subsel`
> right join (
>           select
>                    `t0`.`product_family` as `cj0` ,
>                    `t0`.`product_department` as `cj1` ,
>                    `t0`.`product_category` as `cj2` ,
>                    `t1`.`country` as `cj3` ,
>                    `t1`.`state_province` as `cj4` ,
>                    `t1`.`city` as `cj5`
>           from (
>                    select
>                              `product_class`.`product_family` as
> `product_family`,
>                              `product_class`.`product_department` as
> `product_department`,
>                              `product_class`.`product_category` as
> `product_category`
>                    from
>                              `product` as `product`,
>                              `product_class` as `product_class`
>                    where
>                              `product`.`product_class_id` =
> `product_class`.`product_class_id`
>                    group by
>                              `product_class`.`product_family`,
>                              `product_class`.`product_department`,
>                              `product_class`.`product_category`
>                    order by
>                              ISNULL(`product_class`.`product_family`) ASC,
> `product_class`.`product_family` ASC,
>                              ISNULL(`product_class`.`product_department`)
> ASC, `product_class`.`product_department` ASC,
>                              ISNULL(`product_class`.`product_category`)
> ASC, `product_class`.`product_category` ASC
>           ) as t0
>           cross join (
>                    select
>                              `customer`.`country` as `country`,
>                              `customer`.`state_province` as
> `state_province`,
>                              `customer`.`city` as `city`
>                    from
>                              `customer` as `customer`
>                    group by
>                              `customer`.`country`,
>                              `customer`.`state_province`,
>                              `customer`.`city`
>                    order by
>                              ISNULL(`customer`.`country`) ASC,
> `customer`.`country` ASC,
>                              ISNULL(`customer`.`state_province`) ASC,
> `customer`.`state_province` ASC,
>                              ISNULL(`customer`.`city`) ASC,
> `customer`.`city` ASC
>           ) as t1
> ) as `crossj` on
>           (`subsel`.`c0` = `crossj`.`cj0`) and
>           (`subsel`.`c1` = `crossj`.`cj1`) and
>           (`subsel`.`c2` = `crossj`.`cj2`) and
>           (`subsel`.`c3` = `crossj`.`cj3`) and
>           (`subsel`.`c4` = `crossj`.`cj4`) and
>           (`subsel`.`c5` = `crossj`.`cj5`)
> order by
>           subsel.`c6` DESC,
>           ISNULL(crossj.`cj0`) ASC, crossj.`cj0` ASC,
>           ISNULL(crossj.`cj1`) ASC, crossj.`cj1` ASC,
>           ISNULL(crossj.`cj2`) ASC, crossj.`cj2` ASC,
>           ISNULL(crossj.`cj3`) ASC, crossj.`cj3` ASC,
>           ISNULL(crossj.`cj4`) ASC, crossj.`cj4` ASC,
>           ISNULL(crossj.`cj5`) ASC, crossj.`cj5` ASC
>
> Unfortunately, Mondrian does not have many tools for manipulating with
> SQL, thus I made my own.
> I have completed the POC, it passes almost all prepared tests and before
> starting polishing it I asked Matt C if he had any objection against the
> approach. He told me to ask the community, and here I am with this thread.
>
> My current work was pushed here:
> https://github.com/akhayrutdinov/mondrian/tree/NativeTopCount.
> It consists of two commits:
>
>    1. Tests for TopCount:
>    https://github.com/akhayrutdinov/mondrian/commit/3963fb37a7da8284bb5ed1ddb79825c6bf7018cf
>    2. My changes:
>    https://github.com/akhayrutdinov/mondrian/commit/2436bfbd6cca007f0f3fd866b8e61e7f4c2adb5c
>
> I want to ask your opinion whether the approach is acceptable not to spend
> time for improving the code that will be rejected.
>
> Best regards,
> Andrey Khayrutdinov
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20150910/1eb5596d/attachment-0001.html 


More information about the Mondrian mailing list