[Mondrian] Re-implementing native TopCount

Andrey Khayrutdinov andrey.khayrutdinov at gmail.com
Tue Sep 22 12:45:24 EDT 2015


Did anyone have a chance to take a look at the changes?


Regards,
Andrey Khayrutdinov

2015-09-10 22:09 GMT+04:00 Andrey Khayrutdinov <
andrey.khayrutdinov at gmail.com>:

> Julian,
>
> I will highly appreciate to get your feedback.
>
>
> King regards,
> Andrey Khayrutdinov
>
> 2015-09-08 23:23 GMT+04:00 Andrey Khayrutdinov <
> andrey.khayrutdinov at gmail.com>:
>
>> Dear all,
>>
>> I've picked up MONDRIAN-1272
>> <http://jira.pentaho.com/browse/MONDRIAN-1272>. It is about shrinking
>> tuples having no corresponding data in the fact table.
>> The approach to avoid such behaviour is to use OUTER JOIN.
>>
>> The general routine is the following:
>>
>> Let current SQL query be called SUBSELECT_Q. It is created by combining
>> different levels and then linking them with a constraint through the fact
>> table. Let's call these combined levels CROSSJOIN_Q.
>> So we can execute this query:
>>
>> SELECT
>> FROM SUBSELECT_Q
>> RIGHT JOIN CROSSJOIN_Q
>>
>>
>> And this is the general schema.
>> Level should be combined not by linking via the fact table (that will
>> change nothing), but with CROSS JOIN operator.
>>
>> As an example, here is an MDX query:
>>
>> SELECT [Measures].[Unit Sales] on columns,
>> TopCount(Crossjoin([Product].[Product Category].Members,
>> [Customers].[City].Members), 10, [Measures].[Unit Sales]) on rows
>> FROM [Sales]
>> WHERE [Time].[1997].[Q3]
>>
>> and here is the monstrous SQL query, produced via the schema above:
>>
>> select
>>           crossj.`cj0` as `c0`,
>>           crossj.`cj1` as `c1`,
>>           crossj.`cj2` as `c2`,
>>           crossj.`cj3` as `c3`,
>>           crossj.`cj4` as `c4`,
>>           crossj.`cj5` as `c5`,
>>           subsel.`c6` as `c6`
>> from (
>>           select
>>                    `product_class`.`product_family` as `c0`,
>>                    `product_class`.`product_department` as `c1`,
>>                    `product_class`.`product_category` as `c2`,
>>                    `customer`.`country` as `c3`,
>>                    `customer`.`state_province` as `c4`,
>>                    `customer`.`city` as `c5`,
>>                    sum(`sales_fact_1997`.`unit_sales`) as `c6`
>>           from
>>                    `product` as `product`,
>>                    `product_class` as `product_class`,
>>                    `sales_fact_1997` as `sales_fact_1997`,
>>                    `customer` as `customer`,
>>                    `time_by_day` as `time_by_day`
>>           where
>>                    `product`.`product_class_id` =
>> `product_class`.`product_class_id` and
>>                    `sales_fact_1997`.`product_id` =
>> `product`.`product_id` and
>>                    `sales_fact_1997`.`customer_id` =
>> `customer`.`customer_id` and
>>                    `sales_fact_1997`.`time_id` = `time_by_day`.`time_id`
>> and
>>                    `time_by_day`.`the_year` = 1997 and
>>                    `time_by_day`.`quarter` = 'Q3'
>>           group by
>>                    `product_class`.`product_family`,
>>                    `product_class`.`product_department`,
>>                    `product_class`.`product_category`,
>>                    `customer`.`country`,
>>                    `customer`.`state_province`,
>>                    `customer`.`city`
>>           order by
>>                    sum(`sales_fact_1997`.`unit_sales`) DESC,
>>                    ISNULL(`product_class`.`product_family`) ASC,
>> `product_class`.`product_family` ASC,
>>                    ISNULL(`product_class`.`product_department`) ASC,
>> `product_class`.`product_department` ASC,
>>                    ISNULL(`product_class`.`product_category`) ASC,
>> `product_class`.`product_category` ASC,
>>                    ISNULL(`customer`.`country`) ASC, `customer`.`country`
>> ASC,
>>                    ISNULL(`customer`.`state_province`) ASC,
>> `customer`.`state_province` ASC,
>>                    ISNULL(`customer`.`city`) ASC, `customer`.`city` ASC
>> ) as `subsel`
>> right join (
>>           select
>>                    `t0`.`product_family` as `cj0` ,
>>                    `t0`.`product_department` as `cj1` ,
>>                    `t0`.`product_category` as `cj2` ,
>>                    `t1`.`country` as `cj3` ,
>>                    `t1`.`state_province` as `cj4` ,
>>                    `t1`.`city` as `cj5`
>>           from (
>>                    select
>>                              `product_class`.`product_family` as
>> `product_family`,
>>                              `product_class`.`product_department` as
>> `product_department`,
>>                              `product_class`.`product_category` as
>> `product_category`
>>                    from
>>                              `product` as `product`,
>>                              `product_class` as `product_class`
>>                    where
>>                              `product`.`product_class_id` =
>> `product_class`.`product_class_id`
>>                    group by
>>                              `product_class`.`product_family`,
>>                              `product_class`.`product_department`,
>>                              `product_class`.`product_category`
>>                    order by
>>                              ISNULL(`product_class`.`product_family`)
>> ASC, `product_class`.`product_family` ASC,
>>                              ISNULL(`product_class`.`product_department`)
>> ASC, `product_class`.`product_department` ASC,
>>                              ISNULL(`product_class`.`product_category`)
>> ASC, `product_class`.`product_category` ASC
>>           ) as t0
>>           cross join (
>>                    select
>>                              `customer`.`country` as `country`,
>>                              `customer`.`state_province` as
>> `state_province`,
>>                              `customer`.`city` as `city`
>>                    from
>>                              `customer` as `customer`
>>                    group by
>>                              `customer`.`country`,
>>                              `customer`.`state_province`,
>>                              `customer`.`city`
>>                    order by
>>                              ISNULL(`customer`.`country`) ASC,
>> `customer`.`country` ASC,
>>                              ISNULL(`customer`.`state_province`) ASC,
>> `customer`.`state_province` ASC,
>>                              ISNULL(`customer`.`city`) ASC,
>> `customer`.`city` ASC
>>           ) as t1
>> ) as `crossj` on
>>           (`subsel`.`c0` = `crossj`.`cj0`) and
>>           (`subsel`.`c1` = `crossj`.`cj1`) and
>>           (`subsel`.`c2` = `crossj`.`cj2`) and
>>           (`subsel`.`c3` = `crossj`.`cj3`) and
>>           (`subsel`.`c4` = `crossj`.`cj4`) and
>>           (`subsel`.`c5` = `crossj`.`cj5`)
>> order by
>>           subsel.`c6` DESC,
>>           ISNULL(crossj.`cj0`) ASC, crossj.`cj0` ASC,
>>           ISNULL(crossj.`cj1`) ASC, crossj.`cj1` ASC,
>>           ISNULL(crossj.`cj2`) ASC, crossj.`cj2` ASC,
>>           ISNULL(crossj.`cj3`) ASC, crossj.`cj3` ASC,
>>           ISNULL(crossj.`cj4`) ASC, crossj.`cj4` ASC,
>>           ISNULL(crossj.`cj5`) ASC, crossj.`cj5` ASC
>>
>> Unfortunately, Mondrian does not have many tools for manipulating with
>> SQL, thus I made my own.
>> I have completed the POC, it passes almost all prepared tests and before
>> starting polishing it I asked Matt C if he had any objection against the
>> approach. He told me to ask the community, and here I am with this thread.
>>
>> My current work was pushed here:
>> https://github.com/akhayrutdinov/mondrian/tree/NativeTopCount.
>> It consists of two commits:
>>
>>    1. Tests for TopCount:
>>    https://github.com/akhayrutdinov/mondrian/commit/3963fb37a7da8284bb5ed1ddb79825c6bf7018cf
>>    2. My changes:
>>    https://github.com/akhayrutdinov/mondrian/commit/2436bfbd6cca007f0f3fd866b8e61e7f4c2adb5c
>>
>> I want to ask your opinion whether the approach is acceptable not to
>> spend time for improving the code that will be rejected.
>>
>> Best regards,
>> Andrey Khayrutdinov
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20150922/034a47e6/attachment-0001.html 


More information about the Mondrian mailing list