[Mondrian] Re-implementing native TopCount
Andrey Khayrutdinov
andrey.khayrutdinov at gmail.com
Tue Sep 22 12:45:24 EDT 2015
Did anyone have a chance to take a look at the changes?
Regards,
Andrey Khayrutdinov
2015-09-10 22:09 GMT+04:00 Andrey Khayrutdinov <
andrey.khayrutdinov at gmail.com>:
> Julian,
>
> I will highly appreciate to get your feedback.
>
>
> King regards,
> Andrey Khayrutdinov
>
> 2015-09-08 23:23 GMT+04:00 Andrey Khayrutdinov <
> andrey.khayrutdinov at gmail.com>:
>
>> Dear all,
>>
>> I've picked up MONDRIAN-1272
>> <http://jira.pentaho.com/browse/MONDRIAN-1272>. It is about shrinking
>> tuples having no corresponding data in the fact table.
>> The approach to avoid such behaviour is to use OUTER JOIN.
>>
>> The general routine is the following:
>>
>> Let current SQL query be called SUBSELECT_Q. It is created by combining
>> different levels and then linking them with a constraint through the fact
>> table. Let's call these combined levels CROSSJOIN_Q.
>> So we can execute this query:
>>
>> SELECT
>> FROM SUBSELECT_Q
>> RIGHT JOIN CROSSJOIN_Q
>>
>>
>> And this is the general schema.
>> Level should be combined not by linking via the fact table (that will
>> change nothing), but with CROSS JOIN operator.
>>
>> As an example, here is an MDX query:
>>
>> SELECT [Measures].[Unit Sales] on columns,
>> TopCount(Crossjoin([Product].[Product Category].Members,
>> [Customers].[City].Members), 10, [Measures].[Unit Sales]) on rows
>> FROM [Sales]
>> WHERE [Time].[1997].[Q3]
>>
>> and here is the monstrous SQL query, produced via the schema above:
>>
>> select
>> crossj.`cj0` as `c0`,
>> crossj.`cj1` as `c1`,
>> crossj.`cj2` as `c2`,
>> crossj.`cj3` as `c3`,
>> crossj.`cj4` as `c4`,
>> crossj.`cj5` as `c5`,
>> subsel.`c6` as `c6`
>> from (
>> select
>> `product_class`.`product_family` as `c0`,
>> `product_class`.`product_department` as `c1`,
>> `product_class`.`product_category` as `c2`,
>> `customer`.`country` as `c3`,
>> `customer`.`state_province` as `c4`,
>> `customer`.`city` as `c5`,
>> sum(`sales_fact_1997`.`unit_sales`) as `c6`
>> from
>> `product` as `product`,
>> `product_class` as `product_class`,
>> `sales_fact_1997` as `sales_fact_1997`,
>> `customer` as `customer`,
>> `time_by_day` as `time_by_day`
>> where
>> `product`.`product_class_id` =
>> `product_class`.`product_class_id` and
>> `sales_fact_1997`.`product_id` =
>> `product`.`product_id` and
>> `sales_fact_1997`.`customer_id` =
>> `customer`.`customer_id` and
>> `sales_fact_1997`.`time_id` = `time_by_day`.`time_id`
>> and
>> `time_by_day`.`the_year` = 1997 and
>> `time_by_day`.`quarter` = 'Q3'
>> group by
>> `product_class`.`product_family`,
>> `product_class`.`product_department`,
>> `product_class`.`product_category`,
>> `customer`.`country`,
>> `customer`.`state_province`,
>> `customer`.`city`
>> order by
>> sum(`sales_fact_1997`.`unit_sales`) DESC,
>> ISNULL(`product_class`.`product_family`) ASC,
>> `product_class`.`product_family` ASC,
>> ISNULL(`product_class`.`product_department`) ASC,
>> `product_class`.`product_department` ASC,
>> ISNULL(`product_class`.`product_category`) ASC,
>> `product_class`.`product_category` ASC,
>> ISNULL(`customer`.`country`) ASC, `customer`.`country`
>> ASC,
>> ISNULL(`customer`.`state_province`) ASC,
>> `customer`.`state_province` ASC,
>> ISNULL(`customer`.`city`) ASC, `customer`.`city` ASC
>> ) as `subsel`
>> right join (
>> select
>> `t0`.`product_family` as `cj0` ,
>> `t0`.`product_department` as `cj1` ,
>> `t0`.`product_category` as `cj2` ,
>> `t1`.`country` as `cj3` ,
>> `t1`.`state_province` as `cj4` ,
>> `t1`.`city` as `cj5`
>> from (
>> select
>> `product_class`.`product_family` as
>> `product_family`,
>> `product_class`.`product_department` as
>> `product_department`,
>> `product_class`.`product_category` as
>> `product_category`
>> from
>> `product` as `product`,
>> `product_class` as `product_class`
>> where
>> `product`.`product_class_id` =
>> `product_class`.`product_class_id`
>> group by
>> `product_class`.`product_family`,
>> `product_class`.`product_department`,
>> `product_class`.`product_category`
>> order by
>> ISNULL(`product_class`.`product_family`)
>> ASC, `product_class`.`product_family` ASC,
>> ISNULL(`product_class`.`product_department`)
>> ASC, `product_class`.`product_department` ASC,
>> ISNULL(`product_class`.`product_category`)
>> ASC, `product_class`.`product_category` ASC
>> ) as t0
>> cross join (
>> select
>> `customer`.`country` as `country`,
>> `customer`.`state_province` as
>> `state_province`,
>> `customer`.`city` as `city`
>> from
>> `customer` as `customer`
>> group by
>> `customer`.`country`,
>> `customer`.`state_province`,
>> `customer`.`city`
>> order by
>> ISNULL(`customer`.`country`) ASC,
>> `customer`.`country` ASC,
>> ISNULL(`customer`.`state_province`) ASC,
>> `customer`.`state_province` ASC,
>> ISNULL(`customer`.`city`) ASC,
>> `customer`.`city` ASC
>> ) as t1
>> ) as `crossj` on
>> (`subsel`.`c0` = `crossj`.`cj0`) and
>> (`subsel`.`c1` = `crossj`.`cj1`) and
>> (`subsel`.`c2` = `crossj`.`cj2`) and
>> (`subsel`.`c3` = `crossj`.`cj3`) and
>> (`subsel`.`c4` = `crossj`.`cj4`) and
>> (`subsel`.`c5` = `crossj`.`cj5`)
>> order by
>> subsel.`c6` DESC,
>> ISNULL(crossj.`cj0`) ASC, crossj.`cj0` ASC,
>> ISNULL(crossj.`cj1`) ASC, crossj.`cj1` ASC,
>> ISNULL(crossj.`cj2`) ASC, crossj.`cj2` ASC,
>> ISNULL(crossj.`cj3`) ASC, crossj.`cj3` ASC,
>> ISNULL(crossj.`cj4`) ASC, crossj.`cj4` ASC,
>> ISNULL(crossj.`cj5`) ASC, crossj.`cj5` ASC
>>
>> Unfortunately, Mondrian does not have many tools for manipulating with
>> SQL, thus I made my own.
>> I have completed the POC, it passes almost all prepared tests and before
>> starting polishing it I asked Matt C if he had any objection against the
>> approach. He told me to ask the community, and here I am with this thread.
>>
>> My current work was pushed here:
>> https://github.com/akhayrutdinov/mondrian/tree/NativeTopCount.
>> It consists of two commits:
>>
>> 1. Tests for TopCount:
>> https://github.com/akhayrutdinov/mondrian/commit/3963fb37a7da8284bb5ed1ddb79825c6bf7018cf
>> 2. My changes:
>> https://github.com/akhayrutdinov/mondrian/commit/2436bfbd6cca007f0f3fd866b8e61e7f4c2adb5c
>>
>> I want to ask your opinion whether the approach is acceptable not to
>> spend time for improving the code that will be rejected.
>>
>> Best regards,
>> Andrey Khayrutdinov
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20150922/034a47e6/attachment-0001.html
More information about the Mondrian
mailing list