[Mondrian] All Members vs Aggregate

Thiyagu Palanisamy tpalanis at thoughtworks.com
Tue Apr 24 07:32:50 EDT 2007


As of now mondrian uses multiple sql queries to get data for a MDX with 
members at different levels. We are working on changing this behavior to 
combine these multiple sqls to single one.

For an example this MDX generates 2 sqls, one for all gender and another 
for male,female 

select gender.members on 0 from [sales 2] where measures.[Sales Count];

sql:
1. select  "time_by_day"."the_year" as "c0",  "customer"."gender" as "c1", 
 count("sales_fact_1997"."product_id") as "m0" 
from "time_by_day" "time_by_day",  "sales_fact_1997" "sales_fact_1997", 
"customer" "customer"
where 
    "sales_fact_1997"."time_id" = "time_by_day"."time_id" and 
    "time_by_day"."the_year" = 1997 and 
    "sales_fact_1997"."customer_id" = "customer"."customer_id"
group by  "time_by_day"."the_year", "customer"."gender" 

2. select "time_by_day"."the_year" as "c0", 
count("sales_fact_1997"."product_id") as "m0"
from "time_by_day" "time_by_day", "sales_fact_1997" "sales_fact_1997" 
where 
    "sales_fact_1997"."time_id" = "time_by_day"."time_id" and 
    "time_by_day"."the_year" = 1997
group by 
    "time_by_day"."the_year" 


with our prototype, these sqls are grouped into one (this query will only 
work with DBs that support grouping sets):

1. select "time_by_day"."the_year" as "c0",  "customer"."gender" as "c1", 
count("sales_fact_1997"."product_id") as "m0" 
from "time_by_day" "time_by_day",  "sales_fact_1997" "sales_fact_1997", 
"customer" "customer"
where 
    "sales_fact_1997"."time_id" = "time_by_day"."time_id" and 
    "time_by_day"."the_year" = 1997 and 
    "sales_fact_1997"."customer_id" = "customer"."customer_id"
group by  grouping sets(("time_by_day"."the_year", "customer"."gender"), 
("time_by_day"."the_year")) 

Although the grouping set implementation looks fine and works in most 
cases, there is a tricky problem associated with it. The problem is 
grouping "all" (aggregate) value got from rolling up is equal to aggregate 
of the members involved and not the all value 

to reproduce it, alter fact table and insert rows for null customers:

 --modify fact table to allow nulls for customer_id
alter table "sales_fact_1997" modify "customer_id" null 
--add sales records with no customers
insert into "sales_fact_1997" values (337,371,null,0,2,6.3455,2.3445,4) 
insert into "sales_fact_1997" values (337,440,null,0,2,6.3355,2.335,2)

results: 
remains same with grouping sql:
Axis #0:
{[Measures].[Sales Count]}
Axis #1:
{[Gender].[All Gender]}
{[Gender].[All Gender].[F]}
{[Gender].[All Gender].[M]}
Row #0: 86,837 
Row #0: 42,831 
Row #0: 44,006

without grouping (mondrian's current behavior):
...
Row #0: 86,839         //includes 2 records that don't belong to any 
gender
Row #0: 42,831
Row #0: 44,006

In summary "All" in mondrian is not really an aggregate of all the 
members, it also includes data from rows that don't join to fact table. 
This behavior is not consistent with AS 2000 (MOLAP). 
How should we handle this issue? 
 One of the options that we can think of is, to use outer joins. The 
problem with this is it returns addition row for "null" gender.



Thanks,
Thiyagu & Raghu

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20070424/c6743574/attachment.html 


More information about the Mondrian mailing list