[Mondrian] All Members vs Aggregate

Richard Emberson remberson at edgedynamics.com
Tue Apr 24 09:32:05 EDT 2007


In our local usage, this has been know for a while;
dimensions can have
only partial coverage of the base fact table. If you
have orphan rows in the fact table for a given
dimension, then for an
aggregate the sum across a lower level will not equal
the sum at the top, all level. This is because for
lower levels the sql is joining across from fact table
foreign key to dimension table primary key. But for
the top level result, there is an optimization(?)
and the generated sql does not assure that only fact
table rows with matching entries in the dimension table
make up the result.
One might wish that there was a property that controled
this behavior, so that, if one wished, that no
orphan rows be counted. The alternative is to have
your dbs make sure that there is no bad data in the
fact tables (we have experienced that, as with much of
life, dirt happens; over time bad data creeps into
fact tables).

Richard


Thiyagu Palanisamy wrote:
> 
> As of now mondrian uses multiple sql queries to get data for a MDX with 
>  members at different levels. We are working on changing this behavior 
> to combine these multiple sqls to single one.
> 
> For an example this MDX generates 2 sqls, one for all gender and another 
> for male,female
> 
> select gender.members on 0 from [sales 2] where measures.[Sales Count];
> 
> sql:
> 1. /select  "time_by_day"."the_year" as "c0",  "customer"."gender" as 
> "c1",  count("sales_fact_1997"."product_id") as "m0"
> from "time_by_day" "time_by_day",  "sales_fact_1997" "sales_fact_1997", 
>  "customer" "customer"
> where
>    "sales_fact_1997"."time_id" = "time_by_day"."time_id" and
>    "time_by_day"."the_year" = 1997 and
>    "sales_fact_1997"."customer_id" = "customer"."customer_id"
> group by  "time_by_day"."the_year", "customer"."gender" /
> 
> 2. /select "time_by_day"."the_year" as "c0", 
>  count("sales_fact_1997"."product_id") as "m0"
> from "time_by_day" "time_by_day", "sales_fact_1997" "sales_fact_1997"
> where
>    "sales_fact_1997"."time_id" = "time_by_day"."time_id" and
>    "time_by_day"."the_year" = 1997
> group by
>    "time_by_day"."the_year"
> /
> 
> with our prototype, these sqls are grouped into one (this query will 
> only  work with DBs that support grouping sets):
> 
> 1. /select "time_by_day"."the_year" as "c0",  "customer"."gender" as 
> "c1",  count("sales_fact_1997"."product_id") as "m0"
> from "time_by_day" "time_by_day",  "sales_fact_1997" "sales_fact_1997", 
>  "customer" "customer"
> where
>    "sales_fact_1997"."time_id" = "time_by_day"."time_id" and
>    "time_by_day"."the_year" = 1997 and
>    "sales_fact_1997"."customer_id" = "customer"."customer_id"
> group by  grouping sets(("time_by_day"."the_year", "customer"."gender"), 
> ("time_by_day"."the_year")) /
> 
> Although the grouping set implementation looks fine and works in most 
> cases, there is a tricky problem associated with it. The problem is 
> grouping "all" (aggregate) value got from rolling up is equal to 
> aggregate of the members involved and not the all value
> 
> to reproduce it, alter fact table and insert rows for null customers:
> 
> --modify fact table to allow nulls for customer_id
> alter table "sales_fact_1997" modify "customer_id" null        
> --add sales records with no customers
> insert into "sales_fact_1997" values (337,371,null,0,2,6.3455,2.3445,4)
> insert into "sales_fact_1997" values (337,440,null,0,2,6.3355,2.335,2)
> 
> results:
> remains same with grouping sql:
> Axis #0:
> {[Measures].[Sales Count]}
> Axis #1:
> {[Gender].[All Gender]}
> {[Gender].[All Gender].[F]}
> {[Gender].[All Gender].[M]}
> Row #0: 86,837      
> Row #0: 42,831
> Row #0: 44,006
> 
> without grouping (mondrian's current behavior):
> ...
> Row #0: 86,839         //includes 2 records that don't belong to any gender
> Row #0: 42,831
> Row #0: 44,006
> 
> In summary "All" in mondrian is not really an aggregate of all the 
> members, it also includes data from rows that don't join to fact table. 
> This behavior is not consistent with AS 2000 (MOLAP).
> How should we handle this issue?
>  One of the options that we can think of is, to use outer joins. The 
> problem with this is it returns addition row for "null" gender.
> 
> 
> 
> Thanks,
> Thiyagu & Raghu
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian


-- 
Quis custodiet ipsos custodes:
This email message is for the sole use of the intended recipient(s) and
may contain confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply email and destroy all
copies of the original message.



More information about the Mondrian mailing list