[Mondrian] All Members vs Aggregate

Thiyagu Palanisamy tpalanis at thoughtworks.com
Fri Apr 27 09:22:58 EDT 2007


Hi,

For now we are leaving All Members vs Aggregate issue open and proceeding 
with getting Grouping sets functionality working.

We have developed a Proof of concept to achieve the Grouping sets 
functionality, with this we have managed to get all the tests except 2 
tests passing. Two failing tests are because of All member queries getting 
orphan rows(InlineTableTests).

Limitations of our POC:
1. "All value" will not be consistent in cases where there are extra fact 
rows (i.e, rows that don't connect to the dimension of interest).

2. The implementation doesn't reduce the number of queries in cases like 
select {member,member.children} ...
This is because, it becomes very tricky to identify that members.children 
roll up to member, since the roll-up logic heavily relies on constraint 
bit key to identify batch that can be grouped into a single sql query

Note: POC is a hacky solution, it doesn't consider issues like 
Concurrency..

Logic for identifying the Batches which can be grouped :
1. If Batch A's constraint columns list is super set of Batch B's 
constraint columns 
   and
2. If matching columns of the Batch A and Batch B has the same value 
   and
3. If Non matching columns of super set have All values 

We are sending this for an early feed back, if you see any issues with our 
approach please let us know, we will be starting our actual implementation 
next week.

Attached the tar file created using packChanges containing our POC in the 
forum post : http://forums.pentaho.org/showthread.php?t=53644

Thanks,
Thiyagu & Raghu





Richard Emberson <remberson at edgedynamics.com> 
Sent by: mondrian-bounces at pentaho.org
04/24/2007 07:02 PM
Please respond to
Mondrian developer mailing list <mondrian at pentaho.org>


To
Mondrian developer mailing list <mondrian at pentaho.org>
cc

Subject
Re: [Mondrian] All Members vs Aggregate






In our local usage, this has been know for a while;
dimensions can have
only partial coverage of the base fact table. If you
have orphan rows in the fact table for a given
dimension, then for an
aggregate the sum across a lower level will not equal
the sum at the top, all level. This is because for
lower levels the sql is joining across from fact table
foreign key to dimension table primary key. But for
the top level result, there is an optimization(?)
and the generated sql does not assure that only fact
table rows with matching entries in the dimension table
make up the result.
One might wish that there was a property that controled
this behavior, so that, if one wished, that no
orphan rows be counted. The alternative is to have
your dbs make sure that there is no bad data in the
fact tables (we have experienced that, as with much of
life, dirt happens; over time bad data creeps into
fact tables).

Richard


Thiyagu Palanisamy wrote:
> 
> As of now mondrian uses multiple sql queries to get data for a MDX with 
>  members at different levels. We are working on changing this behavior 
> to combine these multiple sqls to single one.
> 
> For an example this MDX generates 2 sqls, one for all gender and another 

> for male,female
> 
> select gender.members on 0 from [sales 2] where measures.[Sales Count];
> 
> sql:
> 1. /select  "time_by_day"."the_year" as "c0",  "customer"."gender" as 
> "c1",  count("sales_fact_1997"."product_id") as "m0"
> from "time_by_day" "time_by_day",  "sales_fact_1997" "sales_fact_1997", 
>  "customer" "customer"
> where
>    "sales_fact_1997"."time_id" = "time_by_day"."time_id" and
>    "time_by_day"."the_year" = 1997 and
>    "sales_fact_1997"."customer_id" = "customer"."customer_id"
> group by  "time_by_day"."the_year", "customer"."gender" /
> 
> 2. /select "time_by_day"."the_year" as "c0", 
>  count("sales_fact_1997"."product_id") as "m0"
> from "time_by_day" "time_by_day", "sales_fact_1997" "sales_fact_1997"
> where
>    "sales_fact_1997"."time_id" = "time_by_day"."time_id" and
>    "time_by_day"."the_year" = 1997
> group by
>    "time_by_day"."the_year"
> /
> 
> with our prototype, these sqls are grouped into one (this query will 
> only  work with DBs that support grouping sets):
> 
> 1. /select "time_by_day"."the_year" as "c0",  "customer"."gender" as 
> "c1",  count("sales_fact_1997"."product_id") as "m0"
> from "time_by_day" "time_by_day",  "sales_fact_1997" "sales_fact_1997", 
>  "customer" "customer"
> where
>    "sales_fact_1997"."time_id" = "time_by_day"."time_id" and
>    "time_by_day"."the_year" = 1997 and
>    "sales_fact_1997"."customer_id" = "customer"."customer_id"
> group by  grouping sets(("time_by_day"."the_year", "customer"."gender"), 

> ("time_by_day"."the_year")) /
> 
> Although the grouping set implementation looks fine and works in most 
> cases, there is a tricky problem associated with it. The problem is 
> grouping "all" (aggregate) value got from rolling up is equal to 
> aggregate of the members involved and not the all value
> 
> to reproduce it, alter fact table and insert rows for null customers:
> 
> --modify fact table to allow nulls for customer_id
> alter table "sales_fact_1997" modify "customer_id" null 
> --add sales records with no customers
> insert into "sales_fact_1997" values (337,371,null,0,2,6.3455,2.3445,4)
> insert into "sales_fact_1997" values (337,440,null,0,2,6.3355,2.335,2)
> 
> results:
> remains same with grouping sql:
> Axis #0:
> {[Measures].[Sales Count]}
> Axis #1:
> {[Gender].[All Gender]}
> {[Gender].[All Gender].[F]}
> {[Gender].[All Gender].[M]}
> Row #0: 86,837 
> Row #0: 42,831
> Row #0: 44,006
> 
> without grouping (mondrian's current behavior):
> ...
> Row #0: 86,839         //includes 2 records that don't belong to any 
gender
> Row #0: 42,831
> Row #0: 44,006
> 
> In summary "All" in mondrian is not really an aggregate of all the 
> members, it also includes data from rows that don't join to fact table. 
> This behavior is not consistent with AS 2000 (MOLAP).
> How should we handle this issue?
>  One of the options that we can think of is, to use outer joins. The 
> problem with this is it returns addition row for "null" gender.
> 
> 
> 
> Thanks,
> Thiyagu & Raghu
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian


-- 
Quis custodiet ipsos custodes:
This email message is for the sole use of the intended recipient(s) and
may contain confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply email and destroy all
copies of the original message.
_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20070427/0f1bf7e9/attachment.html 


More information about the Mondrian mailing list