[Mondrian] All Members vs Aggregate
Thiyagu Palanisamy
tpalanis at thoughtworks.com
Fri Apr 27 09:22:58 EDT 2007
Hi,
For now we are leaving All Members vs Aggregate issue open and proceeding
with getting Grouping sets functionality working.
We have developed a Proof of concept to achieve the Grouping sets
functionality, with this we have managed to get all the tests except 2
tests passing. Two failing tests are because of All member queries getting
orphan rows(InlineTableTests).
Limitations of our POC:
1. "All value" will not be consistent in cases where there are extra fact
rows (i.e, rows that don't connect to the dimension of interest).
2. The implementation doesn't reduce the number of queries in cases like
select {member,member.children} ...
This is because, it becomes very tricky to identify that members.children
roll up to member, since the roll-up logic heavily relies on constraint
bit key to identify batch that can be grouped into a single sql query
Note: POC is a hacky solution, it doesn't consider issues like
Concurrency..
Logic for identifying the Batches which can be grouped :
1. If Batch A's constraint columns list is super set of Batch B's
constraint columns
and
2. If matching columns of the Batch A and Batch B has the same value
and
3. If Non matching columns of super set have All values
We are sending this for an early feed back, if you see any issues with our
approach please let us know, we will be starting our actual implementation
next week.
Attached the tar file created using packChanges containing our POC in the
forum post : http://forums.pentaho.org/showthread.php?t=53644
Thanks,
Thiyagu & Raghu
Richard Emberson <remberson at edgedynamics.com>
Sent by: mondrian-bounces at pentaho.org
04/24/2007 07:02 PM
Please respond to
Mondrian developer mailing list <mondrian at pentaho.org>
To
Mondrian developer mailing list <mondrian at pentaho.org>
cc
Subject
Re: [Mondrian] All Members vs Aggregate
In our local usage, this has been know for a while;
dimensions can have
only partial coverage of the base fact table. If you
have orphan rows in the fact table for a given
dimension, then for an
aggregate the sum across a lower level will not equal
the sum at the top, all level. This is because for
lower levels the sql is joining across from fact table
foreign key to dimension table primary key. But for
the top level result, there is an optimization(?)
and the generated sql does not assure that only fact
table rows with matching entries in the dimension table
make up the result.
One might wish that there was a property that controled
this behavior, so that, if one wished, that no
orphan rows be counted. The alternative is to have
your dbs make sure that there is no bad data in the
fact tables (we have experienced that, as with much of
life, dirt happens; over time bad data creeps into
fact tables).
Richard
Thiyagu Palanisamy wrote:
>
> As of now mondrian uses multiple sql queries to get data for a MDX with
> members at different levels. We are working on changing this behavior
> to combine these multiple sqls to single one.
>
> For an example this MDX generates 2 sqls, one for all gender and another
> for male,female
>
> select gender.members on 0 from [sales 2] where measures.[Sales Count];
>
> sql:
> 1. /select "time_by_day"."the_year" as "c0", "customer"."gender" as
> "c1", count("sales_fact_1997"."product_id") as "m0"
> from "time_by_day" "time_by_day", "sales_fact_1997" "sales_fact_1997",
> "customer" "customer"
> where
> "sales_fact_1997"."time_id" = "time_by_day"."time_id" and
> "time_by_day"."the_year" = 1997 and
> "sales_fact_1997"."customer_id" = "customer"."customer_id"
> group by "time_by_day"."the_year", "customer"."gender" /
>
> 2. /select "time_by_day"."the_year" as "c0",
> count("sales_fact_1997"."product_id") as "m0"
> from "time_by_day" "time_by_day", "sales_fact_1997" "sales_fact_1997"
> where
> "sales_fact_1997"."time_id" = "time_by_day"."time_id" and
> "time_by_day"."the_year" = 1997
> group by
> "time_by_day"."the_year"
> /
>
> with our prototype, these sqls are grouped into one (this query will
> only work with DBs that support grouping sets):
>
> 1. /select "time_by_day"."the_year" as "c0", "customer"."gender" as
> "c1", count("sales_fact_1997"."product_id") as "m0"
> from "time_by_day" "time_by_day", "sales_fact_1997" "sales_fact_1997",
> "customer" "customer"
> where
> "sales_fact_1997"."time_id" = "time_by_day"."time_id" and
> "time_by_day"."the_year" = 1997 and
> "sales_fact_1997"."customer_id" = "customer"."customer_id"
> group by grouping sets(("time_by_day"."the_year", "customer"."gender"),
> ("time_by_day"."the_year")) /
>
> Although the grouping set implementation looks fine and works in most
> cases, there is a tricky problem associated with it. The problem is
> grouping "all" (aggregate) value got from rolling up is equal to
> aggregate of the members involved and not the all value
>
> to reproduce it, alter fact table and insert rows for null customers:
>
> --modify fact table to allow nulls for customer_id
> alter table "sales_fact_1997" modify "customer_id" null
> --add sales records with no customers
> insert into "sales_fact_1997" values (337,371,null,0,2,6.3455,2.3445,4)
> insert into "sales_fact_1997" values (337,440,null,0,2,6.3355,2.335,2)
>
> results:
> remains same with grouping sql:
> Axis #0:
> {[Measures].[Sales Count]}
> Axis #1:
> {[Gender].[All Gender]}
> {[Gender].[All Gender].[F]}
> {[Gender].[All Gender].[M]}
> Row #0: 86,837
> Row #0: 42,831
> Row #0: 44,006
>
> without grouping (mondrian's current behavior):
> ...
> Row #0: 86,839 //includes 2 records that don't belong to any
gender
> Row #0: 42,831
> Row #0: 44,006
>
> In summary "All" in mondrian is not really an aggregate of all the
> members, it also includes data from rows that don't join to fact table.
> This behavior is not consistent with AS 2000 (MOLAP).
> How should we handle this issue?
> One of the options that we can think of is, to use outer joins. The
> problem with this is it returns addition row for "null" gender.
>
>
>
> Thanks,
> Thiyagu & Raghu
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
--
Quis custodiet ipsos custodes:
This email message is for the sole use of the intended recipient(s) and
may contain confidential information. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply email and destroy all
copies of the original message.
_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20070427/0f1bf7e9/attachment.html
More information about the Mondrian
mailing list