<br><font size=2 face="sans-serif">Hi,</font>
<br>
<br><font size=2 face="sans-serif">For now we are leaving All Members vs
Aggregate issue open and proceeding with getting Grouping sets functionality
working.</font>
<br>
<br><font size=2 face="sans-serif">We have developed a Proof of concept
to achieve the Grouping sets functionality, with this we have managed to
get all the tests except 2 tests passing. Two failing tests are because
of All member queries getting orphan rows(InlineTableTests).</font>
<br>
<br><font size=2 face="sans-serif">Limitations of our POC:</font>
<br><font size=2 face="sans-serif">1. "All value" will not be
consistent in cases where there are extra fact rows (i.e, rows that don't
connect to the dimension of interest).</font>
<br>
<br><font size=2 face="sans-serif">2. The implementation doesn't reduce
the number of queries in cases like <i>select {member,member.children}
...</i></font>
<br><font size=2 face="sans-serif">This is because, it becomes very tricky
to identify that members.children roll up to member, since the roll-up
logic heavily relies on constraint bit key to identify batch that can be
grouped into a single sql query</font>
<br>
<br><font size=2 face="sans-serif"><b>Note:</b> POC is a hacky solution,
it doesn't consider issues like Concurrency..</font>
<br>
<br><font size=2 face="sans-serif"><i>Logic for identifying the Batches
which can be grouped :</i></font>
<br><font size=2 face="sans-serif">1. If Batch A's constraint columns list
is super set of Batch B's constraint columns </font>
<br><font size=2 face="sans-serif"> and</font>
<br><font size=2 face="sans-serif">2. If matching columns of the Batch
A and Batch B has the same value </font>
<br><font size=2 face="sans-serif"> and</font>
<br><font size=2 face="sans-serif">3. If Non matching columns of super
set have All values </font>
<br>
<br><font size=2 face="sans-serif">We are sending this for an early feed
back, if you see any issues with our approach please let us know, we will
be starting our actual implementation next week.</font>
<br>
<br><font size=2 face="sans-serif">Attached the tar file created using
packChanges containing our POC in the forum post : http://forums.pentaho.org/showthread.php?t=53644</font>
<br>
<br><font size=2 face="sans-serif">Thanks,</font>
<br><font size=2 face="sans-serif">Thiyagu & Raghu</font>
<br>
<br>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td width=40%><font size=1 face="sans-serif"><b>Richard Emberson <remberson@edgedynamics.com></b>
</font>
<br><font size=1 face="sans-serif">Sent by: mondrian-bounces@pentaho.org</font>
<p><font size=1 face="sans-serif">04/24/2007 07:02 PM</font>
<table border>
<tr valign=top>
<td bgcolor=white>
<div align=center><font size=1 face="sans-serif">Please respond to<br>
Mondrian developer mailing list <mondrian@pentaho.org></font></div></table>
<br>
<td width=59%>
<table width=100%>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">To</font></div>
<td><font size=1 face="sans-serif">Mondrian developer mailing list <mondrian@pentaho.org></font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">cc</font></div>
<td>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">Subject</font></div>
<td><font size=1 face="sans-serif">Re: [Mondrian] All Members vs Aggregate</font></table>
<br>
<table>
<tr valign=top>
<td>
<td></table>
<br></table>
<br>
<br>
<br><tt><font size=2>In our local usage, this has been know for a while;<br>
dimensions can have<br>
only partial coverage of the base fact table. If you<br>
have orphan rows in the fact table for a given<br>
dimension, then for an<br>
aggregate the sum across a lower level will not equal<br>
the sum at the top, all level. This is because for<br>
lower levels the sql is joining across from fact table<br>
foreign key to dimension table primary key. But for<br>
the top level result, there is an optimization(?)<br>
and the generated sql does not assure that only fact<br>
table rows with matching entries in the dimension table<br>
make up the result.<br>
One might wish that there was a property that controled<br>
this behavior, so that, if one wished, that no<br>
orphan rows be counted. The alternative is to have<br>
your dbs make sure that there is no bad data in the<br>
fact tables (we have experienced that, as with much of<br>
life, dirt happens; over time bad data creeps into<br>
fact tables).<br>
<br>
Richard<br>
<br>
<br>
Thiyagu Palanisamy wrote:<br>
> <br>
> As of now mondrian uses multiple sql queries to get data for a MDX
with <br>
> members at different levels. We are working on changing this
behavior <br>
> to combine these multiple sqls to single one.<br>
> <br>
> For an example this MDX generates 2 sqls, one for all gender and another
<br>
> for male,female<br>
> <br>
> select gender.members on 0 from [sales 2] where measures.[Sales Count];<br>
> <br>
> sql:<br>
> 1. /select "time_by_day"."the_year" as "c0",
"customer"."gender" as <br>
> "c1", count("sales_fact_1997"."product_id")
as "m0"<br>
> from "time_by_day" "time_by_day", "sales_fact_1997"
"sales_fact_1997", <br>
> "customer" "customer"<br>
> where<br>
> "sales_fact_1997"."time_id" = "time_by_day"."time_id"
and<br>
> "time_by_day"."the_year" = 1997 and<br>
> "sales_fact_1997"."customer_id" =
"customer"."customer_id"<br>
> group by "time_by_day"."the_year", "customer"."gender"
/<br>
> <br>
> 2. /select "time_by_day"."the_year" as "c0",
<br>
> count("sales_fact_1997"."product_id") as
"m0"<br>
> from "time_by_day" "time_by_day", "sales_fact_1997"
"sales_fact_1997"<br>
> where<br>
> "sales_fact_1997"."time_id" = "time_by_day"."time_id"
and<br>
> "time_by_day"."the_year" = 1997<br>
> group by<br>
> "time_by_day"."the_year"<br>
> /<br>
> <br>
> with our prototype, these sqls are grouped into one (this query will
<br>
> only work with DBs that support grouping sets):<br>
> <br>
> 1. /select "time_by_day"."the_year" as "c0",
"customer"."gender" as <br>
> "c1", count("sales_fact_1997"."product_id")
as "m0"<br>
> from "time_by_day" "time_by_day", "sales_fact_1997"
"sales_fact_1997", <br>
> "customer" "customer"<br>
> where<br>
> "sales_fact_1997"."time_id" = "time_by_day"."time_id"
and<br>
> "time_by_day"."the_year" = 1997 and<br>
> "sales_fact_1997"."customer_id" =
"customer"."customer_id"<br>
> group by grouping sets(("time_by_day"."the_year",
"customer"."gender"), <br>
> ("time_by_day"."the_year")) /<br>
> <br>
> Although the grouping set implementation looks fine and works in most
<br>
> cases, there is a tricky problem associated with it. The problem is
<br>
> grouping "all" (aggregate) value got from rolling up is
equal to <br>
> aggregate of the members involved and not the all value<br>
> <br>
> to reproduce it, alter fact table and insert rows for null customers:<br>
> <br>
> --modify fact table to allow nulls for customer_id<br>
> alter table "sales_fact_1997" modify "customer_id"
null <br>
> --add sales records with no customers<br>
> insert into "sales_fact_1997" values (337,371,null,0,2,6.3455,2.3445,4)<br>
> insert into "sales_fact_1997" values (337,440,null,0,2,6.3355,2.335,2)<br>
> <br>
> results:<br>
> remains same with grouping sql:<br>
> Axis #0:<br>
> {[Measures].[Sales Count]}<br>
> Axis #1:<br>
> {[Gender].[All Gender]}<br>
> {[Gender].[All Gender].[F]}<br>
> {[Gender].[All Gender].[M]}<br>
> Row #0: 86,837 <br>
> Row #0: 42,831<br>
> Row #0: 44,006<br>
> <br>
> without grouping (mondrian's current behavior):<br>
> ...<br>
> Row #0: 86,839 //includes 2 records that
don't belong to any gender<br>
> Row #0: 42,831<br>
> Row #0: 44,006<br>
> <br>
> In summary "All" in mondrian is not really an aggregate
of all the <br>
> members, it also includes data from rows that don't join to fact table.
<br>
> This behavior is not consistent with AS 2000 (MOLAP).<br>
> How should we handle this issue?<br>
> One of the options that we can think of is, to use outer joins.
The <br>
> problem with this is it returns addition row for "null"
gender.<br>
> <br>
> <br>
> <br>
> Thanks,<br>
> Thiyagu & Raghu<br>
> <br>
> <br>
> ------------------------------------------------------------------------<br>
> <br>
> _______________________________________________<br>
> Mondrian mailing list<br>
> Mondrian@pentaho.org<br>
> http://lists.pentaho.org/mailman/listinfo/mondrian<br>
<br>
<br>
-- <br>
Quis custodiet ipsos custodes:<br>
This email message is for the sole use of the intended recipient(s) and<br>
may contain confidential information. Any unauthorized review, use,<br>
disclosure or distribution is prohibited. If you are not the intended<br>
recipient, please contact the sender by reply email and destroy all<br>
copies of the original message.<br>
_______________________________________________<br>
Mondrian mailing list<br>
Mondrian@pentaho.org<br>
http://lists.pentaho.org/mailman/listinfo/mondrian<br>
</font></tt>
<br>