[Mondrian] distinct-count and explicit aggregate table recognition

Matt Campbell mkambol at gmail.com
Thu Jan 10 10:51:12 EST 2013


Tomasz-
I've entered MONDRIAN-1370 <http://jira.pentaho.com/browse/MONDRIAN-1370> for
the bug you found.  I expect it will get picked up for the next release.
 Thanks for identifying this.

Matt


On Tue, Jan 8, 2013 at 5:44 PM, Matt Campbell <mkambol at gmail.com> wrote:

>
> I see what your saying-- the aggregator selected when using the
> ExplicitRecognizer is different from the DefaultRecognizer.  A count
> distinct measure in an explicit aggregate can be incorrectly summed.
>
> I'll discuss this with the team and get back to you.
>
>
> On Tue, Jan 8, 2013 at 3:24 PM, Tomasz Karlik <tkarlik at gmail.com> wrote:
>
>> I suppose that without this change Mondrian will always use aggregate
>> table. Aggregate table column is handled by RolapAggregator.Sum, so
>> code created to deal with different distinct count scenarios is
>> omitted.
>>
>> When using rule-based recognition Mondrian uses Recognizer.java code
>> (lines 782-792):
>>
>>         // NOTE: This assumes that the aggregate table does not have an
>> explicit
>>         // average column.
>>         if (factAgg == RolapAggregator.Avg) {
>>             String columnExpr = getFactCountExpr(aggUsage);
>>             return new RolapAggregator.AvgFromSum(columnExpr);
>>         } else if (factAgg == RolapAggregator.DistinctCount) {
>>             //return RolapAggregator.Count;
>>             return RolapAggregator.DistinctCount;
>>         } else {
>>             return factAgg;
>>         }
>> Are there really necessary separate methods to deal with both of
>> recognizers? How to change schema definition, to handle distinct-count
>> aggregated measures correctly when using explicit recognizer?
>>
>>
>> On Mon, Jan 7, 2013 at 3:46 PM, Matt Campbell <mkambol at gmail.com> wrote:
>> > Distinct count measures in aggregate tables are tricky and require care
>> on
>> > the part of the schema author.  There are some cases where it is
>> appropriate
>> > to pull a distinct count value from an aggregate table--i.e. when the
>> > attributes are dependent on the measure (like gender in the case of
>> customer
>> > count).
>> >
>> > I'm hoping Julian or Luc can jump in, but I'd be worried your change
>> would
>> > cause Mondrian to always query the fact table for count distincts, even
>> when
>> > it may be appropriate to query the agg table.
>> >
>> >
>> > On Mon, Jan 7, 2013 at 3:47 AM, Tomasz Karlik <tkarlik at gmail.com>
>> wrote:
>> >>
>> >> SQL seems to be wrong:
>> >>
>> >> select
>> >>     "agg_c_10_sales_fact_1997"."the_year" as "c0",
>> >>     sum("agg_c_10_sales_fact_1997"."customer_count") as "m0"
>> >> from
>> >>     "agg_c_10_sales_fact_1997" as "agg_c_10_sales_fact_1997"
>> >> where
>> >>     "agg_c_10_sales_fact_1997"."the_year" = 1997
>> >> group by
>> >>     "agg_c_10_sales_fact_1997"."the_year"
>> >>
>> >> Log file is in attachment.
>> >>
>> >> On Mon, Jan 7, 2013 at 9:19 AM, Julian Hyde <jhyde at pentaho.com> wrote:
>> >> > The easiest way to approach this is to look at the SQL Mondrian is
>> >> > generating. Is it asking the right question of the DBMS?
>> >> >
>> >> > Can you turn on SQL tracing and post the log you get during the run
>> that
>> >> > produces the incorrect results.
>> >> >
>> >> > Julian
>> >> >
>> >> >
>> >> > On Jan 7, 2013, at 12:10 AM, Tomasz Karlik <tkarlik at gmail.com>
>> wrote:
>> >> >
>> >> >> I am getting different results while running simple query on
>> FoodMart
>> >> >> (PostgreSql db). Result differs and seems to be wrong when using
>> >> >> explicit aggregate. I've created simple test which shows this
>> >> >> behavior. Also I've made some patch, but i'm not sure if the problem
>> >> >> is in sources? The problem doesn't appear when aggregate table is
>> >> >> recognized by the default rules.
>> >> >>
>> >> >> query:
>> >> >>
>> >> >> select
>> >> >>
>> >> >> NON EMPTY {[Measures].[Customer Count]} ON COLUMNS,
>> >> >>
>> >> >> NON EMPTY {[Time].[Year].Members} ON ROWS
>> >> >>
>> >> >> from [Sales]
>> >> >>
>> >> >> aggregate table: agg_c_10_sales_fact_1997
>> >> >>
>> >> >>
>> >> >> expected result:
>> >> >>
>> >> >> Axis #0:
>> >> >>
>> >> >> {}
>> >> >>
>> >> >> Axis #1:
>> >> >>
>> >> >> {[Measures].[Customer Count]}
>> >> >>
>> >> >> Axis #2:
>> >> >>
>> >> >> {[Time].[1997]}
>> >> >>
>> >> >> Row #0: 5 581
>> >> >>
>> >> >>
>> >> >> actual result:
>> >> >>
>> >> >> Axis #0:
>> >> >>
>> >> >> {}
>> >> >>
>> >> >> Axis #1:
>> >> >>
>> >> >> {[Measures].[Customer Count]}
>> >> >>
>> >> >> Axis #2:
>> >> >>
>> >> >> {[Time].[1997]}
>> >> >>
>> >> >> Row #0: 16 947
>> >> >>
>> >> >>
>> <ExplicitRecognizerTest.java><mondrian_rolap_aggmatcher_Recognizer.java.patch>_______________________________________________
>> >> >> Mondrian mailing list
>> >> >> Mondrian at pentaho.org
>> >> >> http://lists.pentaho.org/mailman/listinfo/mondrian
>> >> >
>> >> > _______________________________________________
>> >> > Mondrian mailing list
>> >> > Mondrian at pentaho.org
>> >> > http://lists.pentaho.org/mailman/listinfo/mondrian
>> >>
>> >> _______________________________________________
>> >> Mondrian mailing list
>> >> Mondrian at pentaho.org
>> >> http://lists.pentaho.org/mailman/listinfo/mondrian
>> >>
>> >
>> >
>> > _______________________________________________
>> > Mondrian mailing list
>> > Mondrian at pentaho.org
>> > http://lists.pentaho.org/mailman/listinfo/mondrian
>> >
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20130110/c375a6e4/attachment.html 


More information about the Mondrian mailing list