[Mondrian] distinct-count and explicit aggregate table recognition
Matt Campbell
mkambol at gmail.com
Thu Jan 10 10:51:12 EST 2013
Tomasz-
I've entered MONDRIAN-1370 <http://jira.pentaho.com/browse/MONDRIAN-1370> for
the bug you found. I expect it will get picked up for the next release.
Thanks for identifying this.
Matt
On Tue, Jan 8, 2013 at 5:44 PM, Matt Campbell <mkambol at gmail.com> wrote:
>
> I see what your saying-- the aggregator selected when using the
> ExplicitRecognizer is different from the DefaultRecognizer. A count
> distinct measure in an explicit aggregate can be incorrectly summed.
>
> I'll discuss this with the team and get back to you.
>
>
> On Tue, Jan 8, 2013 at 3:24 PM, Tomasz Karlik <tkarlik at gmail.com> wrote:
>
>> I suppose that without this change Mondrian will always use aggregate
>> table. Aggregate table column is handled by RolapAggregator.Sum, so
>> code created to deal with different distinct count scenarios is
>> omitted.
>>
>> When using rule-based recognition Mondrian uses Recognizer.java code
>> (lines 782-792):
>>
>> // NOTE: This assumes that the aggregate table does not have an
>> explicit
>> // average column.
>> if (factAgg == RolapAggregator.Avg) {
>> String columnExpr = getFactCountExpr(aggUsage);
>> return new RolapAggregator.AvgFromSum(columnExpr);
>> } else if (factAgg == RolapAggregator.DistinctCount) {
>> //return RolapAggregator.Count;
>> return RolapAggregator.DistinctCount;
>> } else {
>> return factAgg;
>> }
>> Are there really necessary separate methods to deal with both of
>> recognizers? How to change schema definition, to handle distinct-count
>> aggregated measures correctly when using explicit recognizer?
>>
>>
>> On Mon, Jan 7, 2013 at 3:46 PM, Matt Campbell <mkambol at gmail.com> wrote:
>> > Distinct count measures in aggregate tables are tricky and require care
>> on
>> > the part of the schema author. There are some cases where it is
>> appropriate
>> > to pull a distinct count value from an aggregate table--i.e. when the
>> > attributes are dependent on the measure (like gender in the case of
>> customer
>> > count).
>> >
>> > I'm hoping Julian or Luc can jump in, but I'd be worried your change
>> would
>> > cause Mondrian to always query the fact table for count distincts, even
>> when
>> > it may be appropriate to query the agg table.
>> >
>> >
>> > On Mon, Jan 7, 2013 at 3:47 AM, Tomasz Karlik <tkarlik at gmail.com>
>> wrote:
>> >>
>> >> SQL seems to be wrong:
>> >>
>> >> select
>> >> "agg_c_10_sales_fact_1997"."the_year" as "c0",
>> >> sum("agg_c_10_sales_fact_1997"."customer_count") as "m0"
>> >> from
>> >> "agg_c_10_sales_fact_1997" as "agg_c_10_sales_fact_1997"
>> >> where
>> >> "agg_c_10_sales_fact_1997"."the_year" = 1997
>> >> group by
>> >> "agg_c_10_sales_fact_1997"."the_year"
>> >>
>> >> Log file is in attachment.
>> >>
>> >> On Mon, Jan 7, 2013 at 9:19 AM, Julian Hyde <jhyde at pentaho.com> wrote:
>> >> > The easiest way to approach this is to look at the SQL Mondrian is
>> >> > generating. Is it asking the right question of the DBMS?
>> >> >
>> >> > Can you turn on SQL tracing and post the log you get during the run
>> that
>> >> > produces the incorrect results.
>> >> >
>> >> > Julian
>> >> >
>> >> >
>> >> > On Jan 7, 2013, at 12:10 AM, Tomasz Karlik <tkarlik at gmail.com>
>> wrote:
>> >> >
>> >> >> I am getting different results while running simple query on
>> FoodMart
>> >> >> (PostgreSql db). Result differs and seems to be wrong when using
>> >> >> explicit aggregate. I've created simple test which shows this
>> >> >> behavior. Also I've made some patch, but i'm not sure if the problem
>> >> >> is in sources? The problem doesn't appear when aggregate table is
>> >> >> recognized by the default rules.
>> >> >>
>> >> >> query:
>> >> >>
>> >> >> select
>> >> >>
>> >> >> NON EMPTY {[Measures].[Customer Count]} ON COLUMNS,
>> >> >>
>> >> >> NON EMPTY {[Time].[Year].Members} ON ROWS
>> >> >>
>> >> >> from [Sales]
>> >> >>
>> >> >> aggregate table: agg_c_10_sales_fact_1997
>> >> >>
>> >> >>
>> >> >> expected result:
>> >> >>
>> >> >> Axis #0:
>> >> >>
>> >> >> {}
>> >> >>
>> >> >> Axis #1:
>> >> >>
>> >> >> {[Measures].[Customer Count]}
>> >> >>
>> >> >> Axis #2:
>> >> >>
>> >> >> {[Time].[1997]}
>> >> >>
>> >> >> Row #0: 5 581
>> >> >>
>> >> >>
>> >> >> actual result:
>> >> >>
>> >> >> Axis #0:
>> >> >>
>> >> >> {}
>> >> >>
>> >> >> Axis #1:
>> >> >>
>> >> >> {[Measures].[Customer Count]}
>> >> >>
>> >> >> Axis #2:
>> >> >>
>> >> >> {[Time].[1997]}
>> >> >>
>> >> >> Row #0: 16 947
>> >> >>
>> >> >>
>> <ExplicitRecognizerTest.java><mondrian_rolap_aggmatcher_Recognizer.java.patch>_______________________________________________
>> >> >> Mondrian mailing list
>> >> >> Mondrian at pentaho.org
>> >> >> http://lists.pentaho.org/mailman/listinfo/mondrian
>> >> >
>> >> > _______________________________________________
>> >> > Mondrian mailing list
>> >> > Mondrian at pentaho.org
>> >> > http://lists.pentaho.org/mailman/listinfo/mondrian
>> >>
>> >> _______________________________________________
>> >> Mondrian mailing list
>> >> Mondrian at pentaho.org
>> >> http://lists.pentaho.org/mailman/listinfo/mondrian
>> >>
>> >
>> >
>> > _______________________________________________
>> > Mondrian mailing list
>> > Mondrian at pentaho.org
>> > http://lists.pentaho.org/mailman/listinfo/mondrian
>> >
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20130110/c375a6e4/attachment.html
More information about the Mondrian
mailing list