[Mondrian] distinct-count and explicit aggregate table recognition

Matt Campbell mkambol at gmail.com
Tue Jan 8 17:44:22 EST 2013


I see what your saying-- the aggregator selected when using the
ExplicitRecognizer is different from the DefaultRecognizer.  A count
distinct measure in an explicit aggregate can be incorrectly summed.

I'll discuss this with the team and get back to you.


On Tue, Jan 8, 2013 at 3:24 PM, Tomasz Karlik <tkarlik at gmail.com> wrote:

> I suppose that without this change Mondrian will always use aggregate
> table. Aggregate table column is handled by RolapAggregator.Sum, so
> code created to deal with different distinct count scenarios is
> omitted.
>
> When using rule-based recognition Mondrian uses Recognizer.java code
> (lines 782-792):
>
>         // NOTE: This assumes that the aggregate table does not have an
> explicit
>         // average column.
>         if (factAgg == RolapAggregator.Avg) {
>             String columnExpr = getFactCountExpr(aggUsage);
>             return new RolapAggregator.AvgFromSum(columnExpr);
>         } else if (factAgg == RolapAggregator.DistinctCount) {
>             //return RolapAggregator.Count;
>             return RolapAggregator.DistinctCount;
>         } else {
>             return factAgg;
>         }
> Are there really necessary separate methods to deal with both of
> recognizers? How to change schema definition, to handle distinct-count
> aggregated measures correctly when using explicit recognizer?
>
>
> On Mon, Jan 7, 2013 at 3:46 PM, Matt Campbell <mkambol at gmail.com> wrote:
> > Distinct count measures in aggregate tables are tricky and require care
> on
> > the part of the schema author.  There are some cases where it is
> appropriate
> > to pull a distinct count value from an aggregate table--i.e. when the
> > attributes are dependent on the measure (like gender in the case of
> customer
> > count).
> >
> > I'm hoping Julian or Luc can jump in, but I'd be worried your change
> would
> > cause Mondrian to always query the fact table for count distincts, even
> when
> > it may be appropriate to query the agg table.
> >
> >
> > On Mon, Jan 7, 2013 at 3:47 AM, Tomasz Karlik <tkarlik at gmail.com> wrote:
> >>
> >> SQL seems to be wrong:
> >>
> >> select
> >>     "agg_c_10_sales_fact_1997"."the_year" as "c0",
> >>     sum("agg_c_10_sales_fact_1997"."customer_count") as "m0"
> >> from
> >>     "agg_c_10_sales_fact_1997" as "agg_c_10_sales_fact_1997"
> >> where
> >>     "agg_c_10_sales_fact_1997"."the_year" = 1997
> >> group by
> >>     "agg_c_10_sales_fact_1997"."the_year"
> >>
> >> Log file is in attachment.
> >>
> >> On Mon, Jan 7, 2013 at 9:19 AM, Julian Hyde <jhyde at pentaho.com> wrote:
> >> > The easiest way to approach this is to look at the SQL Mondrian is
> >> > generating. Is it asking the right question of the DBMS?
> >> >
> >> > Can you turn on SQL tracing and post the log you get during the run
> that
> >> > produces the incorrect results.
> >> >
> >> > Julian
> >> >
> >> >
> >> > On Jan 7, 2013, at 12:10 AM, Tomasz Karlik <tkarlik at gmail.com> wrote:
> >> >
> >> >> I am getting different results while running simple query on FoodMart
> >> >> (PostgreSql db). Result differs and seems to be wrong when using
> >> >> explicit aggregate. I've created simple test which shows this
> >> >> behavior. Also I've made some patch, but i'm not sure if the problem
> >> >> is in sources? The problem doesn't appear when aggregate table is
> >> >> recognized by the default rules.
> >> >>
> >> >> query:
> >> >>
> >> >> select
> >> >>
> >> >> NON EMPTY {[Measures].[Customer Count]} ON COLUMNS,
> >> >>
> >> >> NON EMPTY {[Time].[Year].Members} ON ROWS
> >> >>
> >> >> from [Sales]
> >> >>
> >> >> aggregate table: agg_c_10_sales_fact_1997
> >> >>
> >> >>
> >> >> expected result:
> >> >>
> >> >> Axis #0:
> >> >>
> >> >> {}
> >> >>
> >> >> Axis #1:
> >> >>
> >> >> {[Measures].[Customer Count]}
> >> >>
> >> >> Axis #2:
> >> >>
> >> >> {[Time].[1997]}
> >> >>
> >> >> Row #0: 5 581
> >> >>
> >> >>
> >> >> actual result:
> >> >>
> >> >> Axis #0:
> >> >>
> >> >> {}
> >> >>
> >> >> Axis #1:
> >> >>
> >> >> {[Measures].[Customer Count]}
> >> >>
> >> >> Axis #2:
> >> >>
> >> >> {[Time].[1997]}
> >> >>
> >> >> Row #0: 16 947
> >> >>
> >> >>
> <ExplicitRecognizerTest.java><mondrian_rolap_aggmatcher_Recognizer.java.patch>_______________________________________________
> >> >> Mondrian mailing list
> >> >> Mondrian at pentaho.org
> >> >> http://lists.pentaho.org/mailman/listinfo/mondrian
> >> >
> >> > _______________________________________________
> >> > Mondrian mailing list
> >> > Mondrian at pentaho.org
> >> > http://lists.pentaho.org/mailman/listinfo/mondrian
> >>
> >> _______________________________________________
> >> Mondrian mailing list
> >> Mondrian at pentaho.org
> >> http://lists.pentaho.org/mailman/listinfo/mondrian
> >>
> >
> >
> > _______________________________________________
> > Mondrian mailing list
> > Mondrian at pentaho.org
> > http://lists.pentaho.org/mailman/listinfo/mondrian
> >
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20130108/a3cdf999/attachment.html 


More information about the Mondrian mailing list