[Mondrian] GROUPING SETS with count distinct

Julian Hyde julian at hydromatic.net
Thu Jan 29 12:19:40 EST 2009


To be clear: Do you think Mondrian is generating the wrong SQL? Or is
Teradata giving the wrong answer to the right SQL?


  _____  

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On
Behalf Of Matt Campbell
Sent: Thursday, January 29, 2009 7:46 AM
To: jhyde at pentaho.com; Mondrian developer mailing list
Subject: Re: [Mondrian] GROUPING SETS with count distinct


Thanks Julian.  We'll update the defect and make the change, assuming we
hear no objections.  And yes, we use Teradata and Oracle (DB2 as well, but
its been much less tested with GROUPING SETS).  We actually found other
GROUPING SETS/count distinct related bugs with certain versions of teradata,
so in general this seems a somewhat risky combination.


On Wed, Jan 28, 2009 at 8:05 PM, Julian Hyde <jhyde at pentaho.com> wrote:


If this change fixes that bug, go ahead and make it.
 
When you do, please add some more detail to the bug. The bug currently
doesn't make the explicit link to distinct-counts (although it does mention
the [Customer Count] measure). It would also be good if you show the bad SQL
being generated before the bug is fixed.
 
Mak a note in the bug that you removed this functionality as a cheap way to
fix the bug, and remove logic (if any) in the GROUPING SETS code that was
there explicitly for distinct-count measures.
 
If there's anyone else using GROUPING SETS with distinct-count, speak up
now. Matt uses Teradata and Oracle (right Matt?) bit it's possible that
other databases (DB2?) or versions can optimize distinct-count if it's in
one batch (GROUPING SETS) rather than multiple batches. If so, Matt's change
might make your performance worse.
 
Julian


  _____  

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On
Behalf Of Matt Campbell
Sent: Wednesday, January 28, 2009 4:17 AM
To: Mondrian developer mailing list
Subject: [Mondrian] GROUPING SETS with count distinct



About a year ago the developers from Thoughtworks put in a change (10480)
that made GROUPING SETS fire for count distinct measures.  There was
virtually no performance benefit to using GROUPING SETS with count distinct
queries, but it helped solve a functional issue we had with totaling when
dimensions don't fully join to the fact table.  We've since found better
solutions to that particular problem, and we no longer have a functional
need to have GROUPING SETS fire for count distinct queries. 

We've run into some bugs around GROUPING SETS with count distinct measures
(see 2207515
<http://sourceforge.net/tracker/index.php?func=detail&aid=2207515&group_id=3
5302&atid=414613> ), and have been considering re-disabling that
functionality.  Does anyone have opinions about this?

I believe the change would be to simply reintroduce checks for distinct
count measures in FastBatchingCellReader.canBatch():

        boolean canBatch(Batch other) {

            return !hasDistinctCountMeasure() &&

                !other.hasDistinctCountMeasure() &&

                hasOverlappingBitKeys(other) &&

                (hasSameCompoundPredicate(other) ||

 
haveSameValuesForOverlappingColumnsOrHasAllChildrenForOthers(other))

                && hasSameMeasureList(other)

                && haveSameStarAndAggregation(other);

        }


_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian




-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20090129/234ae2cd/attachment.html 


More information about the Mondrian mailing list