[Mondrian] GROUPING SETS with count distinct

Matt Campbell mkambol at gmail.com
Thu Jan 29 13:18:01 EST 2009


Both.  There are certain circumstances in which Teradata will give the wrong
answer to functionally accurate GROUPING SETS queries.  There are also
certain cases in which an MDX query with both a summable and count distinct
measure will cause Mondrian to return the value of the summable instead of
the count distinct, but only when using GROUPING SETS.  2207515 is an
example of this.  The SQL looks correct in this case, however.


On Thu, Jan 29, 2009 at 12:19 PM, Julian Hyde <julian at hydromatic.net> wrote:

>  To be clear: Do you think Mondrian is generating the wrong SQL? Or is
> Teradata giving the wrong answer to the right SQL?
>
>  ------------------------------
> *From:* mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
> *On Behalf Of *Matt Campbell
> *Sent:* Thursday, January 29, 2009 7:46 AM
> *To:* jhyde at pentaho.com; Mondrian developer mailing list
> *Subject:* Re: [Mondrian] GROUPING SETS with count distinct
>
>  Thanks Julian.  We'll update the defect and make the change, assuming we
> hear no objections.  And yes, we use Teradata and Oracle (DB2 as well, but
> its been much less tested with GROUPING SETS).  We actually found other
> GROUPING SETS/count distinct related bugs with certain versions of teradata,
> so in general this seems a somewhat risky combination.
>
>
> On Wed, Jan 28, 2009 at 8:05 PM, Julian Hyde <jhyde at pentaho.com> wrote:
>
>>  If this change fixes that bug, go ahead and make it.
>>
>> When you do, please add some more detail to the bug. The bug currently
>> doesn't make the explicit link to distinct-counts (although it does mention
>> the [Customer Count] measure). It would also be good if you show the bad SQL
>> being generated before the bug is fixed.
>>
>> Mak a note in the bug that you removed this functionality as a cheap way
>> to fix the bug, and remove logic (if any) in the GROUPING SETS code that was
>> there explicitly for distinct-count measures.
>>
>> If there's anyone else using GROUPING SETS with distinct-count, speak up
>> now. Matt uses Teradata and Oracle (right Matt?) bit it's possible that
>> other databases (DB2?) or versions can optimize distinct-count if it's in
>> one batch (GROUPING SETS) rather than multiple batches. If so, Matt's change
>> might make your performance worse.
>>
>> Julian
>>
>>  ------------------------------
>> *From:* mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
>> *On Behalf Of *Matt Campbell
>> *Sent:* Wednesday, January 28, 2009 4:17 AM
>> *To:* Mondrian developer mailing list
>> *Subject:* [Mondrian] GROUPING SETS with count distinct
>>
>>   About a year ago the developers from Thoughtworks put in a change
>> (10480) that made GROUPING SETS fire for count distinct measures.  There was
>> virtually no performance benefit to using GROUPING SETS with count distinct
>> queries, but it helped solve a functional issue we had with totaling when
>> dimensions don't fully join to the fact table.  We've since found better
>> solutions to that particular problem, and we no longer have a functional
>> need to have GROUPING SETS fire for count distinct queries.
>>
>> We've run into some bugs around GROUPING SETS with count distinct measures
>> (see 2207515<http://sourceforge.net/tracker/index.php?func=detail&aid=2207515&group_id=35302&atid=414613>),
>> and have been considering re-disabling that functionality.  Does anyone have
>> opinions about this?
>>
>> I believe the change would be to simply reintroduce checks for distinct
>> count measures in FastBatchingCellReader.canBatch():
>>
>>         boolean canBatch(Batch other) {
>>
>>             return !hasDistinctCountMeasure() &&
>>
>>                 !other.hasDistinctCountMeasure() &&
>>
>>                 hasOverlappingBitKeys(other) &&
>>
>>                 (hasSameCompoundPredicate(other) ||
>>
>>
>> haveSameValuesForOverlappingColumnsOrHasAllChildrenForOthers(other))
>>
>>                 && hasSameMeasureList(other)
>>
>>                 && haveSameStarAndAggregation(other);
>>
>>         }
>>
>>
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>
>>
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20090129/846246f2/attachment.html 


More information about the Mondrian mailing list