[Mondrian] change 10256 [was Re: change 9710: aggregating count-distinct over compound cells]

Rushan Chen rchen at lucidera.com
Wed Dec 5 23:50:35 EST 2007


I just checked in change list 10256, which improves distinct count
aggregate loading as proposed here:

http://www.eigenbase.org/wiki/index.php/MondrianDistinctCountAggregateImprovement

A few notes besides what is outlined in the document:

(1) Grouping sets: these are already disabled when distinct count
aggregates are present. This change does not extend the use of
grouping sets when building groups to load. Does anyone happen to
know why they are disabled? If grouping sets are enabled one day for
distinct count, change 10256 will allow that to be extended to queries
with "compound constraints", commonly expressed using the Agg function.

(2) Cache flushing: the algorithm that derives an "overlapping" region is
not aware of compound constraints, so cells might be flushed when
they do not need to be. For example, the region to flush may be
[Time].[1998] while the constraint limits the aggregate for a cell to
values in {[1997].[Q1], [1997].[Q3]}. This area can be improved in the
future.
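
To make note (2) concrete, here is a minimal sketch of the difference
between a constraint-unaware and a constraint-aware overlap check. None
of the class or method names below are Mondrian APIs; they are
hypothetical, and the sketch assumes members are compared by fully
qualified unique name and that the cell itself is keyed on the whole
[Time] dimension.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not Mondrian's flushing code. */
public class FlushOverlapSketch {
    /** True if 'member' is 'root' itself or a descendant of it, by unique name. */
    public static boolean inSubtree(String member, String root) {
        return member.equals(root) || member.startsWith(root + ".");
    }

    /** True if one member is an ancestor of (or equal to) the other. */
    public static boolean touches(String a, String b) {
        return inSubtree(a, b) || inSubtree(b, a);
    }

    /** Constraint-unaware check: looks only at the cell's own coordinate. */
    public static boolean naiveOverlaps(String cellCoordinate, String region) {
        return touches(cellCoordinate, region);
    }

    /** Constraint-aware check: the cell overlaps only if a constrained member does. */
    public static boolean constraintAwareOverlaps(
        List<String> compoundMembers, String region)
    {
        for (String member : compoundMembers) {
            if (touches(member, region)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String region = "[Time].[1998]";
        List<String> constraint =
            Arrays.asList("[Time].[1997].[Q1]", "[Time].[1997].[Q3]");
        // The naive check flushes the cell; the aware check would keep it.
        System.out.println(naiveOverlaps("[Time]", region));
        System.out.println(constraintAwareOverlaps(constraint, region));
    }
}
```

With the region and constraint from the example above, the naive check
reports an overlap and the constraint-aware check does not, which is the
unnecessary flush described in (2).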

I also added a new property to help with unit tests that expect SQL
patterns.

mondrian.test.WarnIfNoPatternForDialect

Sometimes a test that expects a SQL pattern does not have a pattern for
every dialect. Setting this property to a dialect name prints a
warning whenever a test is missing a SQL pattern for that dialect. This
way users are alerted if the SQL tests do not cover the dialects of
interest. By default it is set to NONE, which means no warning.
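
As a rough sketch of the behavior described above (not the code in the
change list), the check could look like the following. Only the property
name and the NONE default come from this message; the class, the method,
and the dialect spellings such as "MYSQL" are assumptions made for
illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical illustration only; not the test harness from change 10256. */
public class PatternWarningSketch {
    public static final String PROPERTY =
        "mondrian.test.WarnIfNoPatternForDialect";

    /**
     * Returns a warning message if the configured dialect has no SQL
     * pattern in this test, or null when no warning is due.
     */
    public static String checkPatterns(
        Properties props, String testName, Map<String, String> patternsByDialect)
    {
        // NONE is the default and disables the warning.
        String dialect = props.getProperty(PROPERTY, "NONE");
        if (dialect.equals("NONE") || patternsByDialect.containsKey(dialect)) {
            return null;
        }
        return "Warning: test " + testName
            + " has no SQL pattern for dialect " + dialect;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PROPERTY, "MYSQL"); // assumed dialect spelling
        Map<String, String> patterns = new HashMap<String, String>();
        patterns.put("DERBY", "select ..."); // placeholder pattern text
        System.out.println(checkPatterns(props, "testExample", patterns));
    }
}
```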

There's also a new ant target

ant test-list

which prints the tests that will be run and their ordinals in the
running sequence. So if there is an error inside a particular suite,
it is easy to locate the offending test method after some "dot
counting" in the test output.
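
The "dot counting" can also be done mechanically. The sketch below
assumes the JUnit text runner's usual progress output, where each test
prints a "." when it starts and an "E" or "F" when it errors or fails,
and it assumes ordinals are 1-based; the actual numbering printed by
test-list may differ. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of the Mondrian build. */
public class DotCountingSketch {
    /**
     * Returns the 1-based ordinals of tests that failed or errored,
     * derived from the runner's progress characters.
     */
    public static List<Integer> failingOrdinals(String runnerOutput) {
        List<Integer> ordinals = new ArrayList<Integer>();
        int dots = 0;
        for (char c : runnerOutput.toCharArray()) {
            if (c == '.') {
                dots++; // a new test has started
            } else if (c == 'E' || c == 'F') {
                ordinals.add(dots); // belongs to the most recently started test
            }
        }
        return ordinals;
    }

    public static void main(String[] args) {
        // Third and fifth tests went wrong in this made-up output.
        System.out.println(failingOrdinals("...F..E."));
    }
}
```

The resulting ordinals can then be looked up in the test-list output to
find the test method names.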

Lastly, the change is tested fairly well on Derby, MySQL, Oracle XE, and
LucidDB; however, the SQL tests could still have gaps for other
DBs. If you use Mondrian primarily with a DB not listed, or with
non-default parameter settings, I would recommend running the regression
suite after syncing the latest code, to be sure.

Rushan

Matt Campbell wrote:
> Rushan,
> These changes sound like something that could help us a lot, too.  Do 
> you have any guess about when you might be implementing the change?
>
> Thanks,
> Matt
>
> On Nov 25, 2007 6:55 PM, Julian Hyde <jhyde at pentaho.org> wrote:
>
>     > Rushan Chen wrote:
>     >
>     > I have drafted a design doc based on the "CellContext" idea to
>     > improve the performance of aggregate loading for cells with
>     > "compound" constraints.
>     >
>     >
>     > http://www.eigenbase.org/wiki/index.php/MondrianDistinctCountAggregateImprovement
>     >
>     > This proposal requires pretty far-reaching code changes, so I did
>     > some prototyping to make sure this idea would work. So far, despite
>     > the sizable changes required, the basic functionality (batch
>     > loading, caching) is working with some careful extraction of code
>     > and streamlining of interfaces. This will hopefully make aggregate
>     > loading/caching more modular and make using the new set of
>     > interfaces less error prone.
>     >
>     > Since I have done just some prototyping, there could be design
>     > flaws lurking still. I would really appreciate your input and/or
>     > your comments on how to better test this improvement.
>
>     Rushan,
>
>     Thanks for the heads up and the detailed design document. I've
>     read it through once, and can't find fault with it. It looks like
>     it will work, and it's more elegant than what I have now.
>
>     I will read it in more detail on my plane journey from London.
>
>     Julian
>
>     _______________________________________________
>     Mondrian mailing list
>     Mondrian at pentaho.org <mailto:Mondrian at pentaho.org>
>     http://lists.pentaho.org/mailman/listinfo/mondrian
>
>
>   


-- 
Rushan Chen

rchen at lucidera.com




