[Mondrian] High Cardinality for Mondrian

Julian Hyde jhyde at pentaho.org
Mon Feb 11 05:00:38 EST 2008


Luis,

Thanks for contributing these changes (for a second time!). I will
incorporate them tomorrow, and if all goes well, they will be in
mondrian-3.0.

The approach you have used - lists backed by iterators - is very clever, but
I have some philosophical issues with it because it misleads the programmer.
A call to '.size()', for example, looks simple and people would imagine that
it is cheap, but as you point out it may cause a large collection to be
fetched into memory.
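
To illustrate the concern, here is a minimal sketch of a list backed by an
iterator. The class IteratorBackedList and its fetchedCount() helper are
hypothetical names made up for this example; this is not the code from the
patch, only an assumption about the general shape of the technique.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical list view over an iterator; not Mondrian's actual class. */
class IteratorBackedList<T> extends AbstractList<T> {
    private final Iterator<T> source;
    private final List<T> fetched = new ArrayList<T>();

    IteratorBackedList(Iterator<T> source) {
        this.source = source;
    }

    /** Number of elements pulled from the source so far (for illustration). */
    int fetchedCount() {
        return fetched.size();
    }

    @Override
    public T get(int index) {
        // Pull only as many elements as are needed to reach the index.
        while (fetched.size() <= index && source.hasNext()) {
            fetched.add(source.next());
        }
        if (index < 0 || index >= fetched.size()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return fetched.get(index);
    }

    @Override
    public int size() {
        // Looks like a cheap accessor, but it must drain the whole source
        // (and hold every element in memory) before it can answer.
        while (source.hasNext()) {
            fetched.add(source.next());
        }
        return fetched.size();
    }
}
```

With this sketch, get(0) reads a single element, whereas a later call to
size() reads and retains everything the iterator can produce. Nothing in
the List interface tells the caller which of the two is the expensive one.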

I guess you could call it the principle of Honest APIs. A few years ago I
read a similar critique of remote procedure calls, arguing that RPCs are
inherently expensive and unreliable, and that it was dishonest to wrap them
in an API that makes them look like regular procedure calls.

So, my instinct would have been to tackle large dimensions using an explicit
iterator API, so users are aware of the cost of what they are doing.
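
As a rough sketch of what I mean by an explicit iterator API: the names
MemberCursor, SimpleCursor, fetchAllRemaining and CursorExample below are
hypothetical and do not exist in mondrian. The idea is that the type offers
no size() or get(int) at all, and the one operation that loads everything
says so in its name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical cursor over members: deliberately has no size() or get(int). */
interface MemberCursor<T> extends Iterator<T> {
    /** The name states the cost: reads everything that is left into memory. */
    List<T> fetchAllRemaining();
}

/** Simple implementation that wraps any iterator. */
class SimpleCursor<T> implements MemberCursor<T> {
    private final Iterator<T> source;

    SimpleCursor(Iterator<T> source) {
        this.source = source;
    }

    public boolean hasNext() {
        return source.hasNext();
    }

    public T next() {
        return source.next();
    }

    public void remove() {
        throw new UnsupportedOperationException("read-only cursor");
    }

    public List<T> fetchAllRemaining() {
        List<T> all = new ArrayList<T>();
        while (source.hasNext()) {
            all.add(source.next());
        }
        return all;
    }
}

class CursorExample {
    /** Single pass: no element is retained after it has been tested. */
    static int countStartingWith(MemberCursor<String> cursor, String prefix) {
        int count = 0;
        while (cursor.hasNext()) {
            if (cursor.next().startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }
}
```

A caller who wants a count has to write the loop (or call something with
an obviously expensive name), so the cost is visible at the call site.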

That said, your approach works now, my preferred approach would require a
major rework of the code base, and I am a pragmatist. If the high
cardinality support results in some performance 'gotchas', we can try to
devise incremental ways to make the whole mondrian API more predictable.

Julian

> -----Original Message-----
> From: mondrian-bounces at pentaho.org 
> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Luis F. Canals
> Sent: Friday, February 08, 2008 10:09 AM
> To: mondrian at pentaho.org
> Cc: Javier Giménez Aznar; jorge López Mateo
> Subject: [Mondrian] High Cardinality for Mondrian
> 
> Dear Julian Hyde,
> 
> After the hard task of porting all the changes we made for version
> 2.4.2.9831 of Mondrian (support for high cardinality dimensions) to the
> head version on Perforce, we can now send you the list of differences,
> to be applied as a patch to the Mondrian version on Perforce.
> 
> Since we have no commit access on Perforce, we would be very happy if
> you could apply these changes and let us know about any problems that
> prevent the patch from being applied.
> 
> All tests pass (using MySQL as the database, on Windows and Linux, with
> Java 5 and 6).
> 
> Some properties have been added to "mondrian.properties" to control
> high cardinality and multithreaded query behaviour:
>     mondrian.result.highCardChunkSize indicates the number of elements
> fetched at a time when a dimension is marked as "highCardinality"
>     mondrian.rolap.MaximumParallelThreads indicates the maximum number
> of threads used to perform a query (since independent queries are now
> parallelized)
> 
> In FoodMart.xml, we have made another change: "Promotions" on cube
> "Sales Ragged" is now marked as high cardinality, to test the system in
> this case.
> 
> There are some other points that should be taken into account now that
> Mondrian is going to be able to manage unlimited dimensions:
>     - avoid calling ".size()" on the list of elements of a potentially
> high cardinality dimension;
>     - avoid copying elements by iterating over the complete list of a
> potentially high cardinality dimension
>         (for example, things like
>             "for(Member m:list) {
>                 ...
>                 anotherList.add(m);
>                 ...
>             }")
>     - instead, use the FilteredIterableList idea
>     - don't try to get an earlier element after you have already fetched
> a later one (i.e., doing "list.get(x)" after "list.get(y)" with y>>>x)
> on the list of elements of a potentially high cardinality dimension
>     - some functions need all the elements in memory (for example "order
> by"); these functions will not run on high cardinality dimensions, and
> an exception will be thrown
>     - if you don't need a high cardinality dimension, simply don't set
> the attribute "highCardinality" to true in the schema (FoodMart.xml)
> 
> That's all!
> 
> Since we think this is quite a powerful improvement (very useful for our
> clients), we would like these changes to be included in the next release
> of Mondrian. Would that be possible?
> 
> Best regards.
> 
> - Jorge/Javier/Luis
> 



