[Mondrian] Improvements for High Cardinality

Tue Jan 15 21:09:11 EST 2008

Luis,

Thanks for the contribution. I'll accept the contribution and put it into
the release, but there are quite a few issues I'd like to resolve before I
do that.

* Please make a pass through your changes again and remove any garbage code
(e.g. code for debugging) or files you have edited but don't intend to check
in (e.g. mondrian.properties)

* The code has moved on a lot since 2.4.2. Can you send me the diffs with
respect to the current perforce head? That would help me a lot. To do this,
sync your perforce client to 2.4.2 ('p4 sync ...@9831'), copy in your files,
'p4 edit <filename>' for each file you have modified, then get the latest
('p4 sync ...'), and resolve the changes ('p4 resolve').

* Does the code pass the regression suite? If so, on what DBMS, JDK and
operating system? Have you tried running 'megatest' (which sets various
properties)? The code needs to pass all tests before I can release it.

* As I said before, I want to keep the tuple representation as a member
array, not a member list. In particular, TupleCalc.evaluateTuple(Evaluator)
should return Member[] not List<Member>.

* In AbstractXxxCalc, there are a few toString() methods written on a single
line. Split them over multiple lines or eliminate them.

* Likewise hashCode() in ConcatenableList

* ConcatenableList needs javadoc and a unit test

* In CmdRunner.java, remove ex.printStackTrace lines. I don't think you
intended to check these in.

* Dimension.isHighCardinality() method needs javadoc

* MondrianProperties.java - property descriptions have spelling mistakes,
and each should start with "Integer property which ...".

* Document properties in configuration.html and mondrian.properties. I need
a MUCH better description of the MaxParallelThreads property: something
which would be understood by a DBA, not just a mondrian developer. I had to
delve into the code to find out that this related to loading aggregations
using GROUPING SETS.

* There are some files included which I guess you don't want to commit, e.g.
mondrian.properties, demo/foodmart.properties. Please remove these from the
patch file.

* When breaking long lines, please use continuation indent 4 not 8.

* Make sure lines don't have trailing spaces. These may cause spurious diffs
later.

* Uses of UnsupportedList in HeadTailFunDef: could you use AbstractList
instead? I think you implement all of the required methods.

* Aggregation.java and other places: please fix the code formatting. Control
flow constructs like 'if' and 'while' should always use braces and span
multiple lines.

* Aggregation.java: you've made the load method asynchronous. I'm not
convinced that this is safe. In particular, how does the caller know when
the load has completed? Need to update the javadoc of that method with these
important details.

* If we're using Threads, let's get them from a thread pool. Can we use the
JDK 1.5 concurrency features like Future and ThreadPoolExecutor? That will
give us more control over how the threads are created, and save the effort
of creating new threads; all important in a container. In JDK 1.4 it's OK if
the code works single-threaded.

* CellRequest.java contains garbage code - please remove.

* CacheMapTest.java - should end with '// End CacheMapTest.java'
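
To make the AbstractList point above concrete, here is a minimal sketch of
what I have in mind. HeadView is a made-up name for illustration, not a
class in the patch or in mondrian, and the real lists in HeadTailFunDef may
need more than this; the point is only that get(int) and size() are enough.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical read-only view of the first <code>count</code> elements of
 * a backing list. Illustration only; not mondrian code.
 */
class HeadView<T> extends AbstractList<T> {
    private final List<T> backing;
    private final int count;

    HeadView(List<T> backing, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0");
        }
        this.backing = backing;
        // Never expose more elements than the backing list holds.
        this.count = Math.min(count, backing.size());
    }

    // AbstractList requires only get(int) and size(). iterator(),
    // contains(), equals(), hashCode() and toString() are inherited, and
    // the mutators throw UnsupportedOperationException by default.
    public T get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return backing.get(index);
    }

    public int size() {
        return count;
    }
}
```

With this, new HeadView<String>(Arrays.asList("a", "b", "c"), 2) behaves
as a two-element list, and calling add() on it fails with
UnsupportedOperationException without any extra code.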
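
On the thread pool and asynchronous load points above, this is roughly the
shape I am thinking of, using only java.util.concurrent from JDK 1.5.
ParallelLoader and loadAll are hypothetical names, not anything in
Aggregation.java; treat it as a sketch of the pattern, not a design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper for illustration only; not mondrian code. */
class ParallelLoader {
    private final ExecutorService executor;

    ParallelLoader(int maxThreads) {
        // Fixed-size pool: at most maxThreads threads, reused across tasks.
        this.executor = Executors.newFixedThreadPool(maxThreads);
    }

    /**
     * Runs the tasks on the pool and returns their results in task order.
     * On normal return every task has completed, so the caller knows
     * exactly when the load is done.
     */
    <T> List<T> loadAll(List<Callable<T>> tasks)
        throws InterruptedException, ExecutionException
    {
        List<Future<T>> futures = new ArrayList<Future<T>>();
        for (Callable<T> task : tasks) {
            futures.add(executor.submit(task));
        }
        List<T> results = new ArrayList<T>();
        for (Future<T> future : futures) {
            // get() blocks until the task finishes; a failure inside the
            // task surfaces here as an ExecutionException.
            results.add(future.get());
        }
        return results;
    }

    void shutdown() {
        // Lets already-submitted tasks finish, then releases the threads.
        executor.shutdown();
    }
}
```

The tasks would be Callables that each load one piece of the aggregation.
Because loadAll blocks on every Future, the javadoc of the calling method
can state plainly that the data is loaded when the method returns.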

Julian

> -----Original Message-----
> From: mondrian-bounces at pentaho.org 
> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Luis F. Canals
> Sent: Tuesday, January 15, 2008 9:00 AM
> To: mondrian at pentaho.org
> Cc: Javier Giménez Aznar; Jorge López Mateo
> Subject: [Mondrian] Improvements for High Cardinality
> 
> Hi all,
> 
> as you may remember, we have been working (since November 2007) to give
> Mondrian the capability to deal with high cardinality dimensions.
> In this time, we have also improved the load process from the database,
> with the capability of executing several parallel threads to get
> information as fast as the database is able to supply it.
> 
> Ooook. Here, we have our first approach to the solution.
> We have made a lot of changes, so we attach a patch to be applied
> to mondrian 2.4.2-9831.
> Main modifications are:
>     1.- changed Member[] to List<Member> (there is no memory or
> performance impact, and it lets us move towards an iterable list)
>     2.- added a new boolean property to the .xml describing the schema:
> highCardinality, for dimensions
>     3.- added new properties to mondrian.properties to manage high
> cardinality issues:
>              mondrian.result.highCardChunkSize to indicate how many
> cells are read at the same time when high cardinality dimensions are
> involved
>              mondrian.rolap.MaximumParallelThreads to indicate how many
> parallel threads are allowed to get results from an MDX query
> 
> There are many performance improvements (basically because, even for
> low cardinality dimensions, results need not be read and managed as a
> "big block", which causes the allocation and deallocation of large
> amounts of memory).
> 
> Changes to the API are described in the patch file. (Apply it using:
>        patch -p0 < diff)
> 
> We are very interested in including these changes in the next release.
> If you have any questions, send them to us (javier.gimenez at stratebi.com,
> jorge.lopez at stratebi.com, luis.canals at stratebi.com).
> We have open questions: for instance, which functions can be used with
> high cardinality dimensions? (At this point, only "head" and "tail" are
> fully implemented for high cardinality.) We will be very happy if you
> find more!
> 
> Best regards,
> 
> - javier, jorge & luis
>