[Mondrian] High Cardinality for Mondrian

Richard Emberson remberson at edgedynamics.com
Fri Feb 15 12:18:33 EST 2008

Now that you mentioned it, I do recall talking to you about
this very fact while I was working on adding Iterable to
the join code.
It was also mused that when Mondrian was compiled with
dynamic code creation and Member array source (think RowSource)
wire-up of such code,
that the dataflow analysis would allow for the generation
of the correct, efficient wire-up datastructures and datatypes.

So its arrays for now and maybe in Mondrian 5 (6?, 7...)
a radical shift will take place.
I hope that Luis et. al. will accept that such a longer
term vision would address their concerns.

Anyway, just my thoughts. Thanks.


Julian Hyde wrote:
> There are two distinct collections here. Consider a method that returns a
> list of tuples.  It could return Member[][], or List<Member[]>, or
> List<List<Member>>, or even List<Member>[].
> When we are talking about collections of members or tuples, I totally agree
> that we should be returning a list (or in certain cases an iterator). These
> collections are often large (thousands of members) and dynamic. Luis and I
> were in agreement about that.
> However, the other collection is to represent a tuple. Tuples typically have
> low cardinality, and the cardinality is fixed: you almost never want to
> increase a tuple from cardinality 2 to cardinality 3. There are a lot of
> instances of tuples, so it is important that they can be created cheaply and
> use the minimum of memory. 
> My use case is a crossjoin that returns 10,000 tuples each with 2 members.
> If we represent each tuple as a list, that's at least one extra object per
> tuple, if we use the cheapest list available (the one created by
> Arrays.asList()). The memory and CPU overhead is unacceptable.
> That's what I want to represent a tuple as a Member[] rather than
> List<Member>, and I was clear with Luis that's what I wanted.
> Julian
>> -----Original Message-----
>> From: Richard Emberson [mailto:remberson at edgedynamics.com] 
>> Sent: Wednesday, February 13, 2008 6:59 AM
>> To: jhyde at pentaho.org; Mondrian developer mailing list
>> Cc: luis.canals at stratebi.com; 'Javier Giménez Aznar'; 'jorge 
>> López Mateo'
>> Subject: Re: [Mondrian] High Cardinality for Mondrian
>> Julian,
>> As long as I worked on Mondrian I always wondered why some of the
>> API returned arrays rather than lists (Member[] rather than
>> List<? extends Member>). I would submit that arrays are
>> less flexible, more expensive to create and no faster to use.
>> What are the advantages of returning arrays?
>> I granted that there are a number of clients which would have to
>> migrate to any new API and backward compatibility could the
>> the sole reason.
>> Thanks.
>> Richard
>> Julian Hyde wrote:
>>> Luis,
>>> I have spent the past day or so integrating your changes 
>> with the latest
>>> mondrian source code. I am not ready to check in yet, for 
>> two reasons.
>>> First, there is a serious performance degradation. For 
>> instance, the test
>>> suite run against oracle on jdk 1.6 takes 260 seconds 
>> before the change, 540
>>> seconds after it.
>>> Second, the code is not in a state to check in. I sent you 
>> an email in
>>> January describing the changes I needed you to make to the 
>> code to make it
>>> ready for submission, and you have done very few of them. 
>> In particular, you
>>> have changed the format of tuples from Member[] to List<? 
>> extends Member>,
>>> something I explicitly requested you did not do.
>>> I know that this is a major change and you and your 
>> colleagues have spent a
>>> lot of time working on it, but it is a waste of my time if 
>> I give you a
>>> detailed list of issues and you ignore it.
>>> I would like this change to make it into the mondrian code 
>> base, so I intend
>>> to carry on working on it to see if I can resolve the 
>> performance and other
>>> issues so that I can check it in. However, mondrian 3.0 is 
>> due very soon and
>>> I don't think this change will make it in before then.
>>> Julian
>>> ----
>>> My email from 2008/1/15, with my comments on what is left 
>> to do in the code
>>> drop of 2008/2/8.
>>>> Thanks for the contribution. I'll accept the contribution and 
>>>> put it into the release, but there are quite a few issues I'd 
>>>> like to resolve before I do that.
>>>> * Please make a pass through your changes again and remove 
>>>> any garbage code (e.g. code for debugging) or files you have 
>>>> edited but don't intend to check in (e.g. mondrian.properties)
>>> NOT DONE - mondrian.properties is included in highcard.diff
>>>> * The code has moved on a lot since 2.4.2. Can you send me 
>>>> the diffs with respect to the current perforce head? That 
>>>> would help me a lot. To do this, sync your perforce client to 
>>>> 2.4.2 ('p4 sync ... at 9831'), copy in your files, 'p4 edit 
>>>> <filename>' for each file you have modified, then get the 
>>>> latest ('p4 sync ...'), and resolve the changes ('p4 resolve').
>>> You sent a patch - I appreciate that. But it wasn't based 
>> on perforce head.
>>> After about an hour of experiments, I discovered that it 
>> was based on
>>> perforce change 10467 (Jan 24th) or thereabouts. You should 
>> have mentioned
>>> this.
>>>> * Does the code pass the regression suite? And on what 
>>>> DBMS/JDK/Operating system? Have you tried running 'megatest' 
>>>> (which sets various properties). The code needs to pass all 
>>>> tests before I can release it.
>>> You say that it passes tests on Oracle and MySQL. I believe 
>> you. I am seeing
>>> 2 failures and 3 errors, but maybe this is due to a merge 
>> mistake on my
>>> part.
>>>> * As I said before, I want to keep the tuple representation 
>>>> as a member array, not a member list. In particular, 
>>>> TupleCalc.evaluateTuple(Evaluator) should return Member[] not 
>>>> List<Member>.
>>> NOT DONE. This is major change, because it negatively 
>> impacts memory and
>>> performance, and does not even contribute to your goal of making
>>> high-cardinality dimensions work better. I asked you twice 
>> to do this and
>>> you ignored me. Did you think I wasn't going to notice?!
>>>> * In AbstractXxxCalc, there are a few toString() methods on a 
>>>> single line - make multiple lines or eliminate them.
>>>> * Likewise hashCode() in ConcatenableList
>>>> * ConcatenableList needs javadoc and a unit test
>>>> * In CmdRunner.java, remove ex.printStackTrace lines. I don't 
>>>> think you intended to check these in.
>>>> * Dimension.isHighCardinality() method needs javadoc
>>> Done - thank you.
>>>> * MondrianProperties.java - property descriptions have 
>>>> spelling mistakes, should start with "Integer property which ...".
>>>> * Document properties in configuration.html and 
>>>> mondrian.properties. I need a MUCH better description of the 
>>>> MaxParallelThreads property: something which would be 
>>>> understood by a DBA, not just a mondrian developer. I had to 
>>>> delve into the code to find out that this related to loading 
>>>> aggregations using GROUPING SETS.
>>>> * There are some files included which I guess you don't want 
>>>> to commit, e.g. mondrian.properties, 
>>>> demo/foodmart.properties. Please remove these from the patch file.
>>>> * When breaking long lines, please use continuation indent 4 not 8.
>>>> * Make sure lines don't have trailing spaces. These may cause 
>>>> spurious diffs later.
>>> NOT DONE - I counted 191 new or modified lines with trailing spaces.
>>>> * Uses of UnsupportedList in HeadTailFunDef: could you use 
>>>> AbstractList instead? I think you implement all of the 
>>>> required methods.
>>>> * Aggregation.java and other places: code formatting; control 
>>>> flow constructs like 'if' and 'while' always use braces and 
>>>> span multiple lines.
>>>> * Aggregation.java: you've made the load method asynchronous. 
>>>> I'm not convinced that this is safe. In particular, how does 
>>>> the caller know when the load has completed? Need to update 
>>>> the javadoc of that method with these important details.
>>>> * If we're using Threads, let's get them from a thread pool. 
>>>> Can we use the JDK 1.5 concurrency features like Future and 
>>>> ThreadPoolExecutor? That will give us more control over how 
>>>> the threads are created, and save the effort of creating new 
>>>> threads; all important in a container. In JDK 1.4 it's OK if 
>>>> the code works single-threaded.
>>>> * CellRequest.java contains garbage code - please remove.
>>> Done - thanks.
>>>> * CacheMapTest.java - should end with '// End CacheMapTest.java'
>>> _______________________________________________
>>> Mondrian mailing list
>>> Mondrian at pentaho.org
>>> http://lists.pentaho.org/mailman/listinfo/mondrian
>> -- 
>> Quis custodiet ipsos custodes:
>> This email message is for the sole use of the intended 
>> recipient(s) and
>> may contain confidential information.  Any unauthorized review, use,
>> disclosure or distribution is prohibited.  If you are not the intended
>> recipient, please contact the sender by reply email and destroy all
>> copies of the original message.

Quis custodiet ipsos custodes:
This email message is for the sole use of the intended recipient(s) and
may contain confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply email and destroy all
copies of the original message.

More information about the Mondrian mailing list