[Mondrian] High Cardinality for Mondrian

Julian Hyde jhyde at pentaho.org
Wed Feb 13 17:06:04 EST 2008


There are two distinct collections here. Consider a method that returns a
list of tuples.  It could return Member[][], or List<Member[]>, or
List<List<Member>>, or even List<Member>[].

When we are talking about collections of members or tuples, I totally agree
that we should be returning a list (or in certain cases an iterator). These
collections are often large (thousands of members) and dynamic. Luis and I
were in agreement about that.

However, the other collection is to represent a tuple. Tuples typically have
low cardinality, and the cardinality is fixed: you almost never want to
increase a tuple from cardinality 2 to cardinality 3. There are a lot of
instances of tuples, so it is important that they can be created cheaply and
use the minimum of memory. 

My use case is a crossjoin that returns 10,000 tuples each with 2 members.
If we represent each tuple as a list, that's at least one extra object per
tuple, if we use the cheapest list available (the one created by
Arrays.asList()). The memory and CPU overhead is unacceptable.

That's what I want to represent a tuple as a Member[] rather than
List<Member>, and I was clear with Luis that's what I wanted.

Julian

> -----Original Message-----
> From: Richard Emberson [mailto:remberson at edgedynamics.com] 
> Sent: Wednesday, February 13, 2008 6:59 AM
> To: jhyde at pentaho.org; Mondrian developer mailing list
> Cc: luis.canals at stratebi.com; 'Javier Giménez Aznar'; 'jorge 
> López Mateo'
> Subject: Re: [Mondrian] High Cardinality for Mondrian
> 
> Julian,
> 
> As long as I worked on Mondrian I always wondered why some of the
> API returned arrays rather than lists (Member[] rather than
> List<? extends Member>). I would submit that arrays are
> less flexible, more expensive to create and no faster to use.
> 
> What are the advantages of returning arrays?
> 
> I granted that there are a number of clients which would have to
> migrate to any new API and backward compatibility could the
> the sole reason.
> 
> Thanks.
> 
> Richard
> 
> 
> Julian Hyde wrote:
> > Luis,
> > 
> > I have spent the past day or so integrating your changes 
> with the latest
> > mondrian source code. I am not ready to check in yet, for 
> two reasons.
> > 
> > First, there is a serious performance degradation. For 
> instance, the test
> > suite run against oracle on jdk 1.6 takes 260 seconds 
> before the change, 540
> > seconds after it.
> > 
> > Second, the code is not in a state to check in. I sent you 
> an email in
> > January describing the changes I needed you to make to the 
> code to make it
> > ready for submission, and you have done very few of them. 
> In particular, you
> > have changed the format of tuples from Member[] to List<? 
> extends Member>,
> > something I explicitly requested you did not do.
> > 
> > I know that this is a major change and you and your 
> colleagues have spent a
> > lot of time working on it, but it is a waste of my time if 
> I give you a
> > detailed list of issues and you ignore it.
> > 
> > I would like this change to make it into the mondrian code 
> base, so I intend
> > to carry on working on it to see if I can resolve the 
> performance and other
> > issues so that I can check it in. However, mondrian 3.0 is 
> due very soon and
> > I don't think this change will make it in before then.
> > 
> > Julian
> > 
> > 
> > ----
> > 
> > My email from 2008/1/15, with my comments on what is left 
> to do in the code
> > drop of 2008/2/8.
> > 
> >> Thanks for the contribution. I'll accept the contribution and 
> >> put it into the release, but there are quite a few issues I'd 
> >> like to resolve before I do that.
> >>
> >> * Please make a pass through your changes again and remove 
> >> any garbage code (e.g. code for debugging) or files you have 
> >> edited but don't intend to check in (e.g. mondrian.properties)
> > 
> > NOT DONE - mondrian.properties is included in highcard.diff
> > 
> >> * The code has moved on a lot since 2.4.2. Can you send me 
> >> the diffs with respect to the current perforce head? That 
> >> would help me a lot. To do this, sync your perforce client to 
> >> 2.4.2 ('p4 sync ... at 9831'), copy in your files, 'p4 edit 
> >> <filename>' for each file you have modified, then get the 
> >> latest ('p4 sync ...'), and resolve the changes ('p4 resolve').
> > 
> > You sent a patch - I appreciate that. But it wasn't based 
> on perforce head.
> > After about an hour of experiments, I discovered that it 
> was based on
> > perforce change 10467 (Jan 24th) or thereabouts. You should 
> have mentioned
> > this.
> > 
> >> * Does the code pass the regression suite? And on what 
> >> DBMS/JDK/Operating system? Have you tried running 'megatest' 
> >> (which sets various properties). The code needs to pass all 
> >> tests before I can release it.
> > 
> > You say that it passes tests on Oracle and MySQL. I believe 
> you. I am seeing
> > 2 failures and 3 errors, but maybe this is due to a merge 
> mistake on my
> > part.
> > 
> >> * As I said before, I want to keep the tuple representation 
> >> as a member array, not a member list. In particular, 
> >> TupleCalc.evaluateTuple(Evaluator) should return Member[] not 
> >> List<Member>.
> > 
> > NOT DONE. This is major change, because it negatively 
> impacts memory and
> > performance, and does not even contribute to your goal of making
> > high-cardinality dimensions work better. I asked you twice 
> to do this and
> > you ignored me. Did you think I wasn't going to notice?!
> > 
> >> * In AbstractXxxCalc, there are a few toString() methods on a 
> >> single line - make multiple lines or eliminate them.
> > 
> > NOT DONE
> > 
> >> * Likewise hashCode() in ConcatenableList
> > 
> > NOT DONE
> > 
> >> * ConcatenableList needs javadoc and a unit test
> > 
> > NOT DONE
> > 
> >> * In CmdRunner.java, remove ex.printStackTrace lines. I don't 
> >> think you intended to check these in.
> > 
> > NOT DONE
> > 
> >> * Dimension.isHighCardinality() method needs javadoc
> > 
> > Done - thank you.
> > 
> >> * MondrianProperties.java - property descriptions have 
> >> spelling mistakes, should start with "Integer property which ...".
> > 
> > NOT DONE
> > 
> >> * Document properties in configuration.html and 
> >> mondrian.properties. I need a MUCH better description of the 
> >> MaxParallelThreads property: something which would be 
> >> understood by a DBA, not just a mondrian developer. I had to 
> >> delve into the code to find out that this related to loading 
> >> aggregations using GROUPING SETS.
> > 
> > NOT DONE
> > 
> >> * There are some files included which I guess you don't want 
> >> to commit, e.g. mondrian.properties, 
> >> demo/foodmart.properties. Please remove these from the patch file.
> >>
> >> * When breaking long lines, please use continuation indent 4 not 8.
> > 
> > NOT DONE
> > 
> >> * Make sure lines don't have trailing spaces. These may cause 
> >> spurious diffs later.
> > 
> > NOT DONE - I counted 191 new or modified lines with trailing spaces.
> > 
> >> * Uses of UnsupportedList in HeadTailFunDef: could you use 
> >> AbstractList instead? I think you implement all of the 
> >> required methods.
> > 
> > NOT DONE
> > 
> >> * Aggregation.java and other places: code formatting; control 
> >> flow constructs like 'if' and 'while' always use braces and 
> >> span multiple lines.
> > 
> > NOT DONE
> > 
> >> * Aggregation.java: you've made the load method asynchronous. 
> >> I'm not convinced that this is safe. In particular, how does 
> >> the caller know when the load has completed? Need to update 
> >> the javadoc of that method with these important details.
> > 
> > NOT DONE
> > 
> >> * If we're using Threads, let's get them from a thread pool. 
> >> Can we use the JDK 1.5 concurrency features like Future and 
> >> ThreadPoolExecutor? That will give us more control over how 
> >> the threads are created, and save the effort of creating new 
> >> threads; all important in a container. In JDK 1.4 it's OK if 
> >> the code works single-threaded.
> > 
> > NOT DONE
> > 
> >> * CellRequest.java contains garbage code - please remove.
> > 
> > Done - thanks.
> > 
> >> * CacheMapTest.java - should end with '// End CacheMapTest.java'
> > 
> > NOT DONE
> > 
> > _______________________________________________
> > Mondrian mailing list
> > Mondrian at pentaho.org
> > http://lists.pentaho.org/mailman/listinfo/mondrian
> > 
> 
> 
> -- 
> Quis custodiet ipsos custodes:
> This email message is for the sole use of the intended 
> recipient(s) and
> may contain confidential information.  Any unauthorized review, use,
> disclosure or distribution is prohibited.  If you are not the intended
> recipient, please contact the sender by reply email and destroy all
> copies of the original message.
> 




More information about the Mondrian mailing list