[Mondrian] High Cardinality for Mondrian

Richard Emberson remberson at edgedynamics.com
Wed Feb 13 09:59:17 EST 2008


Julian,

As long as I worked on Mondrian I always wondered why some of the
API returned arrays rather than lists (Member[] rather than
List<? extends Member>). I would submit that arrays are
less flexible, more expensive to create and no faster to use.

What are the advantages of returning arrays?

I granted that there are a number of clients which would have to
migrate to any new API and backward compatibility could the
the sole reason.

Thanks.

Richard


Julian Hyde wrote:
> Luis,
> 
> I have spent the past day or so integrating your changes with the latest
> mondrian source code. I am not ready to check in yet, for two reasons.
> 
> First, there is a serious performance degradation. For instance, the test
> suite run against oracle on jdk 1.6 takes 260 seconds before the change, 540
> seconds after it.
> 
> Second, the code is not in a state to check in. I sent you an email in
> January describing the changes I needed you to make to the code to make it
> ready for submission, and you have done very few of them. In particular, you
> have changed the format of tuples from Member[] to List<? extends Member>,
> something I explicitly requested you did not do.
> 
> I know that this is a major change and you and your colleagues have spent a
> lot of time working on it, but it is a waste of my time if I give you a
> detailed list of issues and you ignore it.
> 
> I would like this change to make it into the mondrian code base, so I intend
> to carry on working on it to see if I can resolve the performance and other
> issues so that I can check it in. However, mondrian 3.0 is due very soon and
> I don't think this change will make it in before then.
> 
> Julian
> 
> 
> ----
> 
> My email from 2008/1/15, with my comments on what is left to do in the code
> drop of 2008/2/8.
> 
>> Thanks for the contribution. I'll accept the contribution and 
>> put it into the release, but there are quite a few issues I'd 
>> like to resolve before I do that.
>>
>> * Please make a pass through your changes again and remove 
>> any garbage code (e.g. code for debugging) or files you have 
>> edited but don't intend to check in (e.g. mondrian.properties)
> 
> NOT DONE - mondrian.properties is included in highcard.diff
> 
>> * The code has moved on a lot since 2.4.2. Can you send me 
>> the diffs with respect to the current perforce head? That 
>> would help me a lot. To do this, sync your perforce client to 
>> 2.4.2 ('p4 sync ... at 9831'), copy in your files, 'p4 edit 
>> <filename>' for each file you have modified, then get the 
>> latest ('p4 sync ...'), and resolve the changes ('p4 resolve').
> 
> You sent a patch - I appreciate that. But it wasn't based on perforce head.
> After about an hour of experiments, I discovered that it was based on
> perforce change 10467 (Jan 24th) or thereabouts. You should have mentioned
> this.
> 
>> * Does the code pass the regression suite? And on what 
>> DBMS/JDK/Operating system? Have you tried running 'megatest' 
>> (which sets various properties). The code needs to pass all 
>> tests before I can release it.
> 
> You say that it passes tests on Oracle and MySQL. I believe you. I am seeing
> 2 failures and 3 errors, but maybe this is due to a merge mistake on my
> part.
> 
>> * As I said before, I want to keep the tuple representation 
>> as a member array, not a member list. In particular, 
>> TupleCalc.evaluateTuple(Evaluator) should return Member[] not 
>> List<Member>.
> 
> NOT DONE. This is major change, because it negatively impacts memory and
> performance, and does not even contribute to your goal of making
> high-cardinality dimensions work better. I asked you twice to do this and
> you ignored me. Did you think I wasn't going to notice?!
> 
>> * In AbstractXxxCalc, there are a few toString() methods on a 
>> single line - make multiple lines or eliminate them.
> 
> NOT DONE
> 
>> * Likewise hashCode() in ConcatenableList
> 
> NOT DONE
> 
>> * ConcatenableList needs javadoc and a unit test
> 
> NOT DONE
> 
>> * In CmdRunner.java, remove ex.printStackTrace lines. I don't 
>> think you intended to check these in.
> 
> NOT DONE
> 
>> * Dimension.isHighCardinality() method needs javadoc
> 
> Done - thank you.
> 
>> * MondrianProperties.java - property descriptions have 
>> spelling mistakes, should start with "Integer property which ...".
> 
> NOT DONE
> 
>> * Document properties in configuration.html and 
>> mondrian.properties. I need a MUCH better description of the 
>> MaxParallelThreads property: something which would be 
>> understood by a DBA, not just a mondrian developer. I had to 
>> delve into the code to find out that this related to loading 
>> aggregations using GROUPING SETS.
> 
> NOT DONE
> 
>> * There are some files included which I guess you don't want 
>> to commit, e.g. mondrian.properties, 
>> demo/foodmart.properties. Please remove these from the patch file.
>>
>> * When breaking long lines, please use continuation indent 4 not 8.
> 
> NOT DONE
> 
>> * Make sure lines don't have trailing spaces. These may cause 
>> spurious diffs later.
> 
> NOT DONE - I counted 191 new or modified lines with trailing spaces.
> 
>> * Uses of UnsupportedList in HeadTailFunDef: could you use 
>> AbstractList instead? I think you implement all of the 
>> required methods.
> 
> NOT DONE
> 
>> * Aggregation.java and other places: code formatting; control 
>> flow constructs like 'if' and 'while' always use braces and 
>> span multiple lines.
> 
> NOT DONE
> 
>> * Aggregation.java: you've made the load method asynchronous. 
>> I'm not convinced that this is safe. In particular, how does 
>> the caller know when the load has completed? Need to update 
>> the javadoc of that method with these important details.
> 
> NOT DONE
> 
>> * If we're using Threads, let's get them from a thread pool. 
>> Can we use the JDK 1.5 concurrency features like Future and 
>> ThreadPoolExecutor? That will give us more control over how 
>> the threads are created, and save the effort of creating new 
>> threads; all important in a container. In JDK 1.4 it's OK if 
>> the code works single-threaded.
> 
> NOT DONE
> 
>> * CellRequest.java contains garbage code - please remove.
> 
> Done - thanks.
> 
>> * CacheMapTest.java - should end with '// End CacheMapTest.java'
> 
> NOT DONE
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
> 


-- 
Quis custodiet ipsos custodes:
This email message is for the sole use of the intended recipient(s) and
may contain confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply email and destroy all
copies of the original message.



More information about the Mondrian mailing list