[Mondrian] High Cardinality for Mondrian
jhyde at pentaho.org
Wed Feb 13 03:27:12 EST 2008
I have spent the past day or so integrating your changes with the latest
mondrian source code. I am not ready to check in yet, for two reasons.
First, there is a serious performance degradation. For instance, the test
suite run against oracle on jdk 1.6 takes 260 seconds before the change, 540
seconds after it.
Second, the code is not in a state to check in. I sent you an email in
January describing the changes I needed you to make to the code to make it
ready for submission, and you have done very few of them. In particular, you
have changed the format of tuples from Member to List<? extends Member>,
something I explicitly requested you did not do.
I know that this is a major change and you and your colleagues have spent a
lot of time working on it, but it is a waste of my time if I give you a
detailed list of issues and you ignore it.
I would like this change to make it into the mondrian code base, so I intend
to carry on working on it to see if I can resolve the performance and other
issues so that I can check it in. However, mondrian 3.0 is due very soon and
I don't think this change will make it in before then.
My email from 2008/1/15, with my comments on what is left to do in the code
drop of 2008/2/8.
> Thanks for the contribution. I'll accept the contribution and
> put it into the release, but there are quite a few issues I'd
> like to resolve before I do that.
> * Please make a pass through your changes again and remove
> any garbage code (e.g. code for debugging) or files you have
> edited but don't intend to check in (e.g. mondrian.properties)
NOT DONE - mondrian.properties is included in highcard.diff
> * The code has moved on a lot since 2.4.2. Can you send me
> the diffs with respect to the current perforce head? That
> would help me a lot. To do this, sync your perforce client to
> 2.4.2 ('p4 sync ... at 9831'), copy in your files, 'p4 edit
> <filename>' for each file you have modified, then get the
> latest ('p4 sync ...'), and resolve the changes ('p4 resolve').
You sent a patch - I appreciate that. But it wasn't based on perforce head.
After about an hour of experiments, I discovered that it was based on
perforce change 10467 (Jan 24th) or thereabouts. You should have mentioned
> * Does the code pass the regression suite? And on what
> DBMS/JDK/Operating system? Have you tried running 'megatest'
> (which sets various properties). The code needs to pass all
> tests before I can release it.
You say that it passes tests on Oracle and MySQL. I believe you. I am seeing
2 failures and 3 errors, but maybe this is due to a merge mistake on my
> * As I said before, I want to keep the tuple representation
> as a member array, not a member list. In particular,
> TupleCalc.evaluateTuple(Evaluator) should return Member not
NOT DONE. This is major change, because it negatively impacts memory and
performance, and does not even contribute to your goal of making
high-cardinality dimensions work better. I asked you twice to do this and
you ignored me. Did you think I wasn't going to notice?!
> * In AbstractXxxCalc, there are a few toString() methods on a
> single line - make multiple lines or eliminate them.
> * Likewise hashCode() in ConcatenableList
> * ConcatenableList needs javadoc and a unit test
> * In CmdRunner.java, remove ex.printStackTrace lines. I don't
> think you intended to check these in.
> * Dimension.isHighCardinality() method needs javadoc
Done - thank you.
> * MondrianProperties.java - property descriptions have
> spelling mistakes, should start with "Integer property which ...".
> * Document properties in configuration.html and
> mondrian.properties. I need a MUCH better description of the
> MaxParallelThreads property: something which would be
> understood by a DBA, not just a mondrian developer. I had to
> delve into the code to find out that this related to loading
> aggregations using GROUPING SETS.
> * There are some files included which I guess you don't want
> to commit, e.g. mondrian.properties,
> demo/foodmart.properties. Please remove these from the patch file.
> * When breaking long lines, please use continuation indent 4 not 8.
> * Make sure lines don't have trailing spaces. These may cause
> spurious diffs later.
NOT DONE - I counted 191 new or modified lines with trailing spaces.
> * Uses of UnsupportedList in HeadTailFunDef: could you use
> AbstractList instead? I think you implement all of the
> required methods.
> * Aggregation.java and other places: code formatting; control
> flow constructs like 'if' and 'while' always use braces and
> span multiple lines.
> * Aggregation.java: you've made the load method asynchronous.
> I'm not convinced that this is safe. In particular, how does
> the caller know when the load has completed? Need to update
> the javadoc of that method with these important details.
> * If we're using Threads, let's get them from a thread pool.
> Can we use the JDK 1.5 concurrency features like Future and
> ThreadPoolExecutor? That will give us more control over how
> the threads are created, and save the effort of creating new
> threads; all important in a container. In JDK 1.4 it's OK if
> the code works single-threaded.
> * CellRequest.java contains garbage code - please remove.
Done - thanks.
> * CacheMapTest.java - should end with '// End CacheMapTest.java'
More information about the Mondrian