[Mondrian] Improvements for High Cardinality

Luis F. Canals luis.canals at stratebi.com
Wed Jan 16 05:11:56 EST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi, Julian,

I'll answer your points one by one:
    * OK, you're right, maybe there is too much debug information at
this moment.
       But other files have been changed, and need to be changed to
pass the tests, for instance FoodMart.xml and mondrian.properties.
    * Right! Moving to the current version... it will take a long
time, as you know.
    * The code passes all tests included in the mondrian-2.4.2-9831
version, against a MySQL database, running on Linux and Windows with
JDK 1.4, 1.5 and 1.6. We think that the "test" ant task includes all
tests... but what do you mean by a "megatest"?
    * About the return type of TupleCalc.evaluateTuple(Evaluator): we
will try to return a Member[], but as the methods are used for several
different purposes, we may need to return a List<Member>
somewhere... OK, let's try it and we will tell you our results.
    * We absolutely agree with you: mondrian.properties must be
commented in more depth. At this point, we were only trying to give
you a "first release" of the changes. These are not the definitive
modifications, of course.
    * Right, test and javadoc for ConcatenableList, good idea.
    * Inline methods: to avoid very long code, we are used to putting
the definition and body of auxiliary, uninteresting methods
(toString, hashCode, equals...) on the same line. But if you want
them on several lines, that is no problem for us; it's a question of
coding standard.
    * highCardinality: yes, javadoc is needed for isHighCardinality,
but more important is documenting the attribute for schema
descriptions (how, when and why to use it, etc.).
    * Typing mistakes: ha ha ha ha, of course, we type too fast, and
comments are not compiled, so mistakes are not detected... until
other eyes take a look at them... fantastic!
    * OK, and so on for each of your suggestions. We will answer
with pieces of code which solve each of your points. We agree with
99.9% of your suggestions, so let's start working now!
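
To illustrate the Member[] versus List<Member> point, here is a minimal
sketch of how a tuple kept as an array can still be handed to code that
expects a list, without copying. The Member class below is a hypothetical
stand-in (it is not mondrian.olap.Member), and the method names are
assumptions made only for this example.

```java
import java.util.Arrays;
import java.util.List;

public class TupleViewSketch {
    /** Hypothetical stand-in for a dimension member; only a name is modelled. */
    static final class Member {
        private final String name;

        Member(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    /** Returns the tuple in its array representation. */
    static Member[] evaluateTuple() {
        return new Member[] {
            new Member("[Time].[1997]"),
            new Member("[Store].[USA]")
        };
    }

    /** Wraps the array in a fixed-size list view; the elements are not copied. */
    static List<Member> asList(Member[] tuple) {
        return Arrays.asList(tuple);
    }

    public static void main(String[] args) {
        Member[] tuple = evaluateTuple();
        List<Member> view = asList(tuple);
        for (Member member : view) {
            System.out.println(member.getName());
        }
    }
}
```

Because Arrays.asList returns a view backed by the array, callers that need
a List<Member> pay no extra allocation per element, while the array remains
the primary representation.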


Wow! We see that you have taken a lot of interest in these changes.
We are absolutely delighted, since we thought no one was going to read
our modifications (a lot of tricky changes, after all), so thank you
very much.

More news soon. Best regards,

- jorge, javier & luis


Julian Hyde wrote:
> Luis,
>
> Thanks for the contribution. I'll accept the contribution and put it into
> the release, but there are quite a few issues I'd like to resolve before I
> do that.
>
> * Please make a pass through your changes again and remove any garbage code
> (e.g. code for debugging) or files you have edited but don't intend to
> check in (e.g. mondrian.properties)
>
> * The code has moved on a lot since 2.4.2. Can you send me the diffs with
> respect to the current perforce head? That would help me a lot. To do this,
> sync your perforce client to 2.4.2 ('p4 sync ...@9831'), copy in your
> files, 'p4 edit <filename>' for each file you have modified, then get the
> latest ('p4 sync ...'), and resolve the changes ('p4 resolve').
>
> * Does the code pass the regression suite? And on what DBMS/JDK/Operating
> system? Have you tried running 'megatest' (which sets various properties)?
> The code needs to pass all tests before I can release it.
>
> * As I said before, I want to keep the tuple representation as a member
> array, not a member list. In particular, TupleCalc.evaluateTuple(Evaluator)
> should return Member[] not List<Member>.
>
> * In AbstractXxxCalc, there are a few toString() methods on a single line -
> make multiple lines or eliminate them.
>
> * Likewise hashCode() in ConcatenableList
>
> * ConcatenableList needs javadoc and a unit test
>
> * In CmdRunner.java, remove ex.printStackTrace lines. I don't think you
> intended to check these in.
>
> * Dimension.isHighCardinality() method needs javadoc
>
> * MondrianProperties.java - property descriptions have spelling mistakes,
> should start with "Integer property which ...".
>
> * Document properties in configuration.html and mondrian.properties. I need
> a MUCH better description of the MaxParallelThreads property: something
> which would be understood by a DBA, not just a mondrian developer. I had to
> delve into the code to find out that this related to loading aggregations
> using GROUPING SETS.
>
> * There are some files included which I guess you don't want to commit,
> e.g. mondrian.properties, demo/foodmart.properties. Please remove these
> from the patch file.
>
> * When breaking long lines, please use continuation indent 4 not 8.
>
> * Make sure lines don't have trailing spaces. These may cause spurious
> diffs later.
>
> * Uses of UnsupportedList in HeadTailFunDef: could you use AbstractList
> instead? I think you implement all of the required methods.
>
> * Aggregation.java and other places: code formatting; control flow
> constructs like 'if' and 'while' always use braces and span multiple lines.
>
> * Aggregation.java: you've made the load method asynchronous. I'm not
> convinced that this is safe. In particular, how does the caller know when
> the load has completed? Need to update the javadoc of that method with
> these important details.
>
> * If we're using Threads, let's get them from a thread pool. Can we use the
> JDK 1.5 concurrency features like Future and ThreadPoolExecutor? That will
> give us more control over how the threads are created, and save the effort
> of creating new threads; all important in a container. In JDK 1.4 it's
> OK if the code works single-threaded.
>
> * CellRequest.java contains garbage code - please remove.
>
> * CacheMapTest.java - should end with '// End CacheMapTest.java'
>
>
>
> Julian
>
>> -----Original Message-----
>> From: mondrian-bounces at pentaho.org
>> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Luis F. Canals
>> Sent: Tuesday, January 15, 2008 9:00 AM
>> To: mondrian at pentaho.org
>> Cc: Javier Giménez Aznar; Jorge López Mateo
>> Subject: [Mondrian] Improvements for High Cardinality
>>
>> Hi all,
>>
>> as you may remember, we have been working (since November 2007) to give
>> Mondrian the capability to deal with high cardinality dimensions.
>> During this time, we have also improved the load process from the
>> database with the capability of executing several parallel threads, to
>> get information as fast as the database is able to provide it.
>>
>> OK. Here we have our first approach to the solution.
>> We have made a lot of changes, so we attach a patch to be applied
>> to mondrian 2.4.2-9831.
>> Main modifications are:
>>     1.- changed Member[] to List<Member> (there is no memory or
>> performance impact, and it lets us move towards an iterable list)
>>     2.- added a new boolean property to the .xml describing the schema:
>> highCardinality, for dimensions
>>     3.- added new properties to mondrian.properties to manage high
>> cardinality issues:
>>              mondrian.result.highCardChunkSize to indicate how many
>> cells are read at the same time when high cardinality dimensions are
>> involved
>>              mondrian.rolap.MaximumParallelThreads to indicate how many
>> parallel threads are allowed to get results from an MDX query
>>
>> There are many performance improvements (basically because, even for
>> low cardinality dimensions, results no longer need to be read and
>> managed as a "big block", causing the allocation and deallocation of
>> large amounts of bytes).
>>
>> Changes to the API are described in the patch file. (Apply it using:
>>        patch -p0 < diff)
>>
>> Please, we are very interested in including these changes into the next
>> release.
>> If you have any questions, send them to us. (javier.gimenez at stratebi.com,
>> jorge.lopez at stratebi.com, luis.canals at stratebi.com)
>> We have open questions: for instance, which functions can be used with
>> high cardinality dimensions? (At this point, only "head" and "tail" are
>> fully implemented for high cardinality.) We will be very happy if you
>> find more!
>>
>> Best regards,
>>
>> - javier, jorge & luis
>>
>
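
As a rough illustration of the thread-pool suggestion quoted above (JDK 1.5
Future and a pooled executor instead of hand-made threads), here is a
minimal sketch. It is not the patch's code: loadTask, loadAll and the
"rows loaded" placeholder are hypothetical names invented for this example,
and a real container would probably share one pool rather than create one
per call. It also shows one answer to "how does the caller know when the
load has completed": the caller blocks on Future.get().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoadSketch {
    /** Hypothetical unit of work; a real loader would run one SQL statement here. */
    static Callable<Integer> loadTask(final int chunk) {
        return new Callable<Integer>() {
            public Integer call() {
                return chunk * 10; // placeholder for "rows loaded"
            }
        };
    }

    /** Submits every task to a bounded pool and returns only when all have finished. */
    static int loadAll(int chunks, int maxParallelThreads) {
        // Fixed-size pool: at most maxParallelThreads tasks run at once.
        ExecutorService pool = Executors.newFixedThreadPool(maxParallelThreads);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (int i = 0; i < chunks; i++) {
                futures.add(pool.submit(loadTask(i)));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // blocks until that task is complete
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("load interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("load failed", e.getCause());
        } finally {
            pool.shutdown(); // release the worker threads
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAll(4, 2)); // prints 60
    }
}
```

With maxParallelThreads set to 1 the same code degrades to sequential
loading, which matches the remark that single-threaded behaviour is
acceptable where the concurrency utilities are not available.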

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHjdhq6XofO2yaIbARAiblAJ0Z0h7PNqCMefnxw7cbO0nPn4/MpgCfdO6Q
HlAQLVdj4iWpxY9Sww9LUgM=
=cRgq
-----END PGP SIGNATURE-----



