[Mondrian] RE: List of Members memory usage

Julian Hyde julianhyde at speakeasy.net
Mon Jan 8 22:57:50 EST 2007

We discussed this change over the phone this afternoon, but I wanted to
write up what we decided.
There's no question that mondrian has problems dealing with large datasets.
The problem requires a number of solutions, some tactical and some
strategic. Solutions include: more efficient structures for representing
lists (such as your MemberList); use of iterators, both in the
implementation and mondrian's public API; and paging to disk. These
solutions are not orthogonal; in particular, iterators tend to make
disk-based algorithms more efficient, because they encourage locality of
<http://en.wikipedia.org/wiki/Locality_of_reference> reference.
Also, I claim that the use case of a large result set is less common in
practice than the use case of a large intermediate set - for the simple
reason that a result set needs to be read and comprehended by a human being.
1. mondrian.olap.Result
We decided to change the mondrian.olap.Result API. Currently, Result
contains a number of Axis objects, each of which has a member 'Position[]
positions', and each Position has a member 'Member[] members'. We decided to
make both Axis and Position interfaces:
interface Result {
  Axes[] getAxes();
interface Axis {
  List<Position> getPositions();
interface Position extends List<Member> {
This will give us more flexibility in how results are implemented. In
particular, if an axis is huge, say 1M positions each with 3 members, we
won't necessarily have to allocate ~3M elements of space. We will also be
able to use an implementation which returns an iterator, and make it appear
as a list from the client's perspective.
We will examine large axes again in the olap4j specification. I suspect that
olap4j will allow the client to choose between list-based access,
unidirectional cursors and bidirectional cursors.
2. Iterators
Mondrian's expression evaluation framework represents MDX sets as
java.util.List objects: List<Member> for sets of members, List<Member[]> for
sets of tuples.
Richard pointed out that the result of CrossJoin( <set of 1000 products>,
<set of 400 customers> ) requires ~800K (= 1000 x 400 x 2) units of memory,
an suggested an innovative custom implementation of List could just store
the original lists, and use ~1.4K (= 1000 + 400) units of memory.
Julian pointed out that an iterator implementation of CrossJoin would be
better: it would require ~1K (= max(1000, 400)) units of memory in the worst
If the result of the CrossJoin is the ultimate result, then CrossJoinList is
more efficient. But CrossJoinIter allows us to start introducing
iterator-based algorithms, incrementally, throughout mondrian. 
ExpCompiler has an enum ResultStyle { ANY, MUTABLE_LIST, LIST, VALUE }.
Let's add a new value ITERABLE, and a new ExpCompiler method
    IterCalc compileIter(Exp exp)
This method will implement an expression as an Iterable if possible,
otherwise implement it as a List an return an iterator over that list. In
other words, the consumer is calling the shots, and the producer does its
best to comply. If the client wants to say 'what works best for you?', it
should call ExpCompiler.compile(Exp,ResultStyle[]) and give a ranked list of
Also, let's make Axis.getPositions() return different implementations of
List depending on whether the result is (a) a list, or (b) an Iterable.

> -----Original Message-----
> From: Richard Emberson [mailto:remberson at edgedynamics.com]
> Sent: Friday, January 05, 2007 12:44 PM
> To: Julian Hyde
> Subject: List of Members memory usage
> Julian,
> I was looking into memory use and the aggregate cache and
> got side tracked.
> Last Summer we had a query that for our medium-size customer test
> database would always run out of memory - it had a large crossjoin.
> So, I started looking at how one might return lists of Members
> and lists of arrays of Members using less memory.
> Attached is a collection of files (please read the README file)
> that demonstrates a possible approach. I used just such an approach
> and the size of data structure returned by the crossjoin function
> went from 712072200 bytes to 401824 bytes.
> The actual class I use to test this idea in the Mondrian code derives
> from List<Member[]> and I only changed the CrossJoinFunDef,
> RolapResult and Position classes (and all classes that
> directly accessed
> a Position's instance variables). To implement it across the
> board would
> require changing the type returned by the evaluateList method.
> That said, the modified Mondrian code passes all junits except those
> where code is involved that assumes that the Member array returned
> by the list of Members for a given index is always the same array;
> such code uses the Member array as a Map key and with the new
> MemoryList
> classes, it may not be the case that two calls for the same
> list of Members
> element returns the same object.
> Whats nice about having the list of Members, Position and,
> possibly, the
> array of Positions created in RolapResult be based upon
> interfaces is that
> one can adjust which implementation to use base upon available memory
> (and in java 5 one can use memory notifications to change the
> implementation as it is being used).
> I certainly was not going to make such a sweeping change without
> a little discussion and more testing.  Thoughts?
> Richard
> --
> This email message is for the sole use of the intended
> recipient(s) and
> may contain confidential information.  Any unauthorized review, use,
> disclosure or distribution is prohibited.  If you are not the intended
> recipient, please contact the sender by reply email and destroy all
> copies of the original message.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20070108/579bccd3/attachment.html 

More information about the Mondrian mailing list