[Mondrian] RE: List of Members memory usage

Julian Hyde julianhyde at speakeasy.net
Tue Jan 9 17:50:41 EST 2007


xx
 

-----Original Message-----

From: Richard Emberson [ <mailto:remberson at edgedynamics.com>
mailto:remberson at edgedynamics.com] 
Sent: Tuesday, January 09, 2007 6:19 AM
To: Julian Hyde
Subject: Re: List of Members memory usage

Concerning the new Position interface:

interface Position extends List<Member> {
}

we also agreed that it would be an unmodifiable list and that most of its
methods would throw an UnsupportedOperationException. Specifically, the only
methods supported would be:

boolean isEmpty();
int size();
Iterator<Member> iterator();
Member get(int index);

and possibly

ListIterator<Member> listIterator();
ListIterator<Member> listIterator(int index);

where the iterators would be readonly.

Agreed.

Concerning ITERABLE:
1) ITERABLEs are readonly,
2) it should be noted that when a 'sort' is required, then the collection of
Members must be materialized because a sort requires the ability to modify
the order of the Members in the collection.

[With xslt's one can stream the transformation of a document unless one does
a 'sort', in which case, the whole document has to be read into memory at
the same time. There is a similar issue when using LISTs or ITERABLEs when a
sort is encountered.]

Again, agreed. Note that unlike Position, Iterable is for internal use
within mondrian, in implementing MDX expressions. It's OK if its use is
somewhat circumscribed.

Sort is the classic example of a blocking operation. It breaks the iterator
model, but we hope that in practice, there are ways around it, such as
reading from an underlying source which is already sorted.

One important usage of sort is the Rank function. We know that sorting the
sets underlying a rank operator is expensive, so we introduce a 'cached
expression' which ensures that the list is only sorted once. If I recall,
the list stored in the cache is specially indexed so we can compute rank
more efficiently.

3) Iterable is a Java5 interface:

public interface Iterable<T> {
    Iterator<T> iterator();
}

which means that the consumer can not tell how many members it has without
iterating through them.

Yes, it was introduced in 1.5. But retroweaver has its own implementation of
Iterable for 1.4.

It's true that you don't know the number of elements an iterator is going to
return. That is a disadvantage for the consumer (because it has to operate
with less information) but an advantage to the producer (because it has less
information to deliver), so things tend to even out.

We can discuss adding 'added value' versions of iterable if there is
information which is cheap for producers to compute and extremely useful to
consumers. E.g.

interface IterableWithCount<T> extends Iterable<T> {
  int size();
}

If the consumer absolutely has to know the number of elements in the set
(e.g. the Tail function) then it should ask the producer for a list.

Sometimes it is sufficient for the consumer to have an approximate count. If
so ,we can introduce cost metrics into the expression model.

Julian

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20070109/42678945/attachment.html 


More information about the Mondrian mailing list