[Mondrian] RE: Eigenbase perforce change 12887 for review

Julian Hyde jhyde at pentaho.com
Fri Jun 19 17:12:33 EDT 2009


 

> -----Original Message-----
> From: Eric McDermid [mailto:mcdermid at stonecreek.com] 
> Sent: Friday, June 19, 2009 1:21 PM
> To: Mondrian developer mailing list
> Cc: jhyde at pentaho.com
> Subject: Re: [Mondrian] RE: Eigenbase perforce change 12887 for review
> 
> On a similar note, there are a couple of changes in the code I'm
> porting up that significantly reduce memory usage (again, this is code
> I didn't write, so I can't give precise characterizations at the
> moment).
> 
> 1) Substituting Flat5Map (a simple variation of Apache Commons'
> Flat3Map) for HashMap to map property names to values in RolapMember
> (mapPropertyNameToValue). I don't have exact numbers, but my
> understanding is that this dramatically improves the memory footprint
> at the cost of a slight performance slowdown where there are fewer
> than 6 properties. (If the performance impact is a concern, it's easy
> to make it configurable.)
> 
> I'm not sure about packaging/licenses on this one -- since Flat5Map is
> a derivative of Apache's Flat3Map, I assume it must be delivered under
> the Apache license. I don't think you can simply repackage and
> distribute Apache code under the EPL, so my guess is I'd need to find
> a place to put it under the Apache license, package it in a separate
> jar and then reference that jar from Mondrian?
> 
> If that's too much hassle, it might be easier to just change Mondrian
> to use Flat3Map, and continue to use Flat5Map just in my client's
> private version of the code.

Let's do the last option: change Mondrian to use Flat3Map. Allocate the map
using a factory, so that your client can use Flat5Map instead.
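A minimal sketch of what "allocate the map using a factory" could look like. `PropertyMapFactory`, `DefaultPropertyMapFactory` and `PropertyMaps` are hypothetical names, not existing Mondrian classes, and the default returns `java.util.HashMap` only so the example compiles without Commons Collections; in Mondrian the default would return Commons Collections' `Flat3Map` instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical factory for a member's property-name-to-value map. */
interface PropertyMapFactory {
    Map<String, Object> createPropertyMap();
}

/** Default factory. HashMap stands in for Flat3Map in this sketch. */
class DefaultPropertyMapFactory implements PropertyMapFactory {
    public Map<String, Object> createPropertyMap() {
        return new HashMap<>();
    }
}

/** Hypothetical global hook so a client build can install its own factory. */
final class PropertyMaps {
    private static volatile PropertyMapFactory factory =
        new DefaultPropertyMapFactory();

    private PropertyMaps() {}

    /** Replaces the factory used for all subsequently created members. */
    static void setFactory(PropertyMapFactory f) {
        factory = f;
    }

    /** Called wherever RolapMember would otherwise say "new HashMap()". */
    static Map<String, Object> create() {
        return factory.createPropertyMap();
    }
}
```

Because the interface has a single method, a client's private build could install its own map with a lambda such as `PropertyMaps.setFactory(() -> new Flat5Map())`, assuming its Flat5Map implements `Map<String, Object>`.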

I wrote ArrayMap in olap4j (it's very memory-efficient, but O(n) for
reads and writes) and I would like to try that too. Most Mondrian apps make
very little use of properties beyond displaying them, so a space-efficient
implementation makes a lot of sense.
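To illustrate the trade-off (this is a simplified stand-in, not the code of olap4j's ArrayMap), the hypothetical `SimpleArrayMap` below keeps keys and values alternately in one `Object[]`. Each map costs a single exactly-sized array instead of a hash table plus one entry object per mapping, while `get` and `put` become linear scans.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Set;

/** Illustrative array-backed map: key at slot 2i, value at slot 2i + 1. */
class SimpleArrayMap<K, V> extends AbstractMap<K, V> {
    private Object[] kv = new Object[0];

    /** Linear scan for the key's slot; -1 if absent. */
    private int indexOf(Object key) {
        for (int i = 0; i < kv.length; i += 2) {
            if (Objects.equals(kv[i], key)) {
                return i;
            }
        }
        return -1;
    }

    @Override
    @SuppressWarnings("unchecked")
    public V get(Object key) {
        int i = indexOf(key);
        return i < 0 ? null : (V) kv[i + 1];
    }

    @Override
    public boolean containsKey(Object key) {
        return indexOf(key) >= 0;
    }

    @Override
    @SuppressWarnings("unchecked")
    public V put(K key, V value) {
        int i = indexOf(key);
        if (i >= 0) {
            V old = (V) kv[i + 1];
            kv[i + 1] = value;
            return old;
        }
        // Grow by exactly one pair, so no spare capacity is ever held.
        kv = Arrays.copyOf(kv, kv.length + 2);
        kv[kv.length - 2] = key;
        kv[kv.length - 1] = value;
        return null;
    }

    @Override
    public int size() {
        return kv.length / 2;
    }

    /** Read-only view; removal through the iterator is not supported. */
    @Override
    public Set<Entry<K, V>> entrySet() {
        return new AbstractSet<Entry<K, V>>() {
            @Override
            public int size() {
                return kv.length / 2;
            }

            @Override
            public Iterator<Entry<K, V>> iterator() {
                return new Iterator<Entry<K, V>>() {
                    private int i = 0;

                    @Override
                    public boolean hasNext() {
                        return i < kv.length;
                    }

                    @Override
                    @SuppressWarnings("unchecked")
                    public Entry<K, V> next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        Entry<K, V> e =
                            new SimpleImmutableEntry<>((K) kv[i], (V) kv[i + 1]);
                        i += 2;
                        return e;
                    }
                };
            }
        };
    }
}
```

With only a handful of properties per member, the linear scan touches very few slots, which is why O(n) access is usually an acceptable price for the smaller footprint.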

> 
> 2) Use an Apache Commons LRUMap in SqlMemberSource to allow
> makeMember(...) to reuse property value objects rather than store many
> separate but identical copies. In other words, if there are 10000
> members all of which have the same 3 property value strings, we want
> memory allocated for just the 3 value strings, rather than 3 * 10000
> of them.
> 
> I'm a little wary of the latter optimization, since I'm not sure if
> there are instances where a property value can be something other
> than a String/immutable object. Does anyone know definitively one way
> or the other?

Again, create the map using a factory. (A new method on the same factory
object as above.)
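A minimal sketch of a factory that hands out both kinds of map, plus the interning step makeMember could apply to each value. `MemberMapFactory`, `DefaultMemberMapFactory` and `ValuePooling.intern` are hypothetical names. A plain `HashMap` is used as the pool here, but any `Map` (an LRUMap, or something cleverer) could be returned instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical factory: one object, one method per kind of map. */
interface MemberMapFactory {
    /** Map from property name to value, one per member. */
    Map<String, Object> createPropertyMap();

    /** Pool used to share equal property values between members. */
    Map<Object, Object> createValuePool();
}

class DefaultMemberMapFactory implements MemberMapFactory {
    public Map<String, Object> createPropertyMap() {
        return new HashMap<>();
    }

    public Map<Object, Object> createValuePool() {
        return new HashMap<>();
    }
}

final class ValuePooling {
    private ValuePooling() {}

    /**
     * Returns the pooled instance equal to value, adding value to the pool
     * if no equal instance is present yet. Only safe because values read
     * from JDBC are treated as immutable.
     */
    static Object intern(Map<Object, Object> pool, Object value) {
        if (value == null) {
            return null;
        }
        Object existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }
}
```

With 10000 members sharing the same 3 value strings, every member's property map then refers to the same 3 String objects, and the duplicates read from the result set become garbage immediately.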

You can assume that the values are immutable. They come from SQL columns via
a JDBC driver. Some objects are technically mutable - e.g. some drivers
return BigDecimal and BigInteger values - but are never changed in practice.

BigDecimal and BigInteger objects take up more space than I'd like, so I
would like to create a map that converts them to a more compact form. The
factory would let me do that.
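One possible conversion, sketched under an assumption the message does not state: integral values that fit in a `long` are replaced by a `Long`, which is typically smaller than a `BigDecimal` or `BigInteger` object. `ValueCompactor` is a hypothetical name. This changes the runtime type of the value, so it is only acceptable if consumers treat property values as generic `Object`s (e.g. for display).

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/**
 * Hypothetical compaction step a factory-created map could apply before
 * storing a value. Integral big numbers that fit in a long become a Long;
 * everything else is returned unchanged.
 */
final class ValueCompactor {
    private ValueCompactor() {}

    static Object compact(Object value) {
        if (value instanceof BigInteger) {
            BigInteger bi = (BigInteger) value;
            // bitLength() excludes the sign bit, so <= 63 fits in a long.
            if (bi.bitLength() <= 63) {
                return bi.longValue();
            }
        } else if (value instanceof BigDecimal) {
            BigDecimal bd = (BigDecimal) value;
            // Only scale-0 values, so the displayed text ("42") is unchanged;
            // "42.00" keeps its scale by staying a BigDecimal.
            if (bd.scale() == 0) {
                try {
                    return bd.longValueExact();
                } catch (ArithmeticException e) {
                    // Too large for a long: keep the original object.
                }
            }
        }
        return value;
    }
}
```

Restricting the BigDecimal case to scale 0 is a deliberate choice: converting `12.50` or `42.00` to a `Long` would alter how the value is formatted.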

I would also like to experiment with using a global map, shared between all
SqlMemberSource objects. Of course, it would be a challenge to make it
thread-safe and efficient enough. Creating the map through a factory will
let me do that experimentation.
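A possible starting point for that experiment, assuming a hypothetical `GlobalValuePool` class: `ConcurrentHashMap.putIfAbsent` is atomic, so concurrent SqlMemberSource instances can share one pool without external locking, and two threads interning equal values both end up with the same instance. The sketch never evicts, so the pool grows with the number of distinct values ever seen; a real version would need a bound or weak references.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical process-wide pool shared by every member source. */
final class GlobalValuePool {
    private static final ConcurrentMap<Object, Object> POOL =
        new ConcurrentHashMap<>();

    private GlobalValuePool() {}

    /** Returns the canonical instance equal to value (thread-safe). */
    static Object intern(Object value) {
        if (value == null) {
            return null; // ConcurrentHashMap does not allow null keys
        }
        Object existing = POOL.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    /** Number of distinct values currently pooled. */
    static int size() {
        return POOL.size();
    }
}
```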

Do you mean properties in the broad or the narrow sense? E.g. a member of
the [Store Name] level has attributes key, name, caption, and a few others,
and properties such as [Meat Sqft] and [Has coffee bar]. The attributes are
not technically properties but can benefit from value-pooling. We already do
value-pooling for key values; you should use the same mechanism for values
of other attributes and properties, if possible.

I'm not sure LRUMap is right for your purposes. If you, say, create an
LRUMap of size 100 and there happen to be 200 distinct values, most values
are likely to be evicted before they are seen again, so you'll be doing a
lot of work for almost no return. You should probably stop using a map
altogether if you discover that there are fewer than N uses of each value.
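A minimal sketch of "stop using a map altogether", assuming a hypothetical `AdaptiveValuePool` whose sample size and threshold (`minUsesPerValue`, playing the role of N) are arbitrary illustrative parameters. After `sampleSize` lookups it computes the average number of uses per distinct value and, if that is below the threshold, discards the map and passes values through unchanged from then on. It is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical value pool that switches itself off when reuse is rare. */
final class AdaptiveValuePool {
    private final int sampleSize;
    private final double minUsesPerValue;
    private Map<Object, Object> pool = new HashMap<>();
    private long lookups;

    AdaptiveValuePool(int sampleSize, double minUsesPerValue) {
        this.sampleSize = sampleSize;
        this.minUsesPerValue = minUsesPerValue;
    }

    Object intern(Object value) {
        if (pool == null || value == null) {
            return value; // pooling disabled, or nothing to share
        }
        Object existing = pool.putIfAbsent(value, value);
        lookups++;
        if (lookups == sampleSize) {
            // Average number of uses per distinct value seen so far.
            double usesPerValue = (double) lookups / pool.size();
            if (usesPerValue < minUsesPerValue) {
                pool = null; // pooling isn't paying off; let the map be GC'd
            }
        }
        return existing != null ? existing : value;
    }

    boolean isEnabled() {
        return pool != null;
    }
}
```

Unlike a fixed-size LRUMap, this pays the bookkeeping cost only during the sample; when values are rarely repeated, as in the 100-entry/200-value case, the pool simply gets out of the way.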

Finally, you've raised quite a few issues today, and they are all good ideas.
If you decide NOT to go forward with any of them, please raise a JIRA case
for that issue so we can track it.

Julian
