[Mondrian] Performance: CellInfoPool
Julian Hyde
julianhyde at speakeasy.net
Thu Aug 16 14:19:57 EDT 2007
One of the problems of hashing is choice of hashcode, and boy, we were
choosing a really bad one. To encode a two-dimensional coordinates (x, y),
CellInfoPool was doing (x + y * (2^32-1)). Then to convert that 64 bit long
to a 32 bit int hashcode, it was doing (k ^ (k >>> 32)).
You'll notice that if x and y are n bits long, the hashcode will also be n
bits long. In this case, n=9: 25,000 keys were all generating hashcodes
between 0 and ~512. Ugh.
Here are the numbers on my machine:
* CellInfoMap: 22 ms
* CellInfoPool: 7,041 ms (167,270,680 hash collisions)
I changed the formula to convert a long value into an int hashcode to (k ^
(k >>> 20) ^ (k >>> 40)). The upper 32 bits are now smeared out more evenly,
and the number of hash collisions decreased drastically:
* CellInfoPool: 146 ms (275,417 hash collisions)
I tried to improve the algorithm further by setting the initial size of the
hash table. It didn't help. The number of hash collisions actually increased
a little to 431,741.
Fixed in change 9771, will be in 2.4.1.
Julian
> -----Original Message-----
> From: mondrian-bounces at pentaho.org
> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Anton Nikitin
> Sent: Thursday, August 16, 2007 7:13 AM
> To: 'Mondrian developer mailing list'
> Subject: RE: [Mondrian] Performance: CellInfoPool
>
> I just run very simple code snippet:
>
>
> ==== CODE START ===
> public static void main(String[] args) {
> try {
> CellInfoPool pool = new CellInfoPool(2);
> CellInfoMap map = new CellInfoMap(
> CellKey.Generator.newCellKey(2));
>
> // Testing CellInfoMap
> long millis = System.currentTimeMillis();
> for (int i = 0; i < 500; i++) {
> for (int j = 0; j < 500; j++) {
> map.create(new int[]{i, j});
> }
> }
> System.out.println("CellInfoMap: " +
> (System.currentTimeMillis() - millis) + " ms");
>
> // Testing CellInfoPool
> millis = System.currentTimeMillis();
> for (int i = 0; i < 500; i++) {
> for (int j = 0; j < 500; j++) {
> pool.create(new int[]{i, j});
> }
> }
> System.out.println("CellInfoPool: " +
> (System.currentTimeMillis() - millis) + " ms");
>
> } catch (Throwable x) {
> x.printStackTrace();
> }
> }
> ==== CODE END ===
>
> My result:
>
> CellInfoMap: 63 ms
> CellInfoPool: 22202 ms
>
> I also ran some examples before but couldn't get such results.
> IMHO the performance really depends on keys...
>
> Anton
>
> -----Original Message-----
> From: mondrian-bounces at pentaho.org
> [mailto:mondrian-bounces at pentaho.org] On
> Behalf Of Julian Hyde
> Sent: Tuesday, August 14, 2007 7:40 AM
> To: 'Mondrian developer mailing list'
> Subject: RE: [Mondrian] Performance: CellInfoPool
>
>
> > Peter Tran wrote:
> >
> > ObjectPool performs a lot better than HashSet for the
> String test and
> > comparable for the Integer test.
> >
> > Am I reading this wrong?
>
> You are correct, Peter. My tests show there is no performance bug with
> ObjectPool.
>
> ObjectPool is also superior for this purpose because it uses
> significantly
> less memory.
>
> There still might be a performance bug with CellInfoPool - I
> didn't test
> that.
>
> Julian
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
More information about the Mondrian
mailing list