[Mondrian] Performance: CellInfoPool

Julian Hyde julianhyde at speakeasy.net
Thu Aug 16 14:19:57 EDT 2007


One of the problems of hashing is choice of hashcode, and boy, we were
choosing a really bad one. To encode a two-dimensional coordinates (x, y),
CellInfoPool was doing (x + y * (2^32-1)). Then to convert that 64 bit long
to a 32 bit int hashcode, it was doing (k ^ (k >>> 32)).

You'll notice that if x and y are n bits long, the hashcode will also be n
bits long. In this case, n=9: 25,000 keys were all generating hashcodes
between 0 and ~512. Ugh.

Here are the numbers on my machine:

 * CellInfoMap: 22 ms
 * CellInfoPool: 7,041 ms (167,270,680 hash collisions)

I changed the formula to convert a long value into an int hashcode to (k ^
(k >>> 20) ^ (k >>> 40)). The upper 32 bits are now smeared out more evenly,
and the number of hash collisions decreased drastically:

 * CellInfoPool: 146 ms (275,417 hash collisions)

I tried to improve the algorithm further by setting the initial size of the
hash table. It didn't help. The number of hash collisions actually increased
a little to 431,741.

Fixed in change 9771, will be in 2.4.1.

Julian

> -----Original Message-----
> From: mondrian-bounces at pentaho.org 
> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Anton Nikitin
> Sent: Thursday, August 16, 2007 7:13 AM
> To: 'Mondrian developer mailing list'
> Subject: RE: [Mondrian] Performance: CellInfoPool
> 
> I just run very simple code snippet:
> 
> 
> ==== CODE START ===
>     public static void main(String[] args) {
>         try {
>             CellInfoPool pool = new CellInfoPool(2);
>             CellInfoMap map = new CellInfoMap(
>                     CellKey.Generator.newCellKey(2));
> 
>             // Testing CellInfoMap
>             long millis = System.currentTimeMillis();
>             for (int i = 0; i < 500; i++) {
>                 for (int j = 0; j < 500; j++) {
>                     map.create(new int[]{i, j});
>                 }
>             }
>             System.out.println("CellInfoMap: " +
>                     (System.currentTimeMillis() - millis) + " ms");
> 
>             // Testing CellInfoPool
>             millis = System.currentTimeMillis();
>             for (int i = 0; i < 500; i++) {
>                 for (int j = 0; j < 500; j++) {
>                     pool.create(new int[]{i, j});
>                 }
>             }
>             System.out.println("CellInfoPool: " +
>                     (System.currentTimeMillis() - millis) + " ms");
> 
>         } catch (Throwable x) {
>             x.printStackTrace();
>         }
>     }
> ==== CODE END ===
> 
> My result:
> 
> CellInfoMap: 63 ms
> CellInfoPool: 22202 ms
> 
> I also ran some examples before but couldn't get such results.
> IMHO the performance really depends on keys...
> 
> Anton
> 
> -----Original Message-----
> From: mondrian-bounces at pentaho.org 
> [mailto:mondrian-bounces at pentaho.org] On
> Behalf Of Julian Hyde
> Sent: Tuesday, August 14, 2007 7:40 AM
> To: 'Mondrian developer mailing list'
> Subject: RE: [Mondrian] Performance: CellInfoPool
> 
>  
> > Peter Tran wrote:
> >
> > ObjectPool performs a lot better than HashSet for the 
> String test and
> > comparable for the Integer test.
> > 
> > Am I reading this wrong?
> 
> You are correct, Peter. My tests show there is no performance bug with
> ObjectPool.
> 
> ObjectPool is also superior for this purpose because it uses 
> significantly
> less memory.
> 
> There still might be a performance bug with CellInfoPool - I 
> didn't test
> that.
> 
> Julian
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
> 




More information about the Mondrian mailing list