[Mondrian] RE: Greenplum Dialect

Julian Hyde jhyde at pentaho.com
Wed Dec 23 17:55:27 EST 2009


> Attached are the following files necessary to implement the Greenplum
> Dialect, which I have implemented by following the Infobright pattern
> since Greenplum uses the same JDBC driver as Postgres.

Thanks. I have checked in your changes as 13270. I added a line to
META-INF/services/mondrian.spi.Dialect so the dialect gets loaded
automatically.

> I have made the following tests:
> 1) Extracted a clean copy of Mondrian from Perforce server (trying to
> update an existing version caused the MySQL errors I mentioned
> previously)
> 2) I ran the MySQL tests cleanly to get a benchmark
> 3) I ran the same tests against postgres, where 3 failed

I haven't run the tests against postgres in a while, so I can believe that
there are 3 errors.

> 4) I made the changes
> 5) I successfully ran the MySQL tests again
> 6) I successfully ran the postgres tests again, where 3 failed again
> 7) I ran the tests against greenplum, test success rate is not as good
> (17 errors)...some were because greenplum does not support correlated
> sub-expressions (uncorrelated are fine).

17 errors doesn't sound TOO bad out of ~2200 tests. Does DialectTest pass?
That's the main thing. And does MondrianFoodMartLoader work OK?

As the final part of making greenplum an 'official' dialect, can you also
give me a few lines to add to mondrian.properties? Every database has some
sample property settings; for instance, this is the section for postgres:

# Postgres: needs user and password
#mondrian.foodmart.jdbcURL=jdbc:postgresql://localhost/FM3
#mondrian.foodmart.jdbcUser=postgres
#mondrian.foodmart.jdbcPassword=pgAdmin
#mondrian.jdbcDrivers=org.postgresql.Driver

> The changes to AbstractSQL.distinctGenerateSql are Greenplum specific
> and lead to the X10 performance improvement on count distinct, here
> are two examples:
> 
> select count("m0") as "c0"
> from (select distinct "facttable"."username" as "m0"
>       from "cube"."facttable" as "facttable") as "dummyname"
> --32214794
> --Total query runtime: 2199057 ms.
> 
> which is a bit different from optimal query
> 
> select count("m0") as "c0"
> from (select "facttable"."username" as "m0"
>       from "cube"."facttable" as "facttable"
>       group by username) as "dummyname"
> --32214794
> --Total query runtime: 242937 ms.
> 
> The changes made to AbstractSQL.distinctGenerateSql are conditional on
> the Dialect being Greenplum

I'm a bit surprised how much better greenplum does with 'select x ... group
by x' than with 'select distinct x ...'. It's well known that these queries
are equivalent -- most databases I know expand 'select distinct' to 'group
by' at an early stage. Can you please report the issue to greenplum? If you
can get a bug number from greenplum we can make a note in the code, and take
out your workaround if/when greenplum fixes the issue.
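
To make the equivalence concrete, here is a minimal Python sketch. It is not
the Mondrian code: the function names are made up for illustration, and it
runs against an in-memory SQLite database rather than greenplum, so it only
checks that the two query shapes return the same count and says nothing
about performance.

```python
import sqlite3


def count_distinct_sql(table, column, use_group_by):
    """Build a count-distinct query in one of the two shapes quoted above.

    use_group_by=False gives the inner 'select distinct' form;
    use_group_by=True gives the inner 'group by' form (the workaround).
    Hypothetical helper, not part of Mondrian.
    """
    if use_group_by:
        inner = f'select "{column}" as "m0" from "{table}" group by "{column}"'
    else:
        inner = f'select distinct "{column}" as "m0" from "{table}"'
    return f'select count("m0") as "c0" from ({inner}) as "dummyname"'


def run_both(usernames):
    """Load the usernames into a scratch table and run both query shapes.

    Returns (count via 'select distinct', count via 'group by').
    """
    conn = sqlite3.connect(":memory:")
    conn.execute('create table "facttable" ("username" text)')
    conn.executemany(
        'insert into "facttable" values (?)', [(u,) for u in usernames]
    )
    results = tuple(
        conn.execute(
            count_distinct_sql("facttable", "username", use_group_by)
        ).fetchone()[0]
        for use_group_by in (False, True)
    )
    conn.close()
    return results


if __name__ == "__main__":
    distinct_count, group_by_count = run_both(["ann", "bob", "ann", "cy"])
    print(distinct_count, group_by_count)
```

Both shapes skip NULL usernames in the outer count, so they agree on
nullable columns as well.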

> Hopefully, I'll get a chance tomorrow morning to forward some Excel  
> 2007 test cases

That would be a nice Xmas present!

Julian



