[Mondrian] RE: Greenplum Dialect
Julian Hyde
jhyde at pentaho.com
Wed Dec 23 17:55:27 EST 2009
> Attached are the following files necessary to implement the Greenplum
> Dialect, which I have implemented by following the Infobright pattern
> since Greenplum uses the same JDBC driver as Postgres.
Thanks. I have checked in your changes as 13270. I added a line to
META-INF/services/mondrian.spi.Dialect so the dialect gets loaded
automatically.
> I have made the following tests:
> 1) Extracted a clean copy of Mondrian from Perforce server (trying to
> update an existing version caused the MySQL errors I mentioned
> previously)
> 2) I ran the MySQL tests cleanly to get a benchmark
> 3) I ran the same tests against postgres, where 3 failed
Haven't run tests against postgres in a while. I can believe that there are
3 errors.
> 4) I made the changes
> 5) I successfully ran the MySQL tests again
> 6) I successfully ran the postgres tests again, where 3 failed again
> 7) I ran the tests against greenplum, test success rate is not as good
> (17 errors)...some were because greenplum does not support correlated
> sub-expressions (uncorrelated are fine).
17 errors doesn't sound TOO bad out of ~2200 tests. Does DialectTest pass?
That's the main thing. And does MondrianFoodMartLoader work OK?
As the final part of making greenplum an 'official' dialect, can you also
give me a few lines to add to mondrian.properties? Every database has some
sample property settings; for instance, this is the section for postgres:
# Postgres: needs user and password
#mondrian.foodmart.jdbcURL=jdbc:postgresql://localhost/FM3
#mondrian.foodmart.jdbcUser=postgres
#mondrian.foodmart.jdbcPassword=pgAdmin
#mondrian.jdbcDrivers=org.postgresql.Driver
> The changes to AbstractSQL.distinctGenerateSql are Greenplum specific
> and lead to the 10x performance improvement on count distinct, here
> are two examples:
>
> select count("m0") as "c0"
> from (select distinct "facttable"."username" as "m0"
>       from "cube"."facttable" as "facttable") as "dummyname"
> --32214794
> --Total query runtime: 2199057 ms.
>
> which is a bit different from the optimal query
>
> select count("m0") as "c0"
> from (select "facttable"."username" as "m0"
>       from "cube"."facttable" as "facttable"
>       group by username) as "dummyname"
> --32214794
> --Total query runtime: 242937 ms.
>
> The changes made to AbstractSQL.distinctGenerateSql are conditional on
> the Dialect being Greenplum
I'm a bit surprised how much better greenplum does with 'select x ... group
by x' than with 'select distinct x ...'. It's well known that these queries
are equivalent -- most databases I know expand 'select distinct' to 'group
by' at an early stage. Can you please report the issue to greenplum? If you
can get a bug number from greenplum we can make a note in the code, and take
out your workaround if/when greenplum fix the issue.
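
To illustrate the equivalence, here is a minimal sketch in Python. It runs
against an in-memory SQLite database rather than greenplum, so it shows only
that the two query shapes return the same count and says nothing about
performance. The function name count_distinct_sql, the use_group_by flag
(standing in for a check on the dialect) and the sample rows are all
hypothetical; this is not Mondrian's actual code.

```python
import sqlite3


def count_distinct_sql(table, column, use_group_by):
    """Build a count-distinct query in one of the two shapes quoted above.

    use_group_by is a hypothetical flag standing in for a dialect check.
    """
    if use_group_by:
        # Workaround shape: deduplicate with GROUP BY.
        inner = f'select "{column}" as "m0" from "{table}" group by "{column}"'
    else:
        # Default shape: deduplicate with SELECT DISTINCT.
        inner = f'select distinct "{column}" as "m0" from "{table}"'
    return f'select count("m0") as "c0" from ({inner}) as "dummyname"'


# Tiny stand-in for the fact table; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute('create table "facttable" ("username" text)')
conn.executemany(
    'insert into "facttable" values (?)',
    [("ann",), ("bob",), ("ann",), ("cy",), ("bob",)],
)

distinct_count = conn.execute(
    count_distinct_sql("facttable", "username", use_group_by=False)
).fetchone()[0]
group_by_count = conn.execute(
    count_distinct_sql("facttable", "username", use_group_by=True)
).fetchone()[0]

print(distinct_count, group_by_count)
```

Both queries count the same three distinct usernames, so choosing between
the two shapes is purely a performance decision.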
> Hopefully, I'll get a chance tomorrow morning to forward some Excel 2007
> test cases
That would be a nice Xmas present!
Julian