[Mondrian] Eliminating unnecessary "group by" clauses in dimension queries
Eric McDermid
mcdermid at stonecreek.com
Tue Apr 14 01:46:35 EDT 2009
We're encountering a situation in which Mondrian generates an
apparently unnecessary GROUP BY clause when querying a dimension
table, resulting in unnecessarily slow performance on MySQL. We're
actually seeing this on Mondrian 2.4, but per inspection the same
pattern appears to exist in the current codebase as well.
Here's a (slightly simplified) example. Assume the query is defined
as follows:
<Dimension foreignKey="customer_dim_id" name="CIs">
<Hierarchy hasAll="true" allMemberName="All CIs"
primaryKey="customer_dim_id">
<Table name="customer_dim"/>
<Level column="provider_id" levelType="Regular" name="Provider"
type="Numeric" uniqueMembers="true"/>
<Level column="provider_payment_id" levelType="Regular"
name="Payment" type="Numeric" uniqueMembers="true">
<Property name="Status" column="provider_payment_status"/>
<Property name="Payment Description"
column="provider_payment_desc" type="String"/>
</Level>
<Level column="provider_group_id" levelType="Regular" name="Group"
type="Numeric" uniqueMembers="true">
<Property name="Group Name" column="provider_group_name"
type="String"/>
</Level>
<Level column="customer_id" levelType="Regular"
name="Creative" type="Numeric" uniqueMembers="true">
<Property name="Status" column="customer_status" type="String"/>
</Level>
</Hierarchy>
</Dimension>
The SQL eventually generated from an MDX query to select from
customer_dim looks like this:
select
`customer_dim`.`provider_id ` as `c0`,
`customer_dim`.`provider_payment_id ` as `c1`,
`customer_dim`.`provider_payment_status ` as `c2`,
`customer_dim`.`provider_payment_desc ` as `c3`,
`customer_dim`.`provider_group_id ` as `c4`,
`customer_dim`.`provider_group_name ` as `c5`,
`customer_dim`.`customer_id ` as `c6`,
`customer_dim`.`customer_status ` as `c7`
from
`customer_dim` as `customer_dim`
group by
`customer_dim`.`provider_id `,
`customer_dim`.`provider_payment_id `,
`customer_dim`.`provider_payment_status `,
`customer_dim`.`provider_payment_desc `,
`customer_dim`.`provider_group_id `,
`customer_dim`.`provider_group_name `,
`customer_dim`.`customer_id `,
`customer_dim`.`customer_status `
order by
`customer_dim`.`provider_id ` ASC,
`customer_dim`.`provider_payment_id ` ASC,
`customer_dim`.`provider_group_id ` ASC,
`customer_dim`.`customer_id ` ASC
Since each item in the SQL select is a simple column reference, the
GROUP BY appears to be unnecessary. Worse, including it can slow
performance of the query on MySQL by more than an order of magnitude.
The relevant code is in SqlQuery.addSelectGroupBy() which is called in
multiple circumstances by SqlTupleReader.addLevelMemberSql(). I'm
still in the process of understanding some test failures related to my
experiments, as well as working out how this might apply to
RolapLevel.parentExp, but wanted to double-check a couple of the more
basic items:
1. Am I correct in thinking the GROUP BY can and should be eliminated
in this case?
2. When determining whether each element of the SELECT is a simple
column reference, is a check that each level's keyexp and properties
are instances of MondrianDef.Column sufficient? Seems like it should
be, but I don't know if there might be other cases I'm missing.
3. Assuming the answer to the previous question is "yes", am I
correct in thinking that if we don't avoid generating a group by for
the key expressions, we should also avoid generating it for the
ordinal and caption expressions?
-- Eric
More information about the Mondrian
mailing list