[Mondrian] Proposed additions to Mondrian schema syntax
Eric McDermid
McDermid at stonecreek.com
Thu Jun 18 12:29:04 EDT 2009
In investigating some performance issues with Mondrian running against
MySQL, I've noticed that GROUP BY performance appears to be directly
related to the number of columns in the GROUP BY (for MySQL, anyway).
Thus, the smaller the set of columns on which we group, the better the
performance.
The obvious optimization case, as Julian has previously pointed out,
is that if any of the levels being queried are based on the primary
key, the GROUP BY can be eliminated entirely.
There are other cases that can be optimized as well. However, these
require information which Mondrian cannot infer absent additional
schema information.
The first is the case where some or all of a level's properties are
functionally dependent on the level itself. In other words, for a
given value of the level's column, the values associated with the
properties are known to be invariant. In such a case, grouping on the
property column is logically redundant and may safely be eliminated if
the database permits queries to SELECT columns not in the GROUP BY.
MySQL allows this by default, though support for it can be turned off
if desired.
The second case is the one in which all properties are functionally
dependent on their levels, and there is some set of level columns
which, taken together, uniquely identify every row in the table (i.e.
they effectively comprise an alternate unique key). In such cases, we
can eliminate the GROUP BY entirely if all of the levels in the "key"
are present.
I'd like to propose two new optional attributes to the schema
definition to support this:
1. New Property attribute: dependsOnLevelValue
If set, indicates that the property is functionally dependent on the
level with which it is associated. This identifies it as a candidate
for removal from the GROUP BY on databases which allow it.
2. New Hierarchy attribute: uniqueKeyLevelName
If set, indicates that all properties in the hierarchy are
functionally dependent on their levels, and that the set of levels
from the root to the named level (inclusive) defines an alternate
unique key. This allows the GROUP BY to be eliminated entirely for
queries at this depth and deeper. If uniqueKeyLevelName is set,
there's no need to also set dependsOnLevelValue for the individual
properties.
Any objections to adding these as optional attributes in the schema
definition? We've had this running in production for a while (albeit
on a Mondrian 2.4 base), and are seeing good results.
Figuring out a unit test for these may take some head-scratching.
Neither Foodmart nor SteelWheels appear to be good fits.
-- Eric
More information about the Mondrian
mailing list