[Mondrian] Proposed additions to Mondrian schema syntax
mcdermid at stonecreek.com
Thu Jun 18 17:55:59 EDT 2009
D'oh! Just realized that I pasted part of this proposal from an older
spec, and doesn't match the one I actually implemented.
The specific difference is that uniqueKeyLevelName does *not* imply
that all properties are dependent on their levels; and so remains
independent from the dependsOnLevelValue attribute of Property. The
group by is only eliminated unless uniqueKeyLevelName exists, the
hierarchy is queried at that depth, and all the properties in the
query have dependsOnLevelValue set to true.
My reasoning for keeping them separate is to avoid accidental
introduction of bugs into the schema definition. If
uniqueKeyLevelName implies that properties are level-dependent, then
adding a nonconforming property causes the query to return bad
results, creating a bug that may not be immediately obvious. If the
attributes are entirely separate, then the new property merely
prevents the GROUP BY optimization -- the query still produces the
correct results, just more slowly.
That's assuming, of course, that the new property isn't added with
"dependsOnLevelValue=true" itself, but that's not something code can
On Jun 18, 2009, at 10:29 AM, Eric McDermid wrote:
> In investigating some performance issues with Mondrian running
> against MySQL, I've noticed that GROUP BY performance appears to be
> directly related to the number of columns in the GROUP BY (for
> MySQL, anyway). Thus, the smaller the set of columns on which we
> group, the better the performance.
> The obvious optimization case, as Julian has previously pointed out,
> is that if any of the levels being queried are based on the primary
> key, the GROUP BY can be eliminated entirely.
> There are other cases that can be optimized as well. However, these
> require information which Mondrian cannot infer absent additional
> schema information.
> The first is the case where some or all of a level's properties are
> functionally dependent on the level itself. In other words, for a
> given value of the level's column, the values associated with the
> properties are known to be invariant. In such a case, grouping on
> the property column is logically redundant and may safely be
> eliminated if the database permits queries to SELECT columns not in
> the GROUP BY. MySQL allows this by default, though support for it
> can be turned off if desired.
> The second case is the one in which all properties are functionally
> dependent on their levels, and there is some set of level columns
> which, taken together, uniquely identify every row in the table
> (i.e. they effectively comprise an alternate unique key). In such
> cases, we can eliminate the GROUP BY entirely if all of the levels
> in the "key" are present.
> I'd like to propose two new optional attributes to the schema
> definition to support this:
> 1. New Property attribute: dependsOnLevelValue
> If set, indicates that the property is functionally dependent on the
> level with which it is associated. This identifies it as a
> candidate for removal from the GROUP BY on databases which allow it.
> 2. New Hierarchy attribute: uniqueKeyLevelName
> If set, indicates that all properties in the hierarchy are
> functionally dependent on their levels, and that the set of levels
> from the root to the named level (inclusive) defines an alternate
> unique key. This allows the GROUP BY to be eliminated entirely for
> queries at this depth and deeper. If uniqueKeyLevelName is set,
> there's no need to also set dependsOnLevelValue for the individual
> Any objections to adding these as optional attributes in the schema
> definition? We've had this running in production for a while
> (albeit on a Mondrian 2.4 base), and are seeing good results.
> Figuring out a unit test for these may take some head-scratching.
> Neither Foodmart nor SteelWheels appear to be good fits.
> -- Eric
> Mondrian mailing list
> Mondrian at pentaho.org
More information about the Mondrian