[Mondrian] Proposed additions to Mondrian schema syntax

Thu Jun 18 12:29:04 EDT 2009

In investigating some performance issues with Mondrian running against  
MySQL, I've noticed that GROUP BY performance appears to be directly  
related to the number of columns in the GROUP BY (for MySQL, anyway).   
Thus, the smaller the set of columns on which we group, the better the  
performance.

The obvious optimization case, as Julian has previously pointed out,  
is that if any of the levels being queried are based on the primary  
key, the GROUP BY can be eliminated entirely.

There are other cases that can be optimized as well.  However, these  
require information which Mondrian cannot infer absent additional  
schema information.

The first is the case where some or all of a level's properties are  
functionally dependent on the level itself.  In other words, for a  
given value of the level's column, the values associated with the  
properties are known to be invariant.  In such a case, grouping on the  
property column is logically redundant and may safely be eliminated if  
the database permits queries to SELECT columns not in the GROUP BY.   
MySQL allows this by default, though support for it can be turned off  
if desired.

The second case is the one in which all properties are functionally  
dependent on their levels, and there is some set of level columns  
which, taken together, uniquely identify every row in the table (i.e.  
they effectively comprise an alternate unique key). In such cases, we  
can eliminate the GROUP BY entirely if all of the levels in the "key"  
are present.

I'd like to propose two new optional attributes to the schema  
definition to support this:

1. New Property attribute: dependsOnLevelValue

If set, indicates that the property is functionally dependent on the  
level with which it is associated.  This identifies it as a candidate  
for removal from the GROUP BY on databases which allow it.

2. New Hierarchy attribute: uniqueKeyLevelName

If set, indicates that all properties in the hierarchy are  
functionally dependent on their levels, and that the set of levels  
from the root to the named level (inclusive) defines an alternate  
unique key.  This allows the GROUP BY to be eliminated entirely for  
queries at this depth and deeper.  If uniqueKeyLevelName is set,  
there's no need to also set dependsOnLevelValue for the individual  
properties.

Any objections to adding these as optional attributes in the schema  
definition?  We've had this running in production for a while (albeit  
on a Mondrian 2.4 base), and are seeing good results.

Figuring out a unit test for these may take some head-scratching.   
Neither Foodmart nor SteelWheels appear to be good fits.

  -- Eric