[Mondrian] Proposed additions to Mondrian schema syntax

Julian Hyde jhyde at pentaho.com
Fri Jun 19 13:39:41 EDT 2009


This is a good direction for mondrian to be going in. More knowledge of the
structure + constraints of the relational schema allows use to generate more
efficient SQL.

Mondrian is not responsible for ensuring that these constraints are valid.
That is the responsibility of whoever is loading the database. Can you add
that to the doc in schema.html. (We might consider adding design-time tools
to validate the database, similar to some checks that the agg table designer
does now, but there will definitely not be runtime checks.)

We might also add tools for deducing those constraints from the schema
(database constraints, such as primary and foreign keys, and also the data
patterns that exist in the schema.

I am reworking the mondrian schema now, and expressing functional
dependencies was one of the goals of the new schema format. The schema
changes have taken several months, and will take one or two more, so you
shouldn't wait for my changes. Go ahead and check in, and at very least, we
will be able to recycle your use cases & test cases.

One proviso. I intend to maintain backwards compatibility for schema files
in the current format, but these constructs will be an exception to that. If
anyone wants to use functional dependencies, they will have to convert their
schema to the new format.

Obviously, you must document these schema extensions in schema.html. Please
mention that these extensions will probably not be supported in
mondrian-4.0, but we will provide another means for declaring functional
dependencies. Provide use cases in the doc, and I will make sure that these
use cases are solved in the new schema format.

Julian

> -----Original Message-----
> From: mondrian-bounces at pentaho.org 
> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Eric McDermid
> Sent: Thursday, June 18, 2009 9:29 AM
> To: Mondrian developer mailing list
> Subject: [Mondrian] Proposed additions to Mondrian schema syntax
> 
> In investigating some performance issues with Mondrian 
> running against  
> MySQL, I've noticed that GROUP BY performance appears to be directly  
> related to the number of columns in the GROUP BY (for MySQL, 
> anyway).   
> Thus, the smaller the set of columns on which we group, the 
> better the  
> performance.
> 
> The obvious optimization case, as Julian has previously pointed out,  
> is that if any of the levels being queried are based on the primary  
> key, the GROUP BY can be eliminated entirely.
> 
> There are other cases that can be optimized as well.  However, these  
> require information which Mondrian cannot infer absent additional  
> schema information.
> 
> The first is the case where some or all of a level's properties are  
> functionally dependent on the level itself.  In other words, for a  
> given value of the level's column, the values associated with the  
> properties are known to be invariant.  In such a case, 
> grouping on the  
> property column is logically redundant and may safely be 
> eliminated if  
> the database permits queries to SELECT columns not in the GROUP BY.   
> MySQL allows this by default, though support for it can be 
> turned off  
> if desired.
> 
> The second case is the one in which all properties are functionally  
> dependent on their levels, and there is some set of level columns  
> which, taken together, uniquely identify every row in the 
> table (i.e.  
> they effectively comprise an alternate unique key). In such 
> cases, we  
> can eliminate the GROUP BY entirely if all of the levels in 
> the "key"  
> are present.
> 
> I'd like to propose two new optional attributes to the schema  
> definition to support this:
> 
> 1. New Property attribute: dependsOnLevelValue
> 
> If set, indicates that the property is functionally dependent on the  
> level with which it is associated.  This identifies it as a 
> candidate  
> for removal from the GROUP BY on databases which allow it.
> 
> 2. New Hierarchy attribute: uniqueKeyLevelName
> 
> If set, indicates that all properties in the hierarchy are  
> functionally dependent on their levels, and that the set of levels  
> from the root to the named level (inclusive) defines an alternate  
> unique key.  This allows the GROUP BY to be eliminated entirely for  
> queries at this depth and deeper.  If uniqueKeyLevelName is set,  
> there's no need to also set dependsOnLevelValue for the individual  
> properties.
> 
> Any objections to adding these as optional attributes in the schema  
> definition?  We've had this running in production for a while 
> (albeit  
> on a Mondrian 2.4 base), and are seeing good results.
> 
> Figuring out a unit test for these may take some head-scratching.   
> Neither Foodmart nor SteelWheels appear to be good fits.
> 
>   -- Eric
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
> 
> 
> 





More information about the Mondrian mailing list