[Mondrian] Proposed additions to Mondrian schema syntax

Eric McDermid mcdermid at stonecreek.com
Fri Jun 19 13:48:37 EDT 2009


Will do.

Any suggestions on how to create unit tests for these?  I'm wondering  
if I need to go so far as to add a new data set comparable to FoodMart  
and SteelWheels, since neither of those fit the constraints.

  -- Eric

On Jun 19, 2009, at 11:39 AM, Julian Hyde wrote:

> This is a good direction for mondrian to be going in. More knowledge  
> of the
> structure + constraints of the relational schema allows use to  
> generate more
> efficient SQL.
>
> Mondrian is not responsible for ensuring that these constraints are  
> valid.
> That is the responsibility of whoever is loading the database. Can  
> you add
> that to the doc in schema.html. (We might consider adding design- 
> time tools
> to validate the database, similar to some checks that the agg table  
> designer
> does now, but there will definitely not be runtime checks.)
>
> We might also add tools for deducing those constraints from the schema
> (database constraints, such as primary and foreign keys, and also  
> the data
> patterns that exist in the schema.
>
> I am reworking the mondrian schema now, and expressing functional
> dependencies was one of the goals of the new schema format. The schema
> changes have taken several months, and will take one or two more, so  
> you
> shouldn't wait for my changes. Go ahead and check in, and at very  
> least, we
> will be able to recycle your use cases & test cases.
>
> One proviso. I intend to maintain backwards compatibility for schema  
> files
> in the current format, but these constructs will be an exception to  
> that. If
> anyone wants to use functional dependencies, they will have to  
> convert their
> schema to the new format.
>
> Obviously, you must document these schema extensions in schema.html.  
> Please
> mention that these extensions will probably not be supported in
> mondrian-4.0, but we will provide another means for declaring  
> functional
> dependencies. Provide use cases in the doc, and I will make sure  
> that these
> use cases are solved in the new schema format.
>
> Julian
>
>> -----Original Message-----
>> From: mondrian-bounces at pentaho.org
>> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Eric McDermid
>> Sent: Thursday, June 18, 2009 9:29 AM
>> To: Mondrian developer mailing list
>> Subject: [Mondrian] Proposed additions to Mondrian schema syntax
>>
>> In investigating some performance issues with Mondrian
>> running against
>> MySQL, I've noticed that GROUP BY performance appears to be directly
>> related to the number of columns in the GROUP BY (for MySQL,
>> anyway).
>> Thus, the smaller the set of columns on which we group, the
>> better the
>> performance.
>>
>> The obvious optimization case, as Julian has previously pointed out,
>> is that if any of the levels being queried are based on the primary
>> key, the GROUP BY can be eliminated entirely.
>>
>> There are other cases that can be optimized as well.  However, these
>> require information which Mondrian cannot infer absent additional
>> schema information.
>>
>> The first is the case where some or all of a level's properties are
>> functionally dependent on the level itself.  In other words, for a
>> given value of the level's column, the values associated with the
>> properties are known to be invariant.  In such a case,
>> grouping on the
>> property column is logically redundant and may safely be
>> eliminated if
>> the database permits queries to SELECT columns not in the GROUP BY.
>> MySQL allows this by default, though support for it can be
>> turned off
>> if desired.
>>
>> The second case is the one in which all properties are functionally
>> dependent on their levels, and there is some set of level columns
>> which, taken together, uniquely identify every row in the
>> table (i.e.
>> they effectively comprise an alternate unique key). In such
>> cases, we
>> can eliminate the GROUP BY entirely if all of the levels in
>> the "key"
>> are present.
>>
>> I'd like to propose two new optional attributes to the schema
>> definition to support this:
>>
>> 1. New Property attribute: dependsOnLevelValue
>>
>> If set, indicates that the property is functionally dependent on the
>> level with which it is associated.  This identifies it as a
>> candidate
>> for removal from the GROUP BY on databases which allow it.
>>
>> 2. New Hierarchy attribute: uniqueKeyLevelName
>>
>> If set, indicates that all properties in the hierarchy are
>> functionally dependent on their levels, and that the set of levels
>> from the root to the named level (inclusive) defines an alternate
>> unique key.  This allows the GROUP BY to be eliminated entirely for
>> queries at this depth and deeper.  If uniqueKeyLevelName is set,
>> there's no need to also set dependsOnLevelValue for the individual
>> properties.
>>
>> Any objections to adding these as optional attributes in the schema
>> definition?  We've had this running in production for a while
>> (albeit
>> on a Mondrian 2.4 base), and are seeing good results.
>>
>> Figuring out a unit test for these may take some head-scratching.
>> Neither Foodmart nor SteelWheels appear to be good fits.
>>
>>  -- Eric
>>
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>
>>
>>
>
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>





More information about the Mondrian mailing list