[Mondrian] A couple of "best practices" questions

Fri Jun 19 14:29:34 EDT 2009

> Eric wrote:
> 
> 1) Forcing table indexes
> 
> For performance reasons, the code I'm porting includes a new 
> attribute  
> on the Table schema element: forcedIndex.  If set, it forces 
> SqlQuery/ 
> SqlTupleReader to use the named index when reading from the table.
> 
> Is this an enhancement that sounds appropriate to add in to the  
> Mondrian mainline, or is there a better way to go about this using  
> existing Mondrian capabilities?  If the former I'm happy to 
> write some  
> tests abd contribute it; if the latter I'd like to make sure we're  
> doing it the right way instead.

I know you're working on MySQL. I don't mind letting this in, as long as it
does something reasonable on other databases. I'm guessing that you are
supplying hints by generating formatted comments into the SQL? How about
allowing the schema designer user to attach hints (arbitrary pieces of text,
probably looking like "/*+ use_index(foo) */") to their schema definition?
Is MySQL picky about where those hints are placed in the query?

Anyone else have requirements for generating hints/comments into SQL
statements? Some people have in the past asked for 'query tagging',
generating the user session id into the SQL.

> 2) Sparsely populated fact tables and sorting
> 
> One property of the datasets we're working with is that some of the  
> dimension tables can be large (up to a million rows or so), but the  
> fact tables tend to be much smaller and relatively sparse, meaning  
> that in many cases there are no rows in the fact table that 
> associate  
> with a given dimension.
> 
> That in and of itself is fine, but in terms of business behavior we  
> know that if there is no associated row in the fact table, it's  
> equivalent to having a row full of zeros.  We'd like to see that  
> knowledge carry through to sorting results, etcetera (i.e., if the  
> value is nonexistent, sort it as if it were zero instead).
> 
> This is currently implemented by overriding the sort functions to  
> treat null/empty as zero, which frankly makes all involved a little  
> itchy.  Is there a better approach to this problem?

To clarify: you're not changing the values, just changing how they sort?
This sounds similar to standard SQL's "ORDER BY x ASC NULLS COLLATE LAST",
so why not extend the syntax of the Order function, e.g.

Order(
  [Customer].Members,
  [Measures].[Unit Sales] ASC NULLS COLLATE AS ZERO EMPTY COLLATES LAST)

Julian