[Mondrian] A couple of "best practices" questions

Fri Jun 19 16:56:30 EDT 2009

On Jun 19, 2009, at 12:29 PM, Julian Hyde wrote:

>
>
>> Eric wrote:
>>
>> 1) Forcing table indexes
>>
>> For performance reasons, the code I'm porting includes a new
>> attribute
>> on the Table schema element: forcedIndex.  If set, it forces
>> SqlQuery/
>> SqlTupleReader to use the named index when reading from the table.
>>
>> Is this an enhancement that sounds appropriate to add in to the
>> Mondrian mainline, or is there a better way to go about this using
>> existing Mondrian capabilities?  If the former I'm happy to
>> write some
>> tests abd contribute it; if the latter I'd like to make sure we're
>> doing it the right way instead.
>
> I know you're working on MySQL. I don't mind letting this in, as  
> long as it
> does something reasonable on other databases. I'm guessing that you  
> are
> supplying hints by generating formatted comments into the SQL? How  
> about
> allowing the schema designer user to attach hints (arbitrary pieces  
> of text,
> probably looking like "/*+ use_index(foo) */") to their schema  
> definition?
> Is MySQL picky about where those hints are placed in the query?

In MySQL, the precise placement is important -- it's not just  
arbitrary text, it's part of the SQL syntax.  It must appear after the  
table name (and alias, if any), i.e.:

   tbl_name [[AS] alias] [index_hint_list]

For Mondrian, assuming the following schema fragment:

    <Dimension foreignKey="my_dim_id" name="foo">
	<Hierarchy hasAll="true" allMemberName="All foo"  
primaryKey="my_dim_id" >
                 <Table name="my_dim" alias="my_dim"
                        forcedIndex="my_index">
                 </Table>
		...
	</Hierarchy>
    </Dimension>

The resulting SQL will be of the form:

   SELECT ... FROM my_dim FORCE INDEX (my_index) WHERE ...

 From a quick look at Oracle's doc, it looks like the approach there  
is more similar to what you're describing:

   SELECT /*+ index(my_dim, my_index) + */ ... FROM ... WHERE ...

SQL Server seems to use something closer to MySQL's approach:

   SELECT ... FROM my_dim WITH INDEX (my_index) WHERE...

I may not have the syntax exactly right for Oracle/SQLserver, since I  
don't have either sitting in front of me -- I pulled these from the  
respective doc sets.

My sense is that the index hint probably is useful at least on some  
databases, though obviously the above 3 don't qualify as an exhaustive  
list.  Given the way MySQL and SQL Server seem to work, I'm not sure  
tagging on arbitrary text would work.  At the very least, we'd need  
some kind of syntax indicating where those tags need to go, at which  
point we may just be better off declaring an attribute and handling it  
in the dialects.

>
> Anyone else have requirements for generating hints/comments into SQL
> statements? Some people have in the past asked for 'query tagging',
> generating the user session id into the SQL.
>
>> 2) Sparsely populated fact tables and sorting
>>
>> One property of the datasets we're working with is that some of the
>> dimension tables can be large (up to a million rows or so), but the
>> fact tables tend to be much smaller and relatively sparse, meaning
>> that in many cases there are no rows in the fact table that
>> associate
>> with a given dimension.
>>
>> That in and of itself is fine, but in terms of business behavior we
>> know that if there is no associated row in the fact table, it's
>> equivalent to having a row full of zeros.  We'd like to see that
>> knowledge carry through to sorting results, etcetera (i.e., if the
>> value is nonexistent, sort it as if it were zero instead).
>>
>> This is currently implemented by overriding the sort functions to
>> treat null/empty as zero, which frankly makes all involved a little
>> itchy.  Is there a better approach to this problem?
>
> To clarify: you're not changing the values, just changing how they  
> sort?
> This sounds similar to standard SQL's "ORDER BY x ASC NULLS COLLATE  
> LAST",
> so why not extend the syntax of the Order function, e.g.
>
> Order(
>  [Customer].Members,
>  [Measures].[Unit Sales] ASC NULLS COLLATE AS ZERO EMPTY COLLATES  
> LAST)

I can't explain why the original authors did it this way, but as for  
why I didn't think of it?  It was probably just too flippin' obvious. :)

I'll take a look and see if there's any reason that approach won't work.

  -- Eric