[Mondrian] queryBuilder merge

Matt Campbell mcampbell at pentaho.com
Tue Nov 5 09:38:56 EST 2013

Regarding omitted dimension keys, it seems incorrect to require the key for degenerate dimensions.  I also notice the example of degenerate dimensions in Mondrian in Action leaves off the key, so it may be good to leave it optional.

I was considering adding a check in RolapSchemaLoader to verify that in cases where the key is missing that the attribute list is size 1, since that should only happen with a degenerate dim.  We could then automatically assign the key to be that one attribute.  Does that seem reasonable?

On 11/04/2013 12:15 PM, Matt Campbell wrote:
Thanks for the feedback.  I've run into one more issue.  When applying a context constraint (SqlConstraintUtils.addContextConstraint) we construct a cell request based on the non all members in the evaluator, and we use that cell request when getting the list of columns to include in the constraint.  Problem is, if the evaluator contains [Store Sales] in a query that spans both the Sales measure group and Warehouse measure group, the context constraint we construct can bring in the wrong fact table, since we're now using the PhysPath to find the parent table.  So we now have cases where UNION'd tuple queries will include both sales_fact_1997 and inventory_fact_1997 in a single FROM.  This happens with VirtualCubeTest.testNonEmptyCJConstraintOnVirtualCube, for example.

I'm thinking that logic needs to now look at the fact table that's specified in the SqlQueryBuilder and use that when constructing the list of constrained columns.

On 11/01/2013 07:41 PM, Julian Hyde wrote:
On Nov 1, 2013, at 1:22 PM, Matt Campbell <mcampbell at pentaho.com<mailto:mcampbell at pentaho.com>> wrote:

Hi Julian,
I've been working on merging in the queryBuilder branch and could use some guidance on some test failures I'm seeing.

1)  There are two failures involving missing dimension keys.  From the comment in SchemaTest.testKeyAttributeMissing(), it sounds like dimensions with omitted keys are permitted, so long as they don't link to a measure group.  While resolving names during schema load, the code attempts to find a path to all dimensions.  This throws an exception in cases where there is no path.  Do you have any suggestions for handling this case?  It's also not clear to me why we allow the key to be left off.

I don't recall why I allowed a dimension's key attribute to be omitted.

I don't think there would be a problem if we made it required.

2)  Closure tables can cause an infinite loop.  For example, in ParentChildHierarchyTest.testClosureTableInVirtualCube().  Stepping through this logic, it looks like we're trying to find the parent table of salary.  We find the path from employee_closure to salary, but end up looping back around to find the parent table of salary again.

That sounds plausible. In that query are two usages of salary. They are different usages because they join to the fact table on different primary key/foreign key combinations. The logic that finds "tables" should recognize that.

Suggest that you disable the test and log a jira case.

The problem might also occur if there are two uses of a dimension table by regular dimensions via different foreign keys.

I think the other test differences I'm seeing are not real errors.  There is a difference in the members of the product dimension, for example, due to the fact that queries which load members of snowflake dimensions now join the tables together. There are some product_category values that drop out when you join product to product_class (like [Dry Goods]).

At some point I made that "snowflake referential integrity" behavior a preference. I slightly favor the old behavior (no join, which means more efficient queries, even though members may later disappear when you join in higher levels) but I can live with the change.

Also, there are many differences in SQL results due to different ordering of tables in the FROM.

That's expected. Now might be a good time to (a) obsolete reference queries from databases other than MySQL (unless they are testing database-specific features), (b) regenerate reference queries with line-feeds (long lines are difficult to diff).


PS I took the liberty of replying to the public list.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20131105/101d055e/attachment.html 

More information about the Mondrian mailing list