[Mondrian] SQLs generated by Segment.load do not include measurefilters

Julian Hyde jhyde at pentaho.com
Mon Feb 9 22:12:30 EST 2009


It's a reasonable thing to expect mondrian to do, and in principle mondrian
could do it. But I'm guessing that implementing the rules in mondrian might
be hard.

It is very closely related to the problems I described in a blog post last

A framework for modeling MDX queries internally in something similar to
relational algebra is needed. Within the framework, mondrian would apply
rules to transform query trees into equivalent trees, and use costing to
make decisions.

Without the framework we would likely hit three major problems:

1. Your optimization would need to be implemented in several places
throughout the code. It would be difficult to test, and would make the code
more complicated, so it is more difficult to add future optimizations.

2. One would expect the optimization to be applied in several similar
situations, but if we coded this optimization using 'if .. then' duck tape
it would probably only apply to situations very similar to our use cases.

3. The optimization may not always be the best thing to do. There may be
some queries where evaluation in MDX is the best cause, or where there is an
alternative SQL formulation. A costing model is needed in order to choose
between alternatives.

Putting in that framework is a long-term project of mine, that realistically
may take a year or two, if we preserve existing use cases and translate code
in native evaluation into the framework. 

In the mean time, the best thing we can do is to collect use cases. Can you
please log this as an RFE.

And I may be wrong: this may be straightforward to implement. It's
definitely worth giving it a try. I'd love to be proved wrong.


> -----Original Message-----
> From: mondrian-bounces at pentaho.org 
> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Robin Tharappel
> Sent: Monday, February 09, 2009 5:52 PM
> To: Mondrian developer mailing list
> Subject: [Mondrian] SQLs generated by Segment.load do not 
> include measurefilters
> Hello,
> I am coming across a performance issue with an MDX that has a measure
> filter. In my MDX the filter is evaluated natively and the
> SqlTupleReader generates a SQL that returns 10,000 rows. However the
> query for loading the Segment returns over 1 million rows. The main
> reason for the difference between the number of rows returned between
> SqlTupleReader and Segment is that the query generated for
> Segment.load does not have the measure filter. The following MDX using
> the FoodMart data set can illustrate the problem
> select {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
> Filter(NonEmptyCrossJoin([Product].[All Products].Children,
> [Store].[All Stores].[USA].Children),
> [Measures].[Unit Sales] > 20000) ON ROWS
> from [Sales]
> Below is the SQL generated by SqlTupleReader with the having clause
> for the measure filter (returns 4 rows)
> select
> "product_class"."product_family" as "c0",
> "store"."store_country" as "c1",
> "store"."store_state" as "c2"
> from
> "product" "product",
> "product_class" "product_class",
> "agg_c_14_sales_fact_1997" "agg_c_14_sales_fact_1997",
> "store" "store"
> where
> "product"."product_class_id" = "product_class"."product_class_id"
> and "agg_c_14_sales_fact_1997"."product_id" = "product"."product_id"
> and "agg_c_14_sales_fact_1997"."store_id" = "store"."store_id"
> and "agg_c_14_sales_fact_1997"."the_year" = 1997
> and ("store"."store_country" = 'USA')
> group by 
> "product_class"."product_family","store"."store_country","stor
> e"."store_state"
> having (sum("agg_c_14_sales_fact_1997"."unit_sales") > 20000.0)
> order by "product_class"."product_family" ASC,
> "store"."store_country" ASC,
> "store"."store_state" ASC
> Below is the SQL generated by Segment without the measure filter
> (returns 9 rows)
> select
> "store"."store_state" as "c0",
> "agg_c_14_sales_fact_1997"."the_year" as "c1",
> "product_class"."product_family" as "c2",
> sum("agg_c_14_sales_fact_1997"."unit_sales") as "m0",
> sum("agg_c_14_sales_fact_1997"."store_sales") as "m1"
> from
> "store" "store",
> "agg_c_14_sales_fact_1997" "agg_c_14_sales_fact_1997",
> "product_class" "product_class",
> "product" "product"
> where
> "agg_c_14_sales_fact_1997"."store_id" = "store"."store_id" and
> "store"."store_state" in ('CA', 'OR', 'WA') and
> "agg_c_14_sales_fact_1997"."the_year" = 1997 and
> "agg_c_14_sales_fact_1997"."product_id" = "product"."product_id"
>  and
> "product"."product_class_id" = "product_class"."product_class_id"
> group by
> "store"."store_state",
> "agg_c_14_sales_fact_1997"."the_year",
> "product_class"."product_family"
> Although the difference in this example is small, with a larger
> dataset a situation could occur where there is a lot of data loaded
> into the Segment which is not needed to resolve the query.  Would it
> be possible to have the measure filter applied to the Segment query
> (add the having clause) ?
> Thanks,
> Robin

More information about the Mondrian mailing list