[Mondrian] Non empty optimizer performance regression

Richard Emberson remberson at edgedynamics.com
Fri May 4 09:30:54 EDT 2007


Look at the comment above the CrossJoinFunDef.nonEmptyList method
as a starter.
Assuming one wants to optimize the crossjoin for "NON EMPTY" by
excluding Members that are null, one must look at the corner
cases (which do not showup in Foodmart) prior to deciding to
drop a Member from the crossjoin.
Consider an MDX query that depend upon a calculated measure that returns
a non-null value only if the measures it depends upon are all null -
pathological, but possible. Or (a case that appeared in our local
application code) there is a calculated measure that returns null
if the Customer dimension in a query is at the ALL level and possibly
non-null (depending upon the dataset) when the Customer dimension was
at a non-ALL level - this calculated measure directly breaks the
assumption that if there is no data at the top level for a
given dimension, then there is no data for any lower level in that
dimension - again, pathological but possibly.
In a sense, the current "optimization" code attempts to account for
the corner cases ... making it rather slower.
So, does one want slow and correct, or fast and possibly wrong?

I'd love for someone to figure out how to make it fast and correct.

Using the CSV testcase loading capabilities in Mondrian, one can
create new tables, dimensions and cubes that allow one to create
and test pathological cases not found in Foodmart. See the
test code that use *.csv files.


Richard



Pappyn Bart wrote:
> Hi,
>  
> Since the new non empty optimizer code, a few queries run very slow in 
> my application (for 1 sec (before mondrian 2.3.2) -> 5 minutes).
>  
> I tried to simulate the problem with the foodmart cube, but I am not 
> successful (the foodmart schema is too simple, I guess).
>  
> Even a simple crossjoin with two all-level sets takes a very long time, 
> so the size of dimensions does not matter. I have a very large number of 
> calculated members and many calculated members need calculations based 
> on many other calculated members (deeply nested).  The same query runs 
> very fast (< 1 sec) without NON empty.
>  
> Mondrian seems to get stuck forever in the first 
> CrossJoinFunDef.nonEmptyListNEW(), it only leaves after a very long time. 
>  
>  
> --> Can someone please explain to me or hint me why this function could 
> be slow ?  Is it due to the fact that I have large number of calculated 
> members?  I looked at the code, ran it through the debugger, but I don't 
> understand what the function is exactly doing and why this takes such a 
> long time. 
>  
> If I could understand the pitfalls of this function a bit better, I 
> might be able to pinpoint (and solve) the problem.
>  
>  
> Where is the query that takes a long time, translated to foodmart data 
> (just an example, since it runs fast on the foodmart schema) :
>  
> WITH MEMBER [Measures].[Availability] as 'iif([Measures].[Warehouse 
> Sales] > 0, [Measures].[Warehouse Profit] / [Measures].[Warehouse 
> Sales], null)'
> MEMBER [Measures].[Performance] as 'iif([Measures].[Store Sales] > 0, 
> [Measures].[Profit] / [Measures].[Store Sales], null)'
> MEMBER [Measures].[Quality] as 'iif([Measures].[Units Ordered] > 0, 
> [Measures].[Units Shipped] / [Measures].[Units Ordered], null)'
>  
> MEMBER [Measures].[Overall eff] as 
> 'iif(IsEmpty([Measures].[Availability]) or 
> IsEmpty([Measures].[Performance]) or IsEmpty([Measures].[Quality]), 
> null, cast([Measures].[Availability] as NUMERIC) * 
> cast([Measures].[Performance] as NUMERIC) * cast([Measures].[Quality] as 
> NUMERIC))'
>  
> MEMBER [Measures].[Availability Loss tmp] as 
> 'iif([Measures].[Availability] > 0, 1.0 - [Measures].[Availability], 0)'
> MEMBER [Measures].[Performance Loss tmp] as 
> 'iif([Measures].[Performance] > 0, 1.0 - [Measures].[Performance], 0)'
> MEMBER [Measures].[Quality Loss tmp] as 'iif([Measures].[Quality] > 0, 
> 1.0 - [Measures].[Quality], 0)'
> MEMBER [Measures].[Sum loss tmp] as '[Measures].[Availability loss tmp] 
> + [Measures].[Performance loss tmp] + [Measures].[Quality loss tmp]'
> MEMBER [Measures].[Overall eff loss tmp] as 'iif([Measures].[Overall 
> eff] < 1.0, 1.0 - [Measures].[Overall eff], 0.0)'
>  
> MEMBER [Measures].[Availability loss] as 
> 'iif(IsEmpty([Measures].[Overall eff]), null, iif([Measures].[Sum loss 
> tmp] > 0, [Measures].[Overall eff loss tmp] * ([Measures].[Availability 
> loss tmp] / [Measures].[Sum loss tmp]), 0))'
>  
> SELECT NON EMPTY {[Measures].[Availability loss]} ON COLUMNS,
> NON EMPTY CrossJoin( {[Promotion Media].[All Media]}, {[Product].[All 
> Products]} ) ON ROWS
> FROM [Warehouse and Sales]
>  
> Thanks,
> Bart
>  
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian


-- 
Quis custodiet ipsos custodes:
This email message is for the sole use of the intended recipient(s) and
may contain confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply email and destroy all
copies of the original message.



More information about the Mondrian mailing list