[Mondrian] Aggregate UDFs

Matt Campbell mcampbell at pentaho.com
Thu Apr 10 17:16:23 EDT 2014


If the set sizes resulting from the Filter() fell below maxConstraints, Mondrian would support this by firing separate SQL queries for each bucket.  Even if the set size is greater than maxConstraints, though, Mondrian is still able to compute the distinct count value in some cases--it will try to optimize the set, replacing all children of a parent with the parent.

Is [Patients Rx] a count of [Person ID]s with Drug coverage?  Could you construct a calculation that does a Count() of the members of that filtered set instead?

I'm betting that the [Person ID] can be very large.  I'd be wary of the performance impact of using Aggregate() on 10s or 100s of thousands of tuples, even in the case of additive measures.


________________________________________
From: mondrian-bounces at pentaho.org [mondrian-bounces at pentaho.org] On Behalf Of Wright, Jeff [jeff.s.wright at truvenhealth.com]
Sent: Thursday, April 10, 2014 4:50 PM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] Aggregate UDFs

I've seen a post where Julian refers to this kind of MDX for histograms as an MDX-as-SQL anti-pattern, but I haven't been able to think of another way to do it.

Speaking of SQL... You can do this in SQL by selecting from a select. In the inner select, the unit of aggregation are the entities you want to collect into the histogram ranges ([Person ID]), and the outer query uses that to aggregate to the ranges.

It feels like the problem begs for two passes over the data like that, but I haven't seen a clear way to express it in MDX that Mondrian can execute.

--jeff

-----Original Message-----
From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Johnson, Jeremiah
Sent: Thursday, April 10, 2014 4:21 PM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] Aggregate UDFs

The error is "Aggregation is not supported over a list with more than 1000 predicates". This occurs as a result of a distinct count on the bucket. Mondrian (we are on 3.6). Something like this causes an error:

with member [Person ID].[Less Than 4 Scripts] as
    'Aggregate(
        FILTER( [Person ID].[Person ID].members,
        [Measures].[Scripts Rx] < 4 ) )'
member [Person ID].[4 or More Scripts] as
    'Aggregate(
        FILTER( [Person ID].[Person ID].members,
        [Measures].[Scripts Rx] > 3 ) )'
select
non empty {[Measures].[Patients Rx], [Measures].[Net Pay Rx], [Measures].[Scripts Rx]} on columns, non empty {[Person ID].[Less Than 4 Scripts], [Person ID].[4 or More Scripts]} ON ROWS from [Drug Claim]

Patient RX is a distinct count. This works fine without Patient RX.


-----Original Message-----
From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Matt Campbell
Sent: Thursday, April 10, 2014 4:06 PM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] Aggregate UDFs


JJ-
Can you say a little more about the invalid SQL you're seeing?  That worries me.  Was it failing with too large of an in list, or wrong in some other way?

Can you say a little more about your use case, as well?

Thanks,
matt

________________________________________
From: mondrian-bounces at pentaho.org [mondrian-bounces at pentaho.org] On Behalf Of Johnson, Jeremiah [jeremiah.johnson at truvenhealth.com]
Sent: Tuesday, April 08, 2014 4:44 PM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] Aggregate UDFs

Thanks. That was what I was looking for. We were trying to use Filter() to solve our problem but it generates invalid SQL. Mondrian was creating an in clause with too many parameters. We were trying to fix that by writing our own UDF. It does not look like the way to solve our problem because we would end up with a very similar calc object as Filter() creates.

If there was an equivalent to "having" that would solve our problem. We are trying to create a histogram. If we create calculated members for each bucket using a having to narrow down who belongs in that bucket we would have what we need. We can't simply sum because the some selects which bucket that data point belongs too. That data point could end up in different buckets depending on how it is sliced.

Is there a having equivalent? I read a discussion about this from some time ago but it doesn't appear to have materialized into a having type functionality.

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Luc Boudreau
Sent: Monday, April 07, 2014 3:48 PM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] Aggregate UDFs

Hello Jeremiah,

When you implement a UDF in Mondrian, one of the methods you can implement specifies the type of object returned by the instance. The Javadocs have all the details here:

http://mondrian.pentaho.com/api/mondrian/spi/UserDefinedFunction.html#getReturnType(mondrian.olap.type.Type[])

It's worth noting that this method takes a list of parameter types. These are the types that were used when the method was invoked in MDX, so you can return a different type of object according to the arguments which were used in the MDX.

Alternatively, for quick prototyping, you can implement the UDFs in Javascript directly in your schema file. It's the fast and easy way to try them out, but I strongly advise writing them in Java and writing unit tests and all. We have a lot of unit tests for UDFs in the base test suite, so you can copy/paste some code from there.

Let us know if we can help you further.

Luc

On Mon, Apr 7, 2014 at 2:32 PM, Johnson, Jeremiah <jeremiah.johnson at truvenhealth.com<mailto:jeremiah.johnson at truvenhealth.com>> wrote:
I would like to write a UDF that takes in a set and  transforms it into a new set. I think I have figured out how to process sets in a UDF but I'm not sure if Mondrian can handle returning a set from a UDF because the execute() method returns an Object. I don't know what classes are legal. I did look at existing UDFs and nothing is jumping out at me as returning a set. Can I return a set and what class type does it have to be?


.....................................................
Jeremiah Johnson
Truven Health Analytics
Phone: 734-913-3194<tel:734-913-3194>
The Healthcare Business of Thomson Reuters is now Truven Health Analytics.


_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org<mailto:Mondrian at pentaho.org>
http://lists.pentaho.org/mailman/listinfo/mondrian

_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian
_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian
_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian


More information about the Mondrian mailing list