[Mondrian] per query memory limit?

Matt Campbell mcampbell at pentaho.com
Fri Jul 12 08:33:16 EDT 2013

On a related topic: if you haven't already, it could be worth taking a look at CellBatchSize. That property came in a later version than the one you're currently running, but it would be interesting to test whether it could help you. The property limits the number of cell requests that are included in a single batch. Cell requests themselves can consume a lot of memory, so setting a hard limit can help avoid out-of-control spikes. There are trade-offs with the property (SQL efficiency vs. Mondrian resource efficiency), but it may offer a way to avoid memory thrashing when one or more queries generate an enormous number of cell requests.
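For reference, the property is set in mondrian.properties. The exact name and a sensible value depend on which version you end up on; the snippet below is a sketch based on later releases, so verify the property name against your version's documentation:

```properties
# Caps the number of cell requests grouped into a single batch
# (and hence the size of the generated SQL / in-memory request set).
# Property name as in later Mondrian releases; the value is illustrative.
mondrian.rolap.cellBatchSize=100000
```

Smaller values mean more, smaller SQL queries (less efficient at the database) but a tighter bound on Mondrian's per-batch memory; that is the trade-off mentioned above.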

There was a recent thread on CellBatchSize where both Julian and Luc gave some good background on its rationale.


From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Wright, Jeff (Truven Health)
Sent: Friday, July 12, 2013 8:17 AM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] per query memory limit?

Added http://jira.pentaho.com/browse/MONDRIAN-1661.

No, I wasn't thinking of a separate thread. I used the word "check" rather than "monitor" in the JIRA.

We're using result.limit, but it is only effective at blocking a huge size on a single axis. It's not effective at catching medium sizes on two axes, or cases where MDX functions bring in extra cells. A common function we have memory issues with is Except([some level].members, ...). Put another way: because we have a business requirement to support output with lots of rows, result.limit is not effective at restricting Mondrian memory usage. We have result.limit set to 1 million.

We have tried mondrian.util.memoryMonitor.enable=true, but it has the bad side effect of cancelling all queries when one is badly behaved. We also don't see all the memory cleaned up after this threshold kicks in and cancels queries.
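For concreteness, the configuration described above looks roughly like this in mondrian.properties (values taken from this thread):

```properties
# Caps the cell count on a single axis; does not constrain the
# product of two medium-sized axes or intermediate MDX evaluation.
mondrian.result.limit=1000000

# Global JVM memory monitor. In our experience it cancels *all*
# in-flight queries when the threshold is crossed, not just the
# badly behaved one.
mondrian.util.memoryMonitor.enable=true
```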


From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Julian Hyde
Sent: Thursday, July 11, 2013 5:25 PM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] per query memory limit?

Some comments on how this might be achieved.

It's hard to figure out how many bytes a Java data structure is using, so you're right to use a data structure that serves as a proxy for memory usage. Cell count (for aggregations) and member count (for the dimension cache) are probably the right ones to use, although members take quite a lot more memory than cells, especially dense cells.

If by "monitor", you mean create another thread that periodically checks the state of the query, then I disagree. The usage could go up very quickly, and by the time the query has been killed, the damage has been done: other users' data has been thrown out of the cache.

I'd go for a variant of #2. Keep a tally of the number of cells & members used thus far in executing the query, and abort if it crosses a threshold.

Note that we have something similar to this already, namely mondrian.result.limit. This feature would use a new property, but use a similar mechanism, and would also throw a ResourceLimitExceededException.
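The tally-and-abort variant described above can be sketched as follows. This is purely illustrative: the class and method names are invented for this sketch and are not part of the Mondrian API, and a plain RuntimeException stands in for Mondrian's ResourceLimitExceededException.

```java
// Hypothetical sketch of variant #2: keep a running tally of cells and
// members consumed by a query, and abort as soon as a threshold is crossed.
// All names here are illustrative, not actual Mondrian classes.
public class QueryResourceTally {
    private final long limit;
    private long cellCount;
    private long memberCount;

    public QueryResourceTally(long limit) {
        this.limit = limit;
    }

    // Would be called by the evaluator each time cell requests are issued.
    public void addCells(long n) {
        cellCount += n;
        check();
    }

    // Would be called each time members are loaded into the dimension cache.
    public void addMembers(long n) {
        memberCount += n;
        check();
    }

    private void check() {
        if (cellCount + memberCount > limit) {
            // Mondrian would throw ResourceLimitExceededException here;
            // a RuntimeException stands in for it in this sketch.
            throw new RuntimeException(
                "Query exceeded resource limit: cells=" + cellCount
                + ", members=" + memberCount + ", limit=" + limit);
        }
    }

    public long total() {
        return cellCount + memberCount;
    }
}
```

Because the check runs inline at every allocation point rather than in a separate monitoring thread, the query is aborted before the damage is done, which addresses the objection to the polling approach above.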

Can you please log a jira case for this so that we can track it?


On Jul 11, 2013, at 1:49 PM, "Wright, Jeff (Truven Health)" <jeff.s.wright at truvenhealth.com> wrote:

Has anybody else ever thought about implementing some kind of per-query memory limit?

In our application, users can create ad hoc queries against a large schema with many medium-to-high cardinality dimensions. For that to work, it's important to be able to stop a query from taking over the Mondrian instance and using all memory. We've tried the memory threshold property and that's not a good solution.

I think in general there are two possible approaches:

1) Try to estimate the memory that will be required ahead of time, abort if too high.
2) Monitor some data structure that is a proxy for memory as you evaluate the query, and abort when you cross the threshold.

We've done some work with cell limits that is sort of like #1. But we find that a naive cell estimate is likely to miss some intermediate memory usage.

Any thoughts?

--Jeff Wright

Mondrian mailing list
Mondrian at pentaho.org
