[Mondrian] per query memory limit?

Wright, Jeff (Truven Health) jeff.s.wright at truvenhealth.com
Wed Aug 7 13:26:50 EDT 2013


Revisiting this thread... We have spent some time on code changes and testing related to counting cell and member requests at the point of loading from the DBMS. I'm trying to figure out how this interacts with caching. Looking for some reactions...

The cell requests seem to be counted only for cells that are not already cached (I expect members are the same). My thought experiment: a clever/lucky user could potentially work around a cell request limit by carefully working up to the full query. For example, if my query compares 2012 data to 2011 and is rejected for too many cell requests, I could first run a query for 2011 alone, which generates fewer cell requests. But that doesn't really reduce the memory overhead of my combined query... or does it?

My thinking so far is that this would actually be a combined limit, something like

if (2 * memberRequests + cellRequests > requestThreshold) {
    throw new RequestLimitException();
}

But maybe we need to take into account the size of member and cell caches? I haven't looked to see how accessible that is.
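
To make that concrete, here's a minimal sketch of the combined check (the class, method, and exception names here are purely illustrative, not existing Mondrian API):

```java
// Hypothetical per-query tally of member and cell requests, incremented at
// the point of loading from the DBMS. All names are illustrative only.
class RequestTally {
    private final long requestThreshold;
    private long memberRequests;
    private long cellRequests;

    RequestTally(long requestThreshold) {
        this.requestThreshold = requestThreshold;
    }

    // Called when a member must be loaded (i.e. it was not in the cache).
    void recordMemberRequest() {
        ++memberRequests;
        check();
    }

    // Called when a cell must be loaded (i.e. it was not in the cache).
    void recordCellRequest() {
        ++cellRequests;
        check();
    }

    // Members are weighted 2x because they take more memory than cells.
    private void check() {
        if (2 * memberRequests + cellRequests > requestThreshold) {
            throw new RequestLimitException(
                "query exceeded request threshold " + requestThreshold);
        }
    }

    static class RequestLimitException extends RuntimeException {
        RequestLimitException(String message) {
            super(message);
        }
    }
}
```

If we did decide to account for the sizes of the member and cell caches, that would only change the arithmetic inside check(); the tally-and-abort structure would stay the same.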

--jeff

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Wright, Jeff (Truven Health)
Sent: Friday, July 12, 2013 8:17 AM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] per query memory limit?

Added http://jira.pentaho.com/browse/MONDRIAN-1661.

No, I wasn't thinking of a separate thread. I used the word "check" rather than "monitor" in the JIRA.

We're using result.limit, but it's only effective at blocking a huge size on a single axis. It's not effective at catching medium sizes on two axes, or cases where MDX functions bring in extra cells. A common function we have memory issues with is Except([some level].members, ...). Put another way: because we have a business requirement to support output with lots of rows, result.limit is not effective at restricting Mondrian memory usage. We have result.limit set to 1 million.

We have attempted to use mondrian.util.memoryMonitor.enable=true, but this has the bad side effect of cancelling all queries when one is badly behaved. We also don't see all the memory cleaned up when this threshold kicks in and cancels queries.

--jeff

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Julian Hyde
Sent: Thursday, July 11, 2013 5:25 PM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] per query memory limit?

Some comments on how this might be achieved.

It's hard to figure out how many bytes a Java data structure is using, so you're right to use a data structure that is a proxy for memory usage. Cell count (for aggregations) and member count (for the dimension cache) are probably the right ones to use. Note that members take quite a lot more memory than cells, especially dense cells.

If by "monitor", you mean create another thread that periodically checks the state of the query, then I disagree. The usage could go up very quickly, and by the time the query has been killed, the damage has been done: other users' data has been thrown out of the cache.

I'd go for a variant of #2. Keep a tally of the number of cells & members used thus far in executing the query, and abort if it crosses a threshold.

Note that we have something similar to this already, namely mondrian.result.limit. This feature would use a new property but a similar mechanism, and would also throw a ResourceLimitExceededException.
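
A rough sketch of that check pattern, modeled on how mondrian.result.limit is enforced. The property name "mondrian.query.requestLimit" is a placeholder, and a real implementation would throw Mondrian's ResourceLimitExceededException rather than the stand-in exception used here:

```java
// Sketch of a threshold check in the style of mondrian.result.limit.
// "mondrian.query.requestLimit" is a hypothetical property name, and
// IllegalStateException stands in for ResourceLimitExceededException.
class RequestLimitCheck {
    // A limit of 0 means "no limit", following the convention that
    // mondrian.result.limit uses.
    static void checkRequestLimit(long requestCount, long limit) {
        if (limit > 0 && requestCount > limit) {
            throw new IllegalStateException(
                "Number of cell/member requests (" + requestCount
                + ") exceeded limit (" + limit + ")");
        }
    }
}
```

The point is that the check runs synchronously inside the executing query, at each request, rather than in a separate monitoring thread, so the query aborts before it can do damage to the cache.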

Can you please log a JIRA case for this so that we can track it?

Julian


On Jul 11, 2013, at 1:49 PM, "Wright, Jeff (Truven Health)" <jeff.s.wright at truvenhealth.com> wrote:

Has anybody else ever thought about implementing some kind of per-query memory limit?

In our application, users can create ad hoc queries against a large schema with many medium-to-high cardinality dimensions. For that to work, it's important to be able to stop a query from taking over the Mondrian instance and using all memory. We've tried the memory threshold property and that's not a good solution.

I think in general there are two possible approaches:

1) Try to estimate the memory that will be required ahead of time, abort if too high.
2) Monitor some data structure that is a proxy for memory as you evaluate the query, and abort when you cross the threshold.

We've done some work with cell limits that is sort of like #1. But we find that a naive cell estimate is likely to miss some intermediate memory usage.

Any thoughts?

--Jeff Wright

_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian
