[Mondrian] Mondrian Query Timeout

Joe Barnett thejoe at gmail.com
Mon Aug 8 18:27:25 EDT 2011


ugh, attached the wrong patch, so just filed a JIRA with the right one:

http://jira.pentaho.com/browse/MONDRIAN-976

-Joe

On Mon, Aug 8, 2011 at 2:04 PM, Joe Barnett <thejoe at gmail.com> wrote:
> Ok, I've had a little bit of time to play with a trunk snapshot (up to
> perforce changelist 14501 right now).  Seems like a configurable
> ThreadFactory is, in fact, what we need, so I've attached a quick
> patch to allow this.  Let me know if you'd rather it be in JIRA, or if
> we need more discussion first (here or in JIRA) instead.
>
> A couple of questions I've come across not directly related to the
> shepherd thread:
>
> 1) We're still using mondrian.olap.* rather than the olap4j API, and
> are starting to get some deprecation warnings with 3.3 code.  Fine for
> now, but looking into moving over to olap4j instead, I don't see an
> obvious replacement for m.o.DriverManager#getConnection(PropertyList,
> CatalogLocator, DataSource).  We basically want to get an
> OlapConnection and pass it an existing, configured JDBC DataSource to
> use for db access.
>
> 2) What's the best way to get the QueryTiming information after a
> result has been returned?  Right now, I'm casting the
> mondrian.olap.Result into a RolapResult and calling
> RolapResult#getExecution().getQueryTiming(), but seems like there
> might be a more elegant way of accessing it without needing a cast?
>
> Thanks,
> -Joe
>
> On Fri, Jul 22, 2011 at 12:10 PM, Joe Barnett <thejoe at gmail.com> wrote:
>> On Thu, Jul 14, 2011 at 7:56 AM, Luc Boudreau <lucboudreau at gmail.com> wrote:
>>>
>>>> Do you have a sense of the differences in response time between
>>>> cancel()ing the request without this extra thread vs/ with it?  Seems
>>>> like if the code is able to poll often enough
>>>
>>> Our experience has shown that there are cases where the cancellation cannot
>>> occur as expected. When Mondrian is issuing a long running SQL query, there
>>> was no way to make the MDX request timeout correctly. As we want to avoid
>>> doing a hard kill of threads (we'd rather have the JDBC resources freed
>>> properly in the background so as not to hose the RDBMS and whatever other
>>> symptoms), we needed to split the user thread from the execution thread.
>>
>> ok, I guess that JDBC's Statement#cancel() is not sufficiently
>> implemented in enough drivers/dbs to make this threading worthwhile.
>>
>>> Julian mentioned Execution.peek(). He should have mentioned Locus.peek()
>>> instead. Using this static accessor, your UDF will be capable of accessing
>>> the Execution instance of your particular thread by doing
>>> "Locus.peek().execution". Let's work together to figure out which
>>> information the Execution state should contain.
>>>
>>
>> I'll take a look in the next few weeks; I need to do some testing of
>> our app vs/ the latest 3.3 code anyway, to check for performance
>> regressions, etc.  Since it's not actually only within our UDF, but
>> also our jdbc datasource implementation that needs to read from these
>> ThreadLocals, I'm not sure the Locus.peek().execution will work (our
>> datasource impl has no knowledge of mondrian code as it's used
>> throughout our codebase, and should stay that way).  However, I'd
>> guess that as long as we can configure our own ThreadFactory for the
>> execution thread creation, we can set up the needed context there.
>> Again, will have more detailed/specific feedback in the next few weeks
>> once I carve out some time to do a quick 3.3 experimental port.  Also
>> possible I'll see a quick and easy way to use (or modify)
>> Locus.peek().execution to fit our needs; will keep the list updated.
>>
>> Thanks,
>> -Joe
>>
>>> Luc
>>>
>>> On Wed, Jul 13, 2011 at 6:35 PM, Julian Hyde <jhyde at pentaho.com> wrote:
>>>>
>>>> Take a look at Execution.peek(). I want to reduce the use of thread-locals
>>>> in mondrian because they make it more difficult to have a parallel
>>>> execution
>>>> model. As much state as possible should be stored in Execution. However, I
>>>> know I can't eradicate thread-locals overnight.
>>>>
>>>> Execution.peek() will allow you to retrieve the current Execution object.
>>>> Under the covers it uses -- you guessed it -- a thread-local. (So you
>>>> don't
>>>> have to.)
>>>>
>>>> Use Execution.peek() when your thread starts work, and store the Execution
>>>> object in its local state.
>>>>
>>>> Execution isn't a supported API yet, but we can evolve it into one once
>>>> we've proved the approach is viable.
>>>>
>>>> Julian
>>>>
>>>>
>>>> > -----Original Message-----
>>>> > From: mondrian-bounces at pentaho.org
>>>> > [mailto:mondrian-bounces at pentaho.org] On Behalf Of Joe Barnett
>>>> > Sent: Wednesday, July 13, 2011 3:25 PM
>>>> > To: Mondrian developer mailing list
>>>> > Subject: Re: [Mondrian] Mondrian Query Timeout
>>>> >
>>>> > One potential concern I have with this is that we have a UDF that
>>>> > essentially relies upon ThreadLocals that are set up by our
>>>> > application to control routing to the appropriate database.  Actually,
>>>> > the DataSources we pass to mondrian rely on those ThreadLocals as
>>>> > well, so not actually UDF specific.  It sounds like this approach
>>>> > would have mondrian run in a separate thread context, and hence not
>>>> > have the appropriate ThreadLocal values set.
>>>> >
>>>> > Are you planning on adding any hooks to the thread fork where we might
>>>> > be able to set up the thread appropriately?  We'd also probably have
>>>> > to adjust some things (connection pools, etc) to expect an additional
>>>> > thread per thread-that-calls-into-mondrian, but would need to study
>>>> > the exact execution patterns a little more closely to see if that's
>>>> > really necessary.
>>>> >
>>>> > Do you have a sense of the differences in response time between
>>>> > cancel()ing the request without this extra thread vs/ with it?  Seems
>>>> > like if the code is able to poll often enough (and
>>>> > interrupt()/cancel() any underlying IO), the difference should be
>>>> > negligible enough?  But not sure exactly where all the
>>>> > cancel-checking-hooks are.
>>>> >
>>>> > Other thoughts?
>>>> >
>>>> > -Joe
>>>> >
>>>> > On Tue, Jul 12, 2011 at 7:35 AM, Luc Boudreau
>>>> > <lucboudreau at gmail.com> wrote:
>>>> > >
>>>> > > Good. That's exactly what I had in mind.
>>>> > >
>>>> > > As for b), we are using backport-util-concurrent so I think
>>>> > we're good.I'll
>>>> > > make sure to test thoroughly.
>>>> > >
>>>> > > Luc
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Mon, Jul 11, 2011 at 6:21 PM, Julian Hyde
>>>> > <jhyde at pentaho.com> wrote:
>>>> > >>
>>>> > >> However we solve this, we need a data structure that
>>>> > records all SQL
>>>> > >> statements that are running in the service of a particular
>>>> > MDX statement.
>>>> > >> The Execution class that I added on Friday (see
>>>> > >> http://p4web.eigenbase.org/@md=d&c=6PU@/14436?ac=10 ) is
>>>> > perfect for this.
>>>> > >>
>>>> > >> I couldn't tell whether you were suggesting this, but just
>>>> > in case, let me
>>>> > >> state for the record: I think that killing Java threads is
>>>> > a bad idea. It is
>>>> > >> very likely to make a mess. I think that Java code under
>>>> > mondrian's control
>>>> > >> should continue to poll the 'canceled' flag. (Which on
>>>> > Friday moved from
>>>> > >> Query to Execution.)
>>>> > >>
>>>> > >> For SQL statements, mondrian should definitely call
>>>> > cancel. If the JDBC
>>>> > >> driver of the database concerned is well-implemented, it
>>>> > will do something
>>>> > >> sensible with the cancel request. (Probably three things:
>>>> > cancel the
>>>> > >> synchronous call fairly quicky, throw a SQLException
>>>> > indicating that the
>>>> > >> statement is canceled, and send a message to the deeper
>>>> > parts of the
>>>> > >> infrastructure to cancel asynchronous processing, slower but more
>>>> > >> exhaustively.)
>>>> > >>
>>>> > >> Since Mondrian is itself an JDBC server (by virtue of
>>>> > olap4j being an
>>>> > >> extension to JDBC), we should do something similar. It's
>>>> > worth asking: what
>>>> > >> MondrianOlap4jStatement.executeOlapQuery() should do when
>>>> > canceled? I expect
>>>> > >> that Execute.cancel will be polled every ~millisecond, so
>>>> > cancel the
>>>> > >> statement should come back fairly quickly. We could make
>>>> > it come back
>>>> > >> quicker still if executeOlapQuery were making a call to
>>>> > another thread. Then
>>>> > >> we could interrupt that call without damaging either of
>>>> > the threads.
>>>> > >>
>>>> > >> Let me be clear that this is a "nice to have". And there are some
>>>> > >> implementation details I haven't solved:
>>>> > >>
>>>> > >> (a) I am speculating that there is an 'cancelable future'
>>>> > in JDK 1.6
>>>> > >> off-the-shelf. I mean: create a Future, have another
>>>> > thread (from an
>>>> > >> ExecutionService presumably) do the work, and allow any
>>>> > 3rd thread to cancel
>>>> > >> that call. Is this available in the JDK?
>>>> > >>
>>>> > >> (b) If the calls you need are not available in JDK 1.4 or
>>>> > JDK 1.5, it's OK
>>>> > >> if they have inferior behavior. But it still needs to
>>>> > build & run in those
>>>> > >> JDKs.
>>>> > >>
>>>> > >> (c) MondrianOlap4jStatement.executeOlapQuery is not
>>>> > necessarily the right
>>>> > >> place in the architecture to make the asynchronous
>>>> > 'slice'. Find the ideal
>>>> > >> place, maybe a little ways below it. But note that
>>>> > Locus.peek() etc. uses a
>>>> > >> thread-local.
>>>> > >>
>>>> > >> Julian
>>>> > >>
>>>> > >> ________________________________
>>>> > >> From: Luc Boudreau [mailto:lucboudreau at gmail.com]
>>>> > >> Sent: Monday, July 11, 2011 8:57 AM
>>>> > >> To: Julian Hyde
>>>> > >> Cc: Mondrian developer mailing list
>>>> > >> Subject: Mondrian Query Timeout
>>>> > >>
>>>> > >> Julian,
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >> Following your commit on the statements refactoring, I
>>>> > have started to
>>>> > >> think about how we would now enforce a proper query
>>>> > timeout. (MONDRIAN-415).
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >> What I had in mind is a simple system where, when creating
>>>> > RolapResult
>>>> > >> objects (in RolapConnection.execute()) we fork the
>>>> > execution to a second
>>>> > >> thread (let’s call it the execution thread). The first
>>>> > thread (let’s call it
>>>> > >> the user thread) will wait a predetermined amount of time
>>>> > (the actual
>>>> > >> timeout) and if the execution thread has not completed,
>>>> > the user thread will
>>>> > >> perform the following.
>>>> > >>
>>>> > >> Stop all SQL statements currently associated to this Execution.
>>>> > >> Do house cleaning.
>>>> > >> Bake a cake. (optional)
>>>> > >> Throw back a timeout exception.
>>>> > >>
>>>> > >> This has several advantages. First off, we could remove
>>>> > all checks for
>>>> > >> query timeout in the evaluator and centralize everything
>>>> > in a single place.
>>>> > >> Second, as it stands now, there is no way to do a “hard
>>>> > stop” of any SQL
>>>> > >> statements running for a given Execution as we sit in the
>>>> > same thread. As
>>>> > >> long as the SQL Statement has not returned, we have no way
>>>> > to check for a
>>>> > >> timeout. As of now, the best we can do is to set an
>>>> > arbitrary JDBC timeout,
>>>> > >> but since there is no telling how many SQL statements will
>>>> > be issued, nor
>>>> > >> their priority, we cannot determine the proper JDBC
>>>> > timeout value to apply.
>>>> > >> By having the user thread overview the execution process,
>>>> > it becomes much
>>>> > >> simpler to manage and cleanup all the resources used by a
>>>> > query execution.
>>>> > >>
>>>> > >> Should I try this approach and submit a first POC?
>>>> > >
>>>> > >
>>>> > > _______________________________________________
>>>> > > Mondrian mailing list
>>>> > > Mondrian at pentaho.org
>>>> > > http://lists.pentaho.org/mailman/listinfo/mondrian
>>>> > >
>>>> > >
>>>> > _______________________________________________
>>>> > Mondrian mailing list
>>>> > Mondrian at pentaho.org
>>>> > http://lists.pentaho.org/mailman/listinfo/mondrian
>>>> >
>>>>
>>>> _______________________________________________
>>>> Mondrian mailing list
>>>> Mondrian at pentaho.org
>>>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>>
>>>
>>> _______________________________________________
>>> Mondrian mailing list
>>> Mondrian at pentaho.org
>>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>>
>>>
>>
>


More information about the Mondrian mailing list