[Mondrian] Deadlock issue
mcampbell at pentaho.com
Mon Nov 11 08:30:59 EST 2013
Luc and I had talked about potentially timing out cardinality queries,
as well. Ugly, but it may be the least bad alternative.
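For concreteness, here is a minimal sketch of what a timed-out cardinality
lookup with a fallback could look like. This is not Mondrian code: the class
name CardinalityLookup, the method cardinalityOrDefault, and all of its
parameters are hypothetical, and it assumes the query can be handed to a
separate executor.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper, for illustration only; not part of Mondrian. */
final class CardinalityLookup {
    /**
     * Runs the cardinality query on the given executor and waits at most
     * timeoutMillis for the answer. Falls back to defaultCardinality if the
     * query times out, fails, or the waiting thread is interrupted.
     */
    static int cardinalityOrDefault(
            ExecutorService executor,
            Callable<Integer> query,
            long timeoutMillis,
            int defaultCardinality) {
        Future<Integer> future = executor.submit(query);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Give up on the query so it does not keep holding resources.
            future.cancel(true);
            return defaultCardinality;
        } catch (InterruptedException e) {
            future.cancel(true);
            // Preserve the interrupt for the caller.
            Thread.currentThread().interrupt();
            return defaultCardinality;
        } catch (ExecutionException e) {
            // The query itself failed; a default is good enough here.
            return defaultCardinality;
        }
    }
}
```

The caller still blocks for up to the timeout, which is the ugly part, but
it can no longer wait forever on a connection that never becomes free.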
This issue is actually prompted by real user experience with reasonably
sized pools (25). I have a single query that can bring about deadlock
reliably with a QueryLimit of 5. In testing this, though, I have been
running more tests with a limit of 1 just to see what else looks
problematic. This flushed out a bug in the Semaphore code. I've been
able to break the assertion that (count > 0) when acquiring a permit. I
don't have a full grasp of the scenario in which this happens, but I
think part of the problem is that when notify() gets called, the thread
is woken up, but it still has to contend for the lock. So it's possible
for another thread to slip in and grab the permit out from under it.
By changing the "if (count==0)" in the wait block to a "while (count==0)"
this problem appears to go away. But is there any reason not to switch
to java.util.concurrent.Semaphore? I'm assuming we have a home-rolled
Semaphore because it was created pre-JDK 1.5.
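To illustrate the fix, here is a minimal sketch of a wait/notify semaphore
with the condition re-checked in a loop. It is not the actual Mondrian
class: CountingSemaphore, enter(), leave() and available() are hypothetical
names used only for this example.

```java
/** Hypothetical stand-in for a home-rolled semaphore; not Mondrian's class. */
final class CountingSemaphore {
    private int count;

    CountingSemaphore(int permits) {
        this.count = permits;
    }

    /** Blocks until a permit is available, then takes it. */
    synchronized void enter() throws InterruptedException {
        // Use "while", not "if": after notify() the woken thread must still
        // reacquire the monitor, so another thread can take the permit first.
        // wait() is also allowed to return spuriously.
        while (count == 0) {
            wait();
        }
        count--;
    }

    /** Returns a permit and wakes one waiting thread, if any. */
    synchronized void leave() {
        count++;
        notify();
    }

    /** Number of permits currently available. */
    synchronized int available() {
        return count;
    }
}
```

With java.util.concurrent.Semaphore the same pattern would be acquire()
before running the query and release() in a finally block, and the JDK
class handles the wakeup race internally.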
On 11/09/2013 06:40 PM, Julian Hyde wrote:
> It isn't critical that optimizePredicates does a perfect job. So maybe the cardinality query could time out and come back with a default cardinality, or skip predicate optimization altogether.
> It's just an idea, and not very elegant -- I agree that ideally the actor thread shouldn't be executing SQL at all. But the alternative seems to be to compute cardinalities up-front, and as ever, non-laziness means potentially wasted work.
> I infer that you are (a) stressing the system with a connection-pool of 1, and (b) asserting that SQL is not run from an agent thread. Great ideas. Worth running the whole suite in that mode, if you're not already.