[Mondrian] Possible race condition in 3.6.x SegmentCacheManager/Actor?

Joe Barnett thejoe at gmail.com
Tue Apr 29 16:08:08 EDT 2014


Hello All,

We've been migrating our mondrian implementation from 3.2.x to 3.6.x for a
while now, and are getting close to being ready to push to production.
Unfortunately, we're currently seeing what looks like some sort of race
condition in our test environment.  I haven't been able to isolate the case
that causes the problem, but after running for a while, we see threads
locked up when examining the JVM thread state.  We see our query thread
waiting for a RolapResultShepherd$executor thread[1] (no problem there),
but that executor thread is waiting[2] for the SegmentCacheManager$ACTOR to
get back to it, and the actor itself is waiting for work to do[3].  It
seems that the command is getting submitted to the Actor, but somehow
dropped, or perhaps processed by the responseMap is getting cleaned up
before the executor can .get() it (or there's some sort of hash/equals
collision on the command objects somehow)?  Possibly these requests are
happening concurrently with cache flushes / MondrianServer shutdowns, and
the queue is not fully processing?

Does anyone with more knowledge of how the SegmentCacheManager works have
any suggestions on things we should look into?  Our current implemenation
is based on git commit id 7498b4a4bcfb791e303de20407830aaa61753042 (3.6
branch from Fri Jan 24 05:55:31 2014 -0800 -- we had previously run into
MONDRIAN-1726 when we were based on git commit id
609d3f16ee772456f405ca21f9d2a6053ea5cb7f from Wed Dec 4 09:29:51 2013
-0500, pulling mkambol's change seems to have fixed that deadlock, but
revealed this issue).

Thanks,
-Joe


[1] "http-8080-8849951882670653653" daemon prio=10 tid=0x00007fe5c4334800
nid=0x11ca waiting on condition [0x00007fe51e1df000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006ab9877e8> (a
java.util.concurrent.FutureTask$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
        at
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at
mondrian.rolap.RolapResultShepherd.shepherdExecution(RolapResultShepherd.java:146)
        at mondrian.rolap.RolapConnection.execute(RolapConnection.java:616)
        at mondrian.rolap.RolapConnection.execute(RolapConnection.java:600)
        ....

[2] "mondrian.rolap.RolapResultShepherd$executor_1" daemon prio=10
tid=0x00007fe54c020800 nid=0x5139 waiting on condition [0x00007fe524ecd000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000005156ef888> (a
java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
        at
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
        at mondrian.util.SlotFuture.get(SlotFuture.java:87)
        at mondrian.olap.Util.safeGet(Util.java:2096)
        at mondrian.util.BlockingHashMap.get(BlockingHashMap.java:63)
        at
mondrian.rolap.agg.SegmentCacheManager$Actor.execute(SegmentCacheManager.java:1025)
        at
mondrian.rolap.agg.SegmentCacheManager.execute(SegmentCacheManager.java:350)
        at
mondrian.rolap.FastBatchingCellReader.loadAggregations(FastBatchingCellReader.java:243)
        at mondrian.rolap.RolapResult.phase(RolapResult.java:563)
        at mondrian.rolap.RolapResult.executeBody(RolapResult.java:926)
        at mondrian.rolap.RolapResult.<init>(RolapResult.java:469)
        at
mondrian.rolap.RolapConnection.executeInternal(RolapConnection.java:671)
        at
mondrian.rolap.RolapConnection.access$000(RolapConnection.java:51)
        at mondrian.rolap.RolapConnection$1.call(RolapConnection.java:622)
        at mondrian.rolap.RolapConnection$1.call(RolapConnection.java:620)
        at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

[3] "mondrian.rolap.agg.SegmentCacheManager$ACTOR" daemon prio=10
tid=0x00007fe53c15d000 nid=0x524e waiting on condition [0x00007fe50ffa5000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000005ecc0cac8> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at
java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:322)
        at
mondrian.rolap.agg.SegmentCacheManager$Actor.run(SegmentCacheManager.java:967)
        at java.lang.Thread.run(Thread.java:662)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20140429/23752975/attachment-0001.html 


More information about the Mondrian mailing list