[Mondrian] Re: Optimizing Sync Blocks in Aggregation class

Thiyagu Palanisamy tpalanis at thoughtworks.com
Thu May 31 08:42:27 EDT 2007


We have done some more improvements to our Approach 2 : Move the lock from 
Aggregation to Segment wherever it is possible (CR 9366) 

New Changes:
1. Synchronized block removed from AggregationManager.getCellFromCache()
The block was covering call to Aggregation.getCellValue() and 
synchronization is handled with in Aggregation.getCellValue() method 
itself.
2. Synchronized block removed from Aggregation.getCellValue()
This method was earlier synchronized to ensure that access to segmentRefs
is thread safe. Now we have made segmentRefs as CopyOnWriteArrayList 
instance instead of ArrayList. This frees us up from handling the 
Synchronized access to this list. We have selected CopyOnWriteArrayList 
because <quote java doc> it is more efficient than alternatives when 
traversal operations vastly outnumber mutations</quote java doc>, which is 
the case in our scenario, there will be lot of get cellvalue calls but 
only few times we will be modifying segments in the list.
3. Made Segment.getCellValue() Synchronized
        As we have removed synchronization from Aggregation we are 
synchronizing at Segment level.

We have uploaded the latest change to 
http://forums.pentaho.org/showthread.php?t=54274
We also ran Jprofiler against all the 3 versions(Current implementation, 
Approach1, Approach2) and the reports are attached to the same post. We 
could see that lock contention is drastically reduced for data load path 
(Aggregation object).

Thanks,
Thiyagu & Tushar




Thiyagu Palanisamy/India/ThoughtWorks 
30/05/2007 20:29

To
Mondrian
cc

Subject
Optimizing Sync Blocks in Aggregation class





Hi,
 
We are proposing optimization of synchronization blocks involved in 
loading of data from DB.
 
Current system is synchronized on Aggregation object while their segments 
are loaded from DB. We did analysis using JProfiler to identify the 
maximum lock contentions.
 
Issues with the current implementation:
1. Because of the lock at Aggregation, requests are forced to wait even 
though they are not needed. Eg., If any previous request is loading a new 
segment to Aggregation all the requests that need the same aggregation 
will be blocked regardless of segment data availability. 
2. Aggregation is locked for entire period of creating segment, firing 
sql, loading result set back on to segments. So aggregation is locked when 
DB is processing data.
 
We got 2 different working solutions to address this issue
 
Approach 1: Keep locking on Aggregation itself, but optimization the 
duration lock (CR 9325)
Changes made for this solution:
1. Syncronized block removed from AggregationManager.load()
    Because this blocks covers call to Aggregation.optimizePredicates(), 
Aggregation.load() but both these methods are already Synchronized. 
2. Made Aggregation.optimizePredicates() non synchronized
    Uses maxConstraints,star fields of aggregation instance but they are 
read only and set only in constructor. 
    Parameters to this method
         columns : created at CellRequest, it is read only access in the 
method. This is related Aggregation object but columns list will remain 
same for the life time of the Aggregation class.
         predicates : created at FBCR which is unique to a single request, 
it is also read only and has no association to Aggregation object. 
3. Made Aggregation.load() non synchronized
    Uses constrainedColumnsBitKey,segmentRefs fields of aggregation 
instance.
          constrainedColumnsBitKey : This is read only and set in 
constructor
          segmentRefs : We are still synchronizing on aggregation when 
accessing this
     Parameters to this method
          columns : created at CellRequest, this is set to columns 
instance variable. columns list will remain same for the life time of the 
Aggregation class so we have made change to set only at the first time 
load is called on aggregation. This will happen only during creation of 
Aggregation instance and it will be threadlocal at that point. 
          predicates : has no association to Aggregation object
          pinnedSegments : has no association to Aggregation object
          Measures: has no association to Aggregation object
4. Changes in Aggregation.load() 
    New Segments are added to segmentRefs only after they are loaded with 
the data. This will ensure that none of the Segments are exposed to other 
threads when they are loading. 
Advantages with this approach
    1. Aggregation object is locked only during the time of adding loaded 
Segments to segmentRefs.
    2. This approach requires only minor modifications to the existing 
locking strategy
Disadvantage with this approach
    If more than one request comes at the same time for a same new segment 
of an existing aggregation, both the requests will fire query and load the 
segment data. This issue is already exists in the current version too, but 
window in which this can happen is small, this will happen only until one 
of the request reaches the AggregationManager.load(). With the new change 
this will happen until segment is set to AggregationManager.segmentRefs.
Approach 2: Move the lock from Aggregation to Segment wherever it is 
possible (CR 9366) 
    1. Synchronized block removed from AggregationManager.load() : reason 
same as approach 1
    2. Made Aggregation.optimizePredicates() non synchronized : reason 
same as Approach 1
    3. Made Aggregation.load() non synchronized : reason same as Approach 
1
    4. Changes in Aggregation.load() 
        Aggregation object is altered only when New Segments are added to 
segmentRefs, so we have Sync block covering add to segmentRefs     
    5. Made Aggregation.getCellValue() non synchronized 
        Only access to segmentRefs needs to be Synchronized, introduced 
sync block to cover access to segmentRefs and when segment is not ready 
and segment would contain the requested value(which means another thread 
is loading the segment) current request will be blocked until segment is 
loaded. And return value once segment is ready. If segment fails to load 
it will return null similar to when segment is not found. 
Advantage with this approach
    1.Aggregation object is locked only during the time of adding Segments 
to segmentRefs. 
    2. Concurrent requests will be blocked only if both the requests are 
for same segment which is loading. This is a significant improvement from 
Approach 1, because this reduces number of blocking requests. 
Disadvantage with this approach
    1.Locking strategy has been changed to part on Aggregation and part on 
Segment. This also means that we need more testing.
    2.We are using segment state for synchronization, this piece of code 
already exists but not used currently.
We have attached both our implementations to the forum post: 
http://forums.pentaho.org/showthread.php?t=54274
ConcurrentMDXTest class is to detected lock issues, this test is skewed 
towards simulating concurrent access of same aggregation, this is not 
valid for performance comparison of different implementations. 

Please give your inputs on which approach we should move forward and 
issues or improvements we can make on to that approach. Also it would be 
of great help if you can give us some more scenarios for concurrent 
testing.
Thanks,
Thiyagu, Tushar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20070531/5a47e5db/attachment.html 


More information about the Mondrian mailing list