[Mondrian] improving schema load concurrency

Julian Hyde julianhyde at gmail.com
Fri Jul 15 15:07:00 EDT 2016


IIRC, validation isn’t the intention (Mondrian wants to reference actual valid Member objects in its definition of calculated members etc.) but it is the effect (if something is invalid, schema load fails).

Maybe it’s possible to load quickly, then kick off a background thread to validate. The schema will be usable before validation completes. Things like calculated members can be loaded on first use using a pattern like memoization[1], so the computation happens at most once.
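
To make the memoization idea concrete, here is a minimal sketch using only the JDK. Guava's Suppliers.memoize [1] provides this behaviour ready-made; the class name MemoizeSketch and the member name in main are made up for illustration and are not Mondrian code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class MemoizeSketch {
    /** Wraps a supplier so that the delegate runs at most once, even under contention. */
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private volatile boolean initialized;
            private T value;

            @Override
            public T get() {
                if (!initialized) {                 // fast path once the value exists
                    synchronized (this) {
                        if (!initialized) {         // re-check while holding the lock
                            value = delegate.get(); // the expensive resolution step
                            initialized = true;
                        }
                    }
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger resolutions = new AtomicInteger();
        // Hypothetical stand-in for resolving a calculated member on first use.
        Supplier<String> calcMember = memoize(() -> {
            resolutions.incrementAndGet();
            return "[Measures].[Profit]";
        });
        calcMember.get();
        calcMember.get();
        System.out.println("resolved " + resolutions.get() + " time(s)");
    }
}
```

If the delegate throws, nothing is cached and the next caller retries, which may or may not be what you want for a member that fails to resolve.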

The background thread should be low priority, to avoid swamping the database.
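
A rough sketch of what such a background validator could look like, assuming a single daemon thread at minimum priority. The names (BackgroundValidationSketch, "schema-validator") are invented for illustration. Note that thread priority is only a hint to the CPU scheduler; what actually bounds the load on the database here is that one thread issues one validation query at a time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

class BackgroundValidationSketch {
    /** One daemon thread at minimum priority; it never keeps the JVM alive. */
    static ExecutorService newValidationExecutor() {
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, "schema-validator");
            t.setDaemon(true);
            t.setPriority(Thread.MIN_PRIORITY);
            return t;
        };
        return Executors.newSingleThreadExecutor(factory);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService validator = newValidationExecutor();
        // Hypothetical validation task; a real one would run the member-lookup queries.
        Future<Boolean> result = validator.submit(() -> {
            return Thread.currentThread().getPriority() == Thread.MIN_PRIORITY;
        });
        // The schema would be handed to callers here, before validation finishes.
        System.out.println("ran at low priority: " + result.get());
        validator.shutdown();
    }
}
```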

Julian

[1] https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Suppliers.html#memoize(com.google.common.base.Supplier)

> On Jul 15, 2016, at 11:48 AM, Tom Barber <tom at analytical-labs.com> wrote:
> 
> Well, in 4.x it uses the database metadata to construct the schema, so I'm pretty sure that it doesn't validate on top. I thought it checked members and picked out info for query optimisation, but I've never looked that much :)
> 
> --------------
> 
> Director Meteorite.bi - Saiku Analytics Founder
> Tel: +44(0)5603641316  
> 
> (Thanks to the Saiku community we reached our Kickstart <http://kickstarter.com/projects/2117053714/saiku-reporting-interactive-report-designer/> goal, but you can always help by sponsoring the project <http://www.meteorite.bi/products/saiku/sponsorship>)
> 
> On 15 July 2016 at 19:46, Dan <dan at dankeeley.co.uk> wrote:
> I'm pretty sure it's not caching. It doesn't do it for straightforward dimensions.
> 
> Also, your first MDX execution is never cached; you can tell by the plethora of queries that run the first time.
> 
> Another problem shows up when your fact table is partitioned. You design things so that normally only one partition gets hit, but when these queries run they hit the whole lot.
> 
> Would be happy if it was controllable and clear exactly what was actually going on.
> 
> Sent from my phone
> 
> On 15 Jul 2016 7:35 p.m., "Tom Barber" <tom at analytical-labs.com> wrote:
> Is it validation? I thought it was member caching?
> 
> On 15 July 2016 at 19:32, Dan <dan at dankeeley.co.uk> wrote:
> To add to that, we've seen some schemas take minutes to load, and it's exactly as you describe: RDBMS validation queries taking a few seconds each, but many of them.
> 
> When you have 100s of calculated members this takes ages.
> 
> Thing is, the validation makes no sense. It doesn't validate all DB references, only some. Just because a schema loads successfully doesn't mean it's valid.
> 
> We saw these issues at one of the biggest Pentaho customers in the UK. Albeit not on the latest version.
> 
> Dan
> 
> Sent from my phone
> 
> On 15 Jul 2016 7:11 p.m., "Wright, Jeff" <jeff.s.wright at truvenhealth.com> wrote:
> The idea of the future wasn’t to do away with the actor, but to use a future to keep the actor thread from waiting on the schema load – your 2nd approach. But that’s kind of gravy. From a user perspective, the real reason for this future idea is to allow more than one schema to load in parallel. We have servers with over 100 schemas.
> 
>  
> 
> The things I’m aware of that contribute to why it takes 60 sec to multiple minutes to load schemas are:
> 
> - the time to do DBMS queries to validate member references in the schema
> - the time to do aggregate matching (even with mondrian.rolap.aggregates.Read=false)
> 
>  
> 
> --jeff
> 
>  
> 
> From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Julian Hyde
> Sent: Friday, July 15, 2016 2:01 PM
> To: Mondrian mailing list <mondrian at pentaho.org>
> Subject: Re: [Mondrian] improving schema load concurrency
> 
>  
> 
> Can you describe why schemas are taking so long to load? Maybe we can defer some parts of the load process.
> 
>  
> 
> Replacing an actor (which single-threads critical processing) with a parallel process is never going to be straightforward.
> 
>  
> 
> Another approach is to keep the actor but reduce the amount of processing that occurs in the actor. Which may end up looking like the suggestion I made in the first paragraph.
> 
>  
> 
> Julian
> 
>  
> 
> On Jul 15, 2016, at 10:51 AM, Wright, Jeff <jeff.s.wright at truvenhealth.com> wrote:
> 
>  
> 
> When we were working on our recent deadlock issue, we had initially hoped that the change last year in RolapSchemaPool to use a ReentrantReadWriteLock might fix that problem. Although it turned out to require a different fix, my colleague Jon Rand, who was working on this, came up with an idea about how to possibly allow even more concurrency with schema loading.
> 
>  
> 
> The basic proposition is to have the two maps that maintain the schema pool store Future<RolapSchema> objects rather than completed RolapSchema objects. The Future can be added and the lock released before the schema even begins to load.
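
A minimal sketch of that proposition, not the actual RolapSchemaPool code: it assumes a single map instead of the two the pool maintains, and FutureSchemaPoolSketch and its loader argument are hypothetical names. The point it illustrates is that the write lock is held only long enough to register the Future; the slow load runs after the lock is released, here in the calling thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class FutureSchemaPoolSketch<K, S> {
    private final Map<K, Future<S>> pool = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    S get(K key, Callable<S> loader) throws InterruptedException, ExecutionException {
        Future<S> future;
        lock.readLock().lock();
        try {
            future = pool.get(key);
        } finally {
            lock.readLock().unlock();
        }
        if (future == null) {
            FutureTask<S> task = null;
            lock.writeLock().lock();
            try {
                future = pool.get(key);       // re-check: another thread may have won
                if (future == null) {
                    task = new FutureTask<>(loader);
                    pool.put(key, task);      // register the Future, nothing loaded yet
                    future = task;
                }
            } finally {
                lock.writeLock().unlock();    // released before any loading starts
            }
            if (task != null) {
                task.run();                   // the slow load, outside the lock
            }
        }
        return future.get();                  // other callers for this key wait here only
    }

    public static void main(String[] args) throws Exception {
        FutureSchemaPoolSketch<String, String> pool = new FutureSchemaPoolSketch<>();
        AtomicInteger loads = new AtomicInteger();
        Callable<String> loader = () -> {
            loads.incrementAndGet();
            return "FoodMart schema";
        };
        pool.get("FoodMart", loader);
        pool.get("FoodMart", loader);
        System.out.println("loads: " + loads.get());
    }
}
```

One caveat with this sketch: a load that fails stays in the map as a failed Future, so a real version would need to evict it before a retry can succeed.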
> 
>  
> 
> Jon actually did prototypes of 2 different approaches, here’s how he described them:
> 
>  
> 
> 1. Schema loading occurs in a thread managed by an ExecutorService. This makes it easy to control the maximum number of schemas that can load concurrently simply by limiting the size of the thread pool managed by the ExecutorService. I’ve run a few reports with this version but no extensive testing has been performed.
> 
> 2. Schema loading occurs in the thread that’s calling RolapSchemaPool.get() using a FutureTask. I haven’t had a chance to test this version.
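
For the first approach, a sketch of how a fixed-size thread pool caps the number of concurrent loads. This is not Jon's prototype, which isn't shown in the thread; ExecutorSchemaLoaderSketch and its methods are hypothetical, and a ConcurrentHashMap stands in for the pool's lock-protected maps.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

class ExecutorSchemaLoaderSketch<K, S> {
    private final ConcurrentMap<K, Future<S>> pool = new ConcurrentHashMap<>();
    private final ExecutorService loaders;

    ExecutorSchemaLoaderSketch(int maxConcurrentLoads) {
        // The pool size is the cap on how many schemas load at the same time.
        this.loaders = Executors.newFixedThreadPool(maxConcurrentLoads);
    }

    /** Returns immediately; the caller decides when to block on the Future. */
    Future<S> get(K key, Callable<S> loader) {
        FutureTask<S> task = new FutureTask<>(loader);
        Future<S> existing = pool.putIfAbsent(key, task);
        if (existing != null) {
            return existing;       // someone else already registered this schema
        }
        loaders.execute(task);     // queued if all loader threads are busy
        return task;
    }

    void shutdown() {
        loaders.shutdown();
    }

    public static void main(String[] args) throws Exception {
        ExecutorSchemaLoaderSketch<String, String> pool = new ExecutorSchemaLoaderSketch<>(2);
        Future<String> sales = pool.get("Sales", () -> "Sales schema");
        Future<String> hr = pool.get("HR", () -> "HR schema");
        Future<String> again = pool.get("Sales", () -> "never runs");
        System.out.println(sales.get() + ", " + hr.get() + ", shared: " + (sales == again));
        pool.shutdown();
    }
}
```

The second approach differs only in who runs the task: instead of handing it to the executor, the thread that registered the FutureTask calls run() on it itself, so there is no cap on concurrent loads but also no extra threads.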
> 
>  
> 
> One thing that appeals to me a lot about this is that I think it would also reduce some issues with the SegmentCacheManager$ACTOR thread running schema loads, since a load can be a long-running process (minutes) and holds up the SegmentCacheManager.
> 
>  
> 
> Any thoughts on this? Would it be worth capturing in JIRA? I’m not sure I can get anybody from our team to work on this in the short term, but I wanted to pass along the idea…
> 
>  
> 
> --jeff
> 
>  
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian