[Mondrian] improving schema load concurrency

Tom Barber tom at analytical-labs.com
Fri Jul 15 14:34:49 EDT 2016


Is it validation? I thought it was member caching?

On 15 July 2016 at 19:32, Dan <dan at dankeeley.co.uk> wrote:

> To add to that, we've seen some schemas take minutes to load, and it's
> exactly as you describe: RDBMS validation queries that take a few seconds
> each, but there are many of them.
>
> When you have 100s of calculated members this takes ages.
>
> Thing is, the validation makes no sense. It doesn't validate all DB
> references, only some. Just because a schema loads successfully doesn't
> mean it's valid.
>
> We saw these issues at one of the biggest Pentaho customers in the UK,
> albeit not on the latest version.
>
> Dan
>
> On 15 Jul 2016 7:11 p.m., "Wright, Jeff" <jeff.s.wright at truvenhealth.com>
> wrote:
>
> The idea of the future wasn’t to do away with the actor, but to use a
> future to keep the actor thread from waiting on the schema load – your 2nd
> approach. But that’s kind of gravy. From a user perspective, the real
> reason for this future idea is to allow more than one schema to load in
> parallel. We have servers with over 100 schemas.
>
>
>
> The things I’m aware of that contribute to schema loads taking anywhere
> from 60 seconds to several minutes are:
>
> - the time to do DBMS queries to validate member references in the schema
>
> - the time to do aggregate matching (even with
>   mondrian.rolap.aggregates.Read=false)
>
>
>
> --jeff
>
>
>
> *From:* mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
> *On Behalf Of *Julian Hyde
> *Sent:* Friday, July 15, 2016 2:01 PM
> *To:* Mondrian mailing list <mondrian at pentaho.org>
> *Subject:* Re: [Mondrian] improving schema load concurrency
>
>
>
> Can you describe why schemas are taking so long to load? Maybe we can
> defer some parts of the load process.
>
>
>
> Replacing an actor (which single-threads critical processing) with a
> parallel process is never going to be straightforward.
>
>
>
> Another approach is to keep the actor but reduce the amount of processing
> that occurs in the actor. Which may end up looking like the suggestion I
> made in the first paragraph.
>
>
>
> Julian
>
>
>
> On Jul 15, 2016, at 10:51 AM, Wright, Jeff <jeff.s.wright at truvenhealth.com>
> wrote:
>
>
>
> When we were working on our recent deadlock issue, we had initially hoped
> that the change last year in RolapSchemaPool to use a
> ReentrantReadWriteLock might fix that problem. Although it turned out to
> require a different fix, my colleague Jon Rand, who was working on this,
> came up with an idea about how to possibly allow even more concurrency with
> schema loading.
>
>
>
> The basic proposition is to have the two maps that maintain the schema
> pool store Future<RolapSchema> objects rather than completed RolapSchema
> objects. The Future can be added to the maps, and the lock released,
> before the schema even begins to load.
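
The proposition above could be sketched roughly as follows. This is a minimal
illustration, not Mondrian code: FutureSchemaPool, its single map, and the
loader argument are hypothetical stand-ins, S stands in for RolapSchema, and a
plain ReentrantLock is assumed in place of the ReentrantReadWriteLock that
RolapSchemaPool is described as using. Here the calling thread runs the load
itself, after the lock has been released.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a schema pool; S stands in for RolapSchema. */
class FutureSchemaPool<S> {
    // The map holds futures, so an entry can exist before its schema does.
    private final Map<String, FutureTask<S>> byKey = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    S get(String key, Callable<S> loader) {
        FutureTask<S> task;
        lock.lock();
        try {
            // Only the cheap map lookup/update happens under the lock.
            task = byKey.get(key);
            if (task == null) {
                task = new FutureTask<>(loader);
                byKey.put(key, task);
            }
        } finally {
            lock.unlock();
        }
        // The slow load runs outside the lock. FutureTask.run() does nothing
        // if another thread has already started or completed the task.
        task.run();
        try {
            // Blocks until whichever thread ran the task has finished.
            return task.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted loading " + key, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("load failed for " + key, e.getCause());
        }
    }
}
```

Callers asking for different keys no longer wait on each other's loads, while
callers asking for the same key share one load. Note that in this sketch a
failed load stays in the map; a real pool would need to evict it.
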
>
>
>
> Jon actually did prototypes of 2 different approaches, here’s how he
> described them:
>
>
>
> 1. Schema loading occurs in a thread managed by an ExecutorService.  This
>    makes it easy to control the maximum number of schemas that can load
>    concurrently simply by limiting the size of the thread pool managed by
>    the ExecutorService.  I’ve run a few reports with this version but no
>    extensive testing has been performed.
>
> 2. Schema loading occurs in the thread that’s calling
>    RolapSchemaPool.get() using a FutureTask.  I haven’t had a chance to
>    test this version.
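
As a rough illustration of the first approach, the sketch below hands each load
to a fixed-size thread pool, so the pool size caps how many schemas load at
once. It is not Jon's prototype: ExecutorSchemaPool, submit, load, and
maxConcurrentLoads are hypothetical names, S stands in for RolapSchema, and a
ConcurrentHashMap is assumed in place of the lock-guarded maps in
RolapSchemaPool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

/** Hypothetical stand-in for a schema pool; S stands in for RolapSchema. */
class ExecutorSchemaPool<S> {
    private final ConcurrentMap<String, FutureTask<S>> byKey =
        new ConcurrentHashMap<>();
    private final ExecutorService loaders;

    ExecutorSchemaPool(int maxConcurrentLoads) {
        // The thread pool size bounds the number of simultaneous loads.
        loaders = Executors.newFixedThreadPool(maxConcurrentLoads);
    }

    /** Registers the load if the key is new; returns the shared future. */
    Future<S> submit(String key, Callable<S> loader) {
        FutureTask<S> fresh = new FutureTask<>(loader);
        FutureTask<S> existing = byKey.putIfAbsent(key, fresh);
        if (existing != null) {
            return existing; // someone else already registered this schema
        }
        loaders.execute(fresh); // queued if all loader threads are busy
        return fresh;
    }

    /** Blocks the caller until the schema for this key has loaded. */
    S load(String key, Callable<S> loader) {
        try {
            return submit(key, loader).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted loading " + key, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("load failed for " + key, e.getCause());
        }
    }

    void shutdown() {
        loaders.shutdown();
    }
}
```

With this shape, a caller that must not block (such as an actor thread) could
call submit and hand the Future on, while ordinary callers use load.
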
>
>
>
> One thing that appeals to me a lot about this is that I think it would also
> reduce some issues with the SegmentCacheManager$ACTOR thread running schema
> loads, since a schema load can be a long-running process (minutes) that
> holds up the SegmentCacheManager.
>
>
>
> Any thoughts on this? Would it be worth capturing in JIRA? I’m not sure
> I can get anybody from our team to work on this in the short term, but I
> wanted to pass along the idea…
>
>
>
> --jeff
>
>
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian