[Mondrian] schema loading performance

Matt Campbell mcampbell at pentaho.com
Wed Jan 28 11:18:30 EST 2015


Hey Jeff-
Not sure if this will help you in any way, but for #2, one nuance with resolution of members is that references that specify the unique name (like your [Some Dim].[All Some Dims].[Some Value]) will result in a retrieval of that single member during schema load, whereas a reference to the member within a level, like [Some Dim].[Some Level].[Some Member], will result in all members of the level being loaded and cached.  The upshot of this is that if you have many references to single members of the same level, you can eliminate separate SQL queries in favor of a single SQL query which caches all of the members in one shot.  This is dependent on the level having unique members, though.


From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Wright, Jeff
Sent: Monday, January 26, 2015 5:42 PM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] schema loading performance

A little bit more on #1...

I did more tests with approximate row counts for the aggregate tables. Instead of getting a big speedup in schema load, what I saw was just a reduction of log messages about AggStar.FactTable.makeNumberOfRows. It just paused for minutes with no logging. My assumption is that there is other work related to aggregate tables that is slow and is not logged. My next idea is to turn on debug logging and see if I get any clues about where the time is being spent.

--jeff

From: mondrian-bounces at pentaho.org<mailto:mondrian-bounces at pentaho.org> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Wright, Jeff
Sent: Monday, January 26, 2015 2:48 PM
To: Mondrian developer mailing list
Subject: [Mondrian] schema loading performance

The application I'm working with is based on Mondrian 3.6 and is seeing schema load times over 10 minutes, depending on database size. I'm seeing time spent mainly on two things...

1. Statistics provider for aggregate tables

The major time consuming step for schema load is calls to AggStar.FactTable.makeNumberOfRows. When I search JIRA for this, I see that MONDRIAN-918 added approxRowCount support for aggregate tables. Has there been any discussion of enhancing AggStar.FactTable to delegate to the statistics provider to get these counts?

I spent a couple minutes looking at the code, seems odd that AggStar.FactTable.getNumberOfRows() calls back to the star to get the numberOfRows, and then it makes its own "select count(*)" query in makeNumberOfRows().

2. Getting/validating dimension values

Much faster than that, but still over a minute, is time spent getting dimension values. The sql statements are either SqlMemberSource.getMemberChildren or SqlTupleReader.readTuples. We definitely have some calculated measures in the schema that involve tuple expressions, kind of like

    <CalculatedMember name="Dollar Amount Constrained by Some Value" visible="true" dimension="Measures">
        <Formula>([Measures].[Dollar Amount], [Some Dim].[All Some Dims].[Some Value])</Formula>
    </CalculatedMember>

I'm sure at least some of the dimension queries are connected to definitions like that. My next step is to debug through some of code when this is going on, but I thought while I was reporting on #1 I'd also fish for suggestions on #2...

--Jeff Wright

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20150128/c025064b/attachment.html 


More information about the Mondrian mailing list