[Mondrian] schema loading performance

Wright, Jeff jeff.s.wright at truvenhealth.com
Mon Jan 26 17:41:41 EST 2015

A little bit more on #1...

I did more tests with approximate row counts for the aggregate tables. Instead of getting a big speedup in schema load, what I saw was just a reduction of log messages about AggStar.FactTable.makeNumberOfRows. It just paused for minutes with no logging. My assumption is that there is other work related to aggregate tables that is slow and is not logged. My next idea is to turn on debug logging and see if I get any clues about where the time is being spent.


From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Wright, Jeff
Sent: Monday, January 26, 2015 2:48 PM
To: Mondrian developer mailing list
Subject: [Mondrian] schema loading performance

The application I'm working with is based on Mondrian 3.6 and is seeing schema load times over 10 minutes, depending on database size. I'm seeing time spent mainly on two things...

1. Statistics provider for aggregate tables

The major time consuming step for schema load is calls to AggStar.FactTable.makeNumberOfRows. When I search JIRA for this, I see that MONDRIAN-918 added approxRowCount support for aggregate tables. Has there been any discussion of enhancing AggStar.FactTable to delegate to the statistics provider to get these counts?

I spent a couple minutes looking at the code, seems odd that AggStar.FactTable.getNumberOfRows() calls back to the star to get the numberOfRows, and then it makes its own "select count(*)" query in makeNumberOfRows().

2. Getting/validating dimension values

Much faster than that, but still over a minute, is time spent getting dimension values. The sql statements are either SqlMemberSource.getMemberChildren or SqlTupleReader.readTuples. We definitely have some calculated measures in the schema that involve tuple expressions, kind of like

    <CalculatedMember name="Dollar Amount Constrained by Some Value" visible="true" dimension="Measures">
        <Formula>([Measures].[Dollar Amount], [Some Dim].[All Some Dims].[Some Value])</Formula>

I'm sure at least some of the dimension queries are connected to definitions like that. My next step is to debug through some of code when this is going on, but I thought while I was reporting on #1 I'd also fish for suggestions on #2...

--Jeff Wright

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20150126/a91e68b6/attachment.html 

More information about the Mondrian mailing list