[Mondrian] schema loading performance

Wright, Jeff jeff.s.wright at truvenhealth.com
Mon Jan 26 14:48:25 EST 2015


The application I'm working with is based on Mondrian 3.6 and is seeing schema load times over 10 minutes, depending on database size. I'm seeing time spent mainly on two things...

1. Statistics provider for aggregate tables

The major time consuming step for schema load is calls to AggStar.FactTable.makeNumberOfRows. When I search JIRA for this, I see that MONDRIAN-918 added approxRowCount support for aggregate tables. Has there been any discussion of enhancing AggStar.FactTable to delegate to the statistics provider to get these counts?

I spent a couple minutes looking at the code, seems odd that AggStar.FactTable.getNumberOfRows() calls back to the star to get the numberOfRows, and then it makes its own "select count(*)" query in makeNumberOfRows().

2. Getting/validating dimension values

Much faster than that, but still over a minute, is time spent getting dimension values. The sql statements are either SqlMemberSource.getMemberChildren or SqlTupleReader.readTuples. We definitely have some calculated measures in the schema that involve tuple expressions, kind of like

    <CalculatedMember name="Dollar Amount Constrained by Some Value" visible="true" dimension="Measures">
        <Formula>([Measures].[Dollar Amount], [Some Dim].[All Some Dims].[Some Value])</Formula>
    </CalculatedMember>

I'm sure at least some of the dimension queries are connected to definitions like that. My next step is to debug through some of code when this is going on, but I thought while I was reporting on #1 I'd also fish for suggestions on #2...

--Jeff Wright

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20150126/110ba475/attachment.html 


More information about the Mondrian mailing list