[Mondrian] schema loading performance

Julian Hyde julianhyde at gmail.com
Tue Jan 27 02:18:36 EST 2015


I’d generate a few thread dumps using ‘kill -3’ during the long load process. From where the code is, you might be able to deduce what is taking the time. (Of course there will be many threads that are just paused because they are almost always paused. You should look at the “active” thread, which is probably the one loading the schema.)

> On Jan 26, 2015, at 2:41 PM, Wright, Jeff <jeff.s.wright at truvenhealth.com> wrote:
> 
> A little bit more on #1…
>  
> I did more tests with approximate row counts for the aggregate tables. Instead of getting a big speedup in schema load, what I saw was just a reduction of log messages about AggStar.FactTable.makeNumberOfRows. It just paused for minutes with no logging. My assumption is that there is other work related to aggregate tables that is slow and is not logged. My next idea is to turn on debug logging and see if I get any clues about where the time is being spent.
>  
> --jeff
>  
> From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Wright, Jeff
> Sent: Monday, January 26, 2015 2:48 PM
> To: Mondrian developer mailing list
> Subject: [Mondrian] schema loading performance
>  
> The application I’m working with is based on Mondrian 3.6 and is seeing schema load times over 10 minutes, depending on database size. I’m seeing time spent mainly on two things…
>  
> 1. Statistics provider for aggregate tables
>  
> The major time consuming step for schema load is calls to AggStar.FactTable.makeNumberOfRows. When I search JIRA for this, I see that MONDRIAN-918 added approxRowCount support for aggregate tables. Has there been any discussion of enhancing AggStar.FactTable to delegate to the statistics provider to get these counts?
>  
> I spent a couple minutes looking at the code, seems odd that AggStar.FactTable.getNumberOfRows() calls back to the star to get the numberOfRows, and then it makes its own “select count(*)” query in makeNumberOfRows().
>  
> 2. Getting/validating dimension values
>  
> Much faster than that, but still over a minute, is time spent getting dimension values. The sql statements are either SqlMemberSource.getMemberChildren or SqlTupleReader.readTuples. We definitely have some calculated measures in the schema that involve tuple expressions, kind of like
>  
>     <CalculatedMember name="Dollar Amount Constrained by Some Value" visible="true" dimension="Measures">
>         <Formula>([Measures].[Dollar Amount], [Some Dim].[All Some Dims].[Some Value])</Formula>
>     </CalculatedMember>
>  
> I’m sure at least some of the dimension queries are connected to definitions like that. My next step is to debug through some of code when this is going on, but I thought while I was reporting on #1 I’d also fish for suggestions on #2…
>  
> --Jeff Wright
>  
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20150126/ebb8fed5/attachment.html 


More information about the Mondrian mailing list