[Mondrian] Inefficiencies with Lists

Julian Hyde jhyde at pentaho.com
Sun Oct 25 16:47:09 EDT 2009


That test case is perfect. If the bug ever appears again, the test suite will
be noticeably slower.
 
We will be sure to include it in 3.1.5. And of course 4.0.
 
I'm interested in how common this bug is. Did you see any improvement in the
running time of the test suite as a whole?
 
Julian


  _____  

From: Eric McDermid [mailto:mcdermid at stonecreek.com] 
Sent: Friday, October 23, 2009 12:03 PM
To: jhyde at pentaho.com; Mondrian developer mailing list
Subject: Re: [Mondrian] Inefficiencies with Lists


OK, I just submitted the one I found as Mondrian-639. 

Changing the anonymous classes in RolapNamedSetEvaluator to implement the
Collection interface (largely just throwing UnsupportedOperationException
since we don't need or want the modification functions) improves performance
dramatically when count() is executed often, and still passes the rest of
the regression suite.  
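As a rough illustration of the shape of that change, here is a minimal sketch of a read-only Collection backed by a List. It is not the code that was checked in: the class name ReadOnlyListView is hypothetical, and the real anonymous classes in RolapNamedSetEvaluator may differ in detail. Read operations delegate to the list and every modification method throws UnsupportedOperationException.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical read-only view over a List; the name is illustrative only.
class ReadOnlyListView<T> implements Collection<T> {
    private final List<T> list;

    ReadOnlyListView(List<T> list) {
        this.list = list;
    }

    // Read operations delegate to the backing list, so size() is cheap.
    public int size() { return list.size(); }
    public boolean isEmpty() { return list.isEmpty(); }
    public boolean contains(Object o) { return list.contains(o); }
    public boolean containsAll(Collection<?> c) { return list.containsAll(c); }
    public Object[] toArray() { return list.toArray(); }
    public <A> A[] toArray(A[] a) { return list.toArray(a); }

    public Iterator<T> iterator() {
        final Iterator<T> it = list.iterator();
        return new Iterator<T>() {
            public boolean hasNext() { return it.hasNext(); }
            public T next() { return it.next(); }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    // Modification operations are neither needed nor wanted.
    public boolean add(T t) { throw new UnsupportedOperationException(); }
    public boolean remove(Object o) { throw new UnsupportedOperationException(); }
    public boolean addAll(Collection<? extends T> c) { throw new UnsupportedOperationException(); }
    public boolean removeAll(Collection<?> c) { throw new UnsupportedOperationException(); }
    public boolean retainAll(Collection<?> c) { throw new UnsupportedOperationException(); }
    public void clear() { throw new UnsupportedOperationException(); }

    public static void main(String[] args) {
        Collection<String> view =
            new ReadOnlyListView<String>(Arrays.asList("a", "b", "c"));
        System.out.println("size=" + view.size());
    }
}
```

Extending java.util.AbstractCollection would shorten this, since only iterator() and size() need to be supplied and its add() already throws UnsupportedOperationException.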

Absent any objections, I'll go ahead and check in both the new performance
test (mentioned in the bug) and my associated fix in the same changelist.
On my machine the new test executes in 233 seconds as-is, compared to 4.5
seconds when fixed.

Unfortunately, it appears the issue Matt mentioned is a completely separate
bug, and so isn't helped by this fix.

 -- Eric

On Oct 22, 2009, at 2:43 PM, Julian Hyde wrote:


Good stuff.
 
A good process to investigate and fix this stuff is to add a test case to
PerformanceTest. If possible, devise it in such a way that performance is
horrendously bad (e.g. takes a couple of minutes or more) if the performance
bug is not fixed. Add notes to the test about how long it takes on your
system, database etc. Then we can revisit in future releases and make sure
that the test is still running OK.
 

PerformanceTest IS run as part of the suite, so add the test disabled until
the bug is fixed.
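A minimal sketch of that process in plain Java follows. It is not Mondrian's PerformanceTest: the class name, the ENABLED flag and the placeholder workload are all hypothetical, and a real test would execute the slow query instead.

```java
// Hypothetical sketch of a disabled-by-default performance check.
class PerformanceCheckSketch {
    // Hypothetical switch: leave false until the bug is fixed so that
    // the regular suite does not become slow.
    static final boolean ENABLED = false;

    // Runs the workload once and returns elapsed wall-clock milliseconds.
    static long elapsedMillis(Runnable workload) {
        long start = System.currentTimeMillis();
        workload.run();
        return System.currentTimeMillis() - start;
    }

    // Notes for future readers belong here: machine, database, and the
    // timings observed before and after the fix.
    static void testSlowCase() {
        if (!ENABLED) {
            return; // disabled until the bug is fixed
        }
        long ms = elapsedMillis(new Runnable() {
            public void run() {
                // Placeholder workload standing in for the slow query.
                long x = 0;
                for (int i = 0; i < 1000000; i++) {
                    x += i;
                }
            }
        });
        System.out.println("elapsed time: " + ms + " ms");
    }

    public static void main(String[] args) {
        testSlowCase();
    }
}
```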
 
Unfortunately there are too many variables to devise automated tests that
detect performance regressions (i.e. stuff that takes longer than it used
to) but this is a reasonable manual process.
 
Developers, if you have ideas for tests that may tickle a performance
problem in the code -- generally a query with a larger than usual number of
something, e.g. lots of axes, lots of calculated sets, lots of MemberGrants
in a particular HierarchyGrant -- please add them to that suite. Even if the
performance is acceptable today, the test will ensure it never gets worse
through a mistake we make in the future.
 
Matt & Eric, can you each log a bug and add a test for the issues you have
discovered? I will comment on those issues.
 
Julian


  _____  

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On
Behalf Of Eric McDermid
Sent: Thursday, October 22, 2009 1:30 PM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] Inefficiencies with Lists


I haven't investigated enough to know for sure it's related, but this sounds
like something I've seen recently while investigating slowdowns introduced
between 2.4 and 3.1.1.

In particular, there's a serious inefficiency when an instance of one of the
RolapNamedSetEvaluator anonymous Iterable classes gets handed to
mondrian.olap.fun.FunUtil.count().  The implementation of count() (which
hasn't changed) checks to see if the iterable passed in is an instance of
Collection.  If so, it returns the result of size(); if not, it iterates
through the iterator each time to get a count.

Unfortunately, neither of the inner classes implements Collection, despite
the fact that they both work internally from a List, so we get a lot of
unnecessary iterations.  In 2.4, the object passed in was actually a List,
so this wasn't an issue.
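The behaviour described above can be sketched as follows. This is an illustration only, not the Mondrian source: count() here is a hypothetical stand-in for FunUtil.count(), and iterableOnly() mimics an inner class that exposes a List purely as an Iterable. The visited counter exists just to show how much work the slow path does.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

class CountSketch {
    // Elements walked by the slow path, kept for illustration only.
    static long visited = 0;

    // Hypothetical stand-in for FunUtil.count(); not the Mondrian code.
    static int count(Iterable<?> iterable) {
        if (iterable instanceof Collection) {
            // Fast path: a Collection already knows its size.
            return ((Collection<?>) iterable).size();
        }
        // Slow path: walk the whole iterator on every call.
        int n = 0;
        for (Object ignored : iterable) {
            n++;
            visited++;
        }
        return n;
    }

    // Hides a List behind a bare Iterable, as the inner classes do.
    static <T> Iterable<T> iterableOnly(final List<T> list) {
        return new Iterable<T>() {
            public Iterator<T> iterator() {
                return list.iterator();
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<Integer>();
        for (int i = 0; i < 30000; i++) {
            items.add(i);
        }
        Iterable<Integer> wrapped = iterableOnly(items);
        for (int call = 0; call < 46; call++) {
            count(items);   // takes the fast path
            count(wrapped); // takes the slow path
        }
        System.out.println("elements walked on slow path: " + visited);
    }
}
```

With 30,000 items and 46 calls, the slow path walks 1,380,000 elements, while the fast path walks none.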

For reference, the query I'm looking at has ~30K items being counted, and
FunUtil.count() executes 46 times.  In 2.4, that had a cumulative hit of
less than half a second.  In 3.1.1, it's more like 77 seconds.

I think the right thing to do here is extend the internal classes of
RolapNamedSetEvaluator to implement Collection (which itself extends
Iterable) rather than Iterable directly.  Since it's implemented with a list
internally, this should be pretty easy. 

I'm working on fixing this today, and I'll point you to the changelist once
I'm done in case you want to try it yourself.

 -- Eric

On Oct 22, 2009, at 9:25 AM, Matt Campbell wrote:



The simple crossjoin query in the unit test below runs in about 99 seconds
on my laptop.  After noticing a hotspot in JProfiler, I tried setting the
result style to ResultStyle.LIST and reran.  It completed in 11 seconds.  I
haven't investigated much yet, but I thought I'd throw this observation out
for feedback.  It looks like the LinkedList is not being accessed
sequentially, causing a significant bottleneck in list retrieval.


    // Times a nested crossjoin and prints the number of positions on axis 0.
    public void testLargeResult() {
        long start = System.currentTimeMillis();
        Result result = executeQuery(
            "select { crossjoin(customers.[city].members, "
            + "crossjoin([store type].[store type].members, "
            + "product.[product name].members)) }"
            + " on 0 from sales");
        System.out.println(
            "elapsed time:  " + (System.currentTimeMillis() - start));
        System.out.println(
            "size=" + result.getAxes()[0].getPositions().size());
    }
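
To illustrate the suspected bottleneck in general terms, here is a minimal sketch comparing indexed and sequential access on a java.util.LinkedList. It only assumes the hypothesis above, that the list is being read by index; it does not use any Mondrian code, and the class and method names are hypothetical.

```java
import java.util.LinkedList;
import java.util.List;

class LinkedListAccessSketch {
    // Indexed access: each get(i) on a LinkedList walks from one end of
    // the list, so the whole loop is quadratic in the list size.
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Sequential access: the iterator moves one node per step, so the
    // loop is linear in the list size.
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<Integer>();
        for (int i = 0; i < 50000; i++) {
            list.add(i);
        }
        long t0 = System.nanoTime();
        long byIndex = sumByIndex(list);
        long t1 = System.nanoTime();
        long byIterator = sumByIterator(list);
        long t2 = System.nanoTime();
        System.out.println("indexed:    " + (t1 - t0) / 1000000 + " ms");
        System.out.println("sequential: " + (t2 - t1) / 1000000 + " ms");
        System.out.println("same result: " + (byIndex == byIterator));
    }
}
```

Both methods return the same sum; only the access pattern, and therefore the running time on a LinkedList, differs.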
_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian





