[Mondrian] Inefficiencies with Lists

Matt Campbell mkambol at gmail.com
Mon Oct 26 10:41:10 EDT 2009


I just submitted a test case for the bug I mentioned
(http://jira.pentaho.com/browse/MONDRIAN-641).  The performance degradation
in this case was due to HighCardSqlTupleReader (HCSTR).  HCSTR uses a
LinkedList (in Target), unlike SqlTupleReader, which uses an ArrayList.  The
list is not accessed strictly sequentially in this case, so unnecessary time
is spent walking the LinkedList to reach each element.
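
To illustrate the access pattern, here is a minimal sketch. It is not Mondrian
code; the class ListAccessSketch and its methods are made up for this example.
LinkedList.get(i) has to walk the list from one end to reach position i, so an
index-based loop over a LinkedList is quadratic overall, while the same loop
over an ArrayList is linear.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Mondrian.
class ListAccessSketch {

    /** Sums via get(i): constant time per call on ArrayList, linear per call on LinkedList. */
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    /** Sums via the iterator: constant time per element for both list types. */
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    /** Appends 0..n-1 to the given list and returns it. */
    static List<Integer> fill(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        int n = 50_000;
        List<Integer> array = fill(new ArrayList<Integer>(), n);
        List<Integer> linked = fill(new LinkedList<Integer>(), n);

        long t0 = System.nanoTime();
        sumByIndex(array);
        long t1 = System.nanoTime();
        sumByIndex(linked);
        long t2 = System.nanoTime();
        sumByIterator(linked);
        long t3 = System.nanoTime();

        System.out.println("ArrayList  by index:    " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("LinkedList by index:    " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("LinkedList by iterator: " + (t3 - t2) / 1_000_000 + " ms");
    }
}
```

Timings will vary by machine, so none are quoted here; the point is only that
the by-index pass over the LinkedList grows much faster with n than the other
two.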

I've never understood why HCSTR gets used even in cases where there are no
dimensions flagged with the highCardinality attribute.  A possible fix is to
add a check for highCardinality in the block in SetEvaluator.execute() where
the appropriate TupleReader is instantiated.

On Sun, Oct 25, 2009 at 5:55 PM, Eric McDermid <mcdermid at stonecreek.com> wrote:

> It knocked a little over a minute from the time it takes to run on my
> machine, which in my case is a little better than a 3.5% improvement.  I'm not
> sure that's statistically significant, though, since I didn't measure it
> over multiple runs or do anything to control for load.
>
> If we ever decide we need to make that test case even more obvious, we can
> add another crossjoin to bring it up to ~110K result rows rather than ~60K
> as now.  With the fix the query still completes in 5-6 seconds, and without
> the fix it's at least 15-20 minutes (I don't have an accurate number because
> I didn't actually have the patience to let it finish).  Of course, that kind
> of query might also be running into whatever performance issue Matt
> uncovered.
>
> Overall, the query I'm investigating runs 20-30% slower on 3.1.1 vs. 2.4,
> so I imagine I'll find some more issues like this before I'm done.
>
>  -- Eric
>
> On Oct 25, 2009, at 2:47 PM, Julian Hyde wrote:
>
> That test case is perfect. If the bug ever appears again the test suite
> will be noticeably slower.
>
> We will be sure to include it in 3.1.5. And of course 4.0.
>
> I'm interested how common this bug is. Did you see any improvement in the
> running time of the test suite as a whole?
>
> Julian
>
>  ------------------------------
> *From:* Eric McDermid [mailto:mcdermid at stonecreek.com]
>
> *Sent:* Friday, October 23, 2009 12:03 PM
> *To:* jhyde at pentaho.com; Mondrian developer mailing list
> *Subject:* Re: [Mondrian] Inefficiencies with Lists
>
> OK, I just submitted the one I found as MONDRIAN-639.
>
> Changing the anonymous classes in RolapNamedSetEvaluator to implement the
> Collection interface (largely just throwing UnsupportedOperationException,
> since we don't need or want the modification functions) improves performance
> dramatically when count() is executed often, and still passes the rest of
> the regression suite.
>
> Absent any objections, I'll go ahead and check in both the new performance
> test (mentioned in the bug) and my associated fix in the same changelist.
> On my machine the new test executes in 233 seconds as-is, compared to 4.5
> seconds when fixed.
>
> Unfortunately, it appears the issue Matt mentioned is a completely separate
> bug, and so isn't helped by this fix.
>
>  -- Eric
>
>  On Oct 22, 2009, at 2:43 PM, Julian Hyde wrote:
>
>  Good stuff.
>
> A good process to investigate and fix this stuff is to add a test case to
> PerformanceTest. If possible, devise it in such a way that performance is
> horrendously bad (e.g. takes a couple of minutes or more) if the performance
> bug is not fixed. Add notes to the test about how long it takes on your
> system, database etc. Then we can revisit in future releases and make sure
> that the test is still running OK.
>
>  PerformanceTest IS run as part of the suite, so add the test disabled
> until the bug is fixed.
>
> Unfortunately there are too many variables to devise automated tests that
> detect performance regressions (i.e. stuff that takes longer than it used
> to) but this is a reasonable manual process.
>
> Developers, if you have ideas for tests that may tickle a performance
> problem in the code -- generally a query with a larger than usual number of
> something, e.g. lots of axes, lots of calculated sets, lots of MemberGrants
> in a particular HierarchyGrant -- please add them to that suite. Even if the
> performance is acceptable today, it will ensure it never gets worse through
> a mistake we make in the future.
>
> Matt & Eric, can you each log a bug and add a test for the issues you have
> discovered? I will comment on those issues.
>
> Julian
>
>  ------------------------------
> *From:* mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
> *On Behalf Of *Eric McDermid
> *Sent:* Thursday, October 22, 2009 1:30 PM
> *To:* Mondrian developer mailing list
> *Subject:* Re: [Mondrian] Inefficiencies with Lists
>
> I haven't investigated enough to know for sure it's related, but this
> sounds like something I've seen recently investigating slowdowns introduced
> from 2.4 to 3.1.1.
>
> In particular, there's a serious inefficiency when an instance of one of
> the RolapNamedSetEvaluator anonymous Iterable classes gets handed to
> mondrian.olap.fun.FunUtil.count().  The implementation of count (which
> hasn't changed) checks to see if the iterable passed in is an instance of
> Collection.  If so, it returns the result of size(); if not, it iterates
> through the iterator each time to get a count.
>
> Unfortunately, neither of the inner classes implements Collection, despite
> the fact that they both work internally from a List, so we get a lot of
> unnecessary iterations.  In 2.4, the object passed in was actually a List,
> so this wasn't an issue.
>
> For reference, the query I'm looking at has ~30K items being counted, and
> FunUtil.count() executes 46 times.  In 2.4, that had a cumulative hit of
> less than half a second.  In 3.1.1, it's more like 77 seconds.
>
> I think the right thing to do here is extend the internal classes
> of RolapNamedSetEvaluator to implement Collection (which itself extends
> Iterable) rather than Iterable directly.  Since it's implemented with a list
> internally, this should be pretty easy.
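
A minimal sketch of the behaviour described above, assuming count() works the
way Eric describes. CountSketch and its helpers are hypothetical stand-ins
written for this example; they are not the actual FunUtil or
RolapNamedSetEvaluator code. The steps counter exists only to make the extra
iteration visible.

```java
import java.util.AbstractCollection;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Mondrian.
class CountSketch {

    /** Number of iterator steps taken inside count(); for illustration only. */
    static int steps = 0;

    /** Uses size() when handed a Collection, otherwise walks the whole iterator. */
    static int count(Iterable<?> iterable) {
        if (iterable instanceof Collection) {
            return ((Collection<?>) iterable).size();
        }
        int n = 0;
        for (Object ignored : iterable) {
            n++;
            steps++;
        }
        return n;
    }

    /** Iterable-only wrapper around a list: count() has to iterate every time. */
    static <T> Iterable<T> iterableOnly(final List<T> backing) {
        return new Iterable<T>() {
            @Override
            public Iterator<T> iterator() {
                return backing.iterator();
            }
        };
    }

    /**
     * Collection wrapper around the same list: count() can call size().
     * AbstractCollection.add() throws UnsupportedOperationException by default.
     */
    static <T> Collection<T> collectionView(final List<T> backing) {
        return new AbstractCollection<T>() {
            @Override
            public Iterator<T> iterator() {
                return backing.iterator();
            }

            @Override
            public int size() {
                return backing.size();
            }
        };
    }
}
```

With ~30K items and 46 calls, the Iterable-only wrapper means on the order of
46 full passes over the list, while the Collection wrapper answers each call
from size().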
>
> I'm working on fixing this today, and I'll point you to the changelist once
> I'm done in case you want to try it yourself.
>
>  -- Eric
>
>  On Oct 22, 2009, at 9:25 AM, Matt Campbell wrote:
>
>
> The simple crossjoin query in the unit test below runs in about *99
> seconds* on my laptop.  After noticing a hotspot in JProfiler, I tried
> setting the result style to ResultStyle.LIST and reran.  It completed in *11
> seconds*.  I haven't investigated much yet, but I thought I'd throw this
> observation out for feedback.  It looks like the LinkedList is not being
> accessed sequentially, causing a significant bottleneck in list retrieval.
>
>
>  public void testLargeResult() {
>      long start = System.currentTimeMillis();
>      Result result = executeQuery(
>          "select {  crossjoin( customers.[city].members, "
>          + "crossjoin( [store type].[store type].members, "
>          + "product.[product name].members)) }"
>          + " on 0 from sales");
>      System.out.println(
>          "elapsed time:  " + (System.currentTimeMillis() - start));
>      System.out.println(
>          "size=" + result.getAxes()[0].getPositions().size());
>  }
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian