[Mondrian] Re: Performance degradation

Julian Hyde jhyde at pentaho.com
Thu Mar 5 14:20:01 EST 2009


As usual, my reaction to this code is 'yuck'.
 
I think your change is sound, and you should go ahead.
 
A few other improvements to consider:
 
1. To distinguish between a tuple-set and a member-set, see if you can use
the type derived by the validator rather than the runtime 'instanceof'
check.
 
2. The code seems to be detecting empty lists by doing a get(0) then
handling the IndexOutOfBoundsException. I believe that the person who wrote
the high-cardinalty dimension code did this because size() can be an
expensive operation; but they may think that try-catch is more elegant than
if, I don't know. It's not possible to make size() a cheap operation, but we
can make isEmpty() efficient. Can you see whether it's possible to replace
it with an isEmpty check?
 
3. Even though I don't like the high-cardinality dimension code as a whole,
ConcatenableList is basically sound, and worth saving. The change you
proposed earlier, to consolidate a the sublists of ConcatenableList into a
single array on the first call to size or get, damages the performance
characteristics of that class. For instance, it is not worth consolidating a
list that has two sublists of size 500K, but it is worth consolidating a
list that has 100K lists of size 10. In particular, your approach would give
O(n^2) performance if someone adds one element at a time onto the end of a
list.
 
This looks like a better algorithm: consolidate only if the number of
sublists is greater than the square root of the number of elements.
 
Julian


  _____  

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On
Behalf Of Matt Campbell
Sent: Thursday, March 05, 2009 10:30 AM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] Re: Performance degradation



After spending some more time looking at this issue I think the best
solution (at least short term), is to fix the method SetFunDef.evaluateList
(pasted below).  I believe what it's intended to do is load an ArrayList if
the dimension in the set is not flagged as highCardinality.  It fails in
certain cases, though, since the result object may be a list of lists when
there are multi-element tuples.

I think a reasonable modification would be to assume the set is NOT
highCardinality, unless there is only a single dimension in result. This
would cause SetFunDef to use an ArrayList in such cases, which was the
behavior prior to the high cardinality changes.

Any objections to such a change?  

(note that the method below is as it is currently checked in)

public List evaluateList(final Evaluator evaluator) {
            this.result = new ConcatenableList();
            for (VoidCalc voidCalc : voidCalcs) {
                voidCalc.evaluateVoid(evaluator);
            }
            // REVIEW: What the heck is this code doing? Why is it OK to
            // ignore IndexOutOfBoundsException and NullPointerException?
            try {
                if (result.get(0) instanceof Member) {
                    if (!((Member)result.get(0)).getDimension()
                            .isHighCardinality()) {
                        result.toArray();
                    }
                }
            } catch (IndexOutOfBoundsException iooe) {
            } catch (NullPointerException npe) {
            }
            return result;
        }


On Wed, Mar 4, 2009 at 4:16 PM, Matt Campbell <mkambol at gmail.com> wrote:


Virtually all the time spent on the query is spent in ConcatenableList, in
the size() and get() methods.  Both have some pretty inefficient looking
logic.  In get(), if the element retrieved is not the next, previous, or
pre-previous, then the code iterates through the whole list up to the
elements index.  Similarly with .size(), the code iterates through the
lists, adding up each lists size.  I'm guessing that in many cases the
expense of doing this once is probably about the cost of converting to an
ArrayList.  As an experiment I modified .size() to always convert to an
ArrayList, and this cut the time to execute dramatically. 


On Wed, Mar 4, 2009 at 12:02 PM, Matt Campbell <mkambol at gmail.com> wrote:


After syncing to a changelist I was still seeing lots of files in my
directories that were showing version 0 in perforce.  I eventually got past
this by starting from a totally clean slate and syncing. 

Still investigating, but it looks like changelist 11049 is the culprit.  The
query I posted runs in 2.5 seconds with change 11048, and takes a full
minute and a half with 11049. 


On Wed, Mar 4, 2009 at 11:16 AM, Julian Hyde <jhyde at pentaho.com> wrote:


There is no version 0 as such. If p4 sync 'gets' version 0, it removes the
file from your sandbox. Is that what you are seeing? 


On Mar 4, 2009, at 6:36 AM, Matt Campbell <mkambol at gmail.com> wrote:



Hmmm.  I see many files in my directories that are marked as having version
0.  For example, I see MemberIterCalc.java in my sandbox, even though it was
added much after the changelist I synched to.  Any ideas how this could
happen?


On Wed, Mar 4, 2009 at 9:13 AM, John V. Sichi < <mailto:jsichi at gmail.com>
jsichi at gmail.com> wrote:


Matt Campbell wrote:


I've been trying to isolate when this issue was introduced, but am
struggling a bit with perforce.  I'm trying to sync to a changelist using
something like:

p4 sync //open/mondrian/... at 10480

This brings in the set of files from that changelist, but does not appear to
delete files from future changelists.  
What is the correct way to sync to a changelist?



That is the correct command.  It will delete files from future changelists,
but it will not delete empty directories unless you have the rmdir option
set in your client (the default is normdir).

(Also, use p4 opened to make sure you don't have any files checked out
before starting.)

JVS

_______________________________________________
Mondrian mailing list
 <mailto:Mondrian at pentaho.org> Mondrian at pentaho.org
 <http://lists.pentaho.org/mailman/listinfo/mondrian>
http://lists.pentaho.org/mailman/listinfo/mondrian



_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian



_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian






-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20090305/cfa6daa3/attachment.html 


More information about the Mondrian mailing list