[Mondrian] Re: Performance degradation

Matt Campbell mkambol at gmail.com
Thu Mar 5 17:14:36 EST 2009


Thanks Julian,I just checked in 12408 with this change.  I couldn't see a
way to use validator to check type in this case, since result.get(0) will
return a Member[] or a Member.  If you can give me some hints about how I
might use validator I can give it another pass.

-matt


On Thu, Mar 5, 2009 at 2:20 PM, Julian Hyde <jhyde at pentaho.com> wrote:

>  As usual, my reaction to this code is 'yuck'.
>
> I think your change is sound, and you should go ahead.
>
> A few other improvements to consider:
>
> 1. To distinguish between a tuple-set and a member-set, see if you can use
> the type derived by the validator rather than the runtime 'instanceof'
> check.
>
> 2. The code seems to be detecting empty lists by doing a get(0) then
> handling the IndexOutOfBoundsException. I believe that the person who wrote
> the high-cardinalty dimension code did this because size() can be an
> expensive operation; but they may think that try-catch is more elegant than
> if, I don't know. It's not possible to make size() a cheap operation, but we
> can make isEmpty() efficient. Can you see whether it's possible to replace
> it with an isEmpty check?
>
> 3. Even though I don't like the high-cardinality dimension code as a whole,
> ConcatenableList is basically sound, and worth saving. The change you
> proposed earlier, to consolidate a the sublists of ConcatenableList into a
> single array on the first call to size or get, damages the performance
> characteristics of that class. For instance, it is not worth consolidating a
> list that has two sublists of size 500K, but it is worth consolidating a
> list that has 100K lists of size 10. In particular, your approach would give
> O(n^2) performance if someone adds one element at a time onto the end of a
> list.
>
> This looks like a better algorithm: consolidate only if the number of
> sublists is greater than the square root of the number of elements.
>
> Julian
>
>  ------------------------------
> *From:* mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
> *On Behalf Of *Matt Campbell
> *Sent:* Thursday, March 05, 2009 10:30 AM
> *To:* Mondrian developer mailing list
> *Subject:* Re: [Mondrian] Re: Performance degradation
>
>
> After spending some more time looking at this issue I think the best
> solution (at least short term), is to fix the method SetFunDef.evaluateList
> (pasted below).  I believe what it's intended to do is load an ArrayList if
> the dimension in the set is not flagged as highCardinality.  It fails in
> certain cases, though, since the result object may be a list of lists when
> there are multi-element tuples.
>
> I think a reasonable modification would be to assume the set is NOT
> highCardinality, unless there is only a single dimension in result. This
> would cause SetFunDef to use an ArrayList in such cases, which was the
> behavior prior to the high cardinality changes.
>
> Any objections to such a change?
>
> (note that the method below is as it is currently checked in)
>
>  public List evaluateList(final Evaluator evaluator) {
>             this.result = new ConcatenableList();
>             for (VoidCalc voidCalc : voidCalcs) {
>                 voidCalc.evaluateVoid(evaluator);
>             }
>             // REVIEW: What the heck is this code doing? Why is it OK to
>             // ignore IndexOutOfBoundsException and NullPointerException?
>             try {
>                 if (result.get(0) instanceof Member) {
>                     if (!((Member)result.get(0)).getDimension()
>                             .isHighCardinality()) {
>                         result.toArray();
>                     }
>                 }
>             } catch (IndexOutOfBoundsException iooe) {
>             } catch (NullPointerException npe) {
>             }
>             return result;
>         }
>
>
> On Wed, Mar 4, 2009 at 4:16 PM, Matt Campbell <mkambol at gmail.com> wrote:
>
>> Virtually all the time spent on the query is spent in ConcatenableList, in
>> the size() and get() methods.  Both have some pretty inefficient looking
>> logic.  In get(), if the element retrieved is not the next, previous, or
>> pre-previous, then the code iterates through the whole list up to the
>> elements index.  Similarly with .size(), the code iterates through the
>> lists, adding up each lists size.  I'm guessing that in many cases the
>> expense of doing this once is probably about the cost of converting to an
>> ArrayList.  As an experiment I modified .size() to always convert to an
>> ArrayList, and this cut the time to execute dramatically.
>>
>>
>> On Wed, Mar 4, 2009 at 12:02 PM, Matt Campbell <mkambol at gmail.com> wrote:
>>
>>> After syncing to a changelist I was still seeing lots of files in my
>>> directories that were showing version 0 in perforce.  I eventually got past
>>> this by starting from a totally clean slate and syncing.
>>> Still investigating, but it looks like changelist 11049 is the culprit.
>>>  The query I posted runs in 2.5 seconds with change 11048, and takes a full
>>> minute and a half with 11049.
>>>
>>>
>>> On Wed, Mar 4, 2009 at 11:16 AM, Julian Hyde <jhyde at pentaho.com> wrote:
>>>
>>>>  There is no version 0 as such. If p4 sync 'gets' version 0, it removes
>>>> the file from your sandbox. Is that what you are seeing?
>>>>
>>>>
>>>> On Mar 4, 2009, at 6:36 AM, Matt Campbell <mkambol at gmail.com> wrote:
>>>>
>>>>    Hmmm.  I see many files in my directories that are marked as having
>>>> version 0.  For example, I see MemberIterCalc.java in my sandbox, even
>>>> though it was added much after the changelist I synched to.  Any ideas how
>>>> this could happen?
>>>>
>>>> On Wed, Mar 4, 2009 at 9:13 AM, John V. Sichi < <jsichi at gmail.com>
>>>> jsichi at gmail.com> wrote:
>>>>
>>>>> Matt Campbell wrote:
>>>>>
>>>>>> I've been trying to isolate when this issue was introduced, but am
>>>>>> struggling a bit with perforce.  I'm trying to sync to a changelist using
>>>>>> something like:
>>>>>>
>>>>>> p4 sync //open/mondrian/... at 10480
>>>>>>
>>>>>> This brings in the set of files from that changelist, but does not
>>>>>> appear to delete files from future changelists.
>>>>>> What is the correct way to sync to a changelist?
>>>>>>
>>>>>
>>>>> That is the correct command.  It will delete files from future
>>>>> changelists, but it will not delete empty directories unless you have the
>>>>> rmdir option set in your client (the default is normdir).
>>>>>
>>>>> (Also, use p4 opened to make sure you don't have any files checked out
>>>>> before starting.)
>>>>>
>>>>> JVS
>>>>>
>>>>> _______________________________________________
>>>>> Mondrian mailing list
>>>>> <Mondrian at pentaho.org>Mondrian at pentaho.org
>>>>> <http://lists.pentaho.org/mailman/listinfo/mondrian>
>>>>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>>>>
>>>>
>>>>  _______________________________________________
>>>> Mondrian mailing list
>>>> Mondrian at pentaho.org
>>>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>>>
>>>>
>>>> _______________________________________________
>>>> Mondrian mailing list
>>>> Mondrian at pentaho.org
>>>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>>>
>>>>
>>>
>>
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20090305/031530e4/attachment.html 


More information about the Mondrian mailing list