[Mondrian] CrossJoinArg.isPreferInterpreter

Julian Hyde julianhyde at speakeasy.net
Sun May 13 18:45:34 EDT 2007


> JVS wrote:
>
> In RolapNativeSet, there's currently a check for the case where all
> of the inputs to a nonempty crossjoin are explicitly enumerated sets:
>
>      * If all involved sets are already known, like in crossjoin({a,b}, {c,d}),
>      * then use the interpreter.
>
> This definitely doesn't make sense in the case where the enumerated
> sets are large, since the result set of nonempty tuples may be much
> smaller than the full product. In the case where the sets are small
> and measures are already cached, then it makes sense to avoid the
> SQL. But I'd rather avoid adding yet another property. Would it be
> OK to just get rid of this check? There's not currently enough
> framework in place to do a cost/benefit analysis; I think it would be
> better to err in favor of large sets.

I have no objections to that.
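
(For context, the check amounts to something like the sketch below.
This is hypothetical code, not the actual RolapNativeSet source:
CrossJoinArg here is a stand-in interface for the real class, and
isEnumeratedSet is an invented method.)

    import java.util.List;

    /** Stand-in for mondrian's CrossJoinArg; illustrative only. */
    interface CrossJoinArg {
        /** True if the arg is an explicitly enumerated set, e.g. {a, b}. */
        boolean isEnumeratedSet();
    }

    class InterpreterHeuristic {
        /**
         * The check being debated: prefer the in-memory interpreter only
         * when every crossjoin input is an enumerated set. Removing the
         * check means always answering false and letting native SQL run.
         */
        static boolean isPreferInterpreter(List<CrossJoinArg> args) {
            for (CrossJoinArg arg : args) {
                if (!arg.isEnumeratedSet()) {
                    return false; // e.g. a level expression: go native
                }
            }
            return true; // e.g. crossjoin({a,b}, {c,d})
        }
    }

The trouble, as JVS notes, is that this answers true even for two
enumerated sets of 1,000 members each, where the interpreter would walk
up to 1,000,000 tuples while the nonempty result might contain only a
handful.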

As you point out, it's difficult to know which implementation is better
without a framework for cost/benefit analysis. This has been on my mind
for some time, because we sometimes seem to be making two steps forward,
one step back as regards performance, so I'd like to start a discussion
about how to do that.

Performance regression testing (PRT) looks for degradations in the
performance of particular performance-critical components.

PRT is difficult because the running time of a test will vary
significantly depending on (a) the capabilities of the machine running
the test (CPU, disk, and memory speed); (b) other processes running on
the same machine; (c) the state of the cache after other tests; and (d)
uncontrollable variables such as the amount of available memory and
cosmic rays.

Even though it is difficult, PRT is worth doing:
 * PRT makes it more difficult to accidentally break the architecture.
 * PRT forces developers to provide numbers to justify changes in the
architecture. A change which makes a major component 30% more complex
yet improves performance by only 2% is probably not worth making.
 * PRT allows us to recognize when use cases are so diverse that there
is no "one size fits all" algorithm, and we need different algorithms
for different use cases. (As we Brits say, "horses for courses".)
 * PRT makes it easier to debug performance problems, because we find
out about them at the time the code is changed, rather than after the
code is released.
 * PRT encourages interested parties to contribute tests, so that the
features of the product that they care about are not inadvertently
broken.

(Note to self: This last point expresses a "defensive testing"
philosophy. It pervades the open-source development process, but I've
never seen it discussed. I might write a blog entry about it sometime.)

If we could find a testing methodology which was isolated from all of
those effects, then we could guarantee no spurious failures (false
positives) in the performance regression tests, and require developers
to run them along with the rest of the test suite before checking in.

One approach which might work here is to count the number of
evaluator-calls required to execute a given MDX statement. (It's similar
in principle to the instruction-counting tests which the performance
team used to run when I was a kernel developer at Oracle.)
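
To make the idea concrete, a counting-based test might look something
like the sketch below. Everything in it is hypothetical: Mondrian has
no EVALUATOR_CALLS hook today, and evaluate() merely stands in for
wherever the instrumentation would go.

    import java.util.concurrent.atomic.AtomicLong;

    public class EvaluatorCountTest {
        /** Hypothetical hook: the evaluator would bump this once per call. */
        static final AtomicLong EVALUATOR_CALLS = new AtomicLong();

        /** Stand-in for an evaluator call tree of the given depth. */
        static int evaluate(int depth) {
            EVALUATOR_CALLS.incrementAndGet();
            return depth == 0 ? 0 : evaluate(depth - 1) + 1;
        }

        public static void main(String[] args) {
            EVALUATOR_CALLS.set(0);
            evaluate(100); // stands in for executing one MDX statement
            long actual = EVALUATOR_CALLS.get();
            long baseline = 101; // recorded from a known-good build

            // Unlike wall-clock time, the count is deterministic: it does
            // not depend on machine speed, cache state or concurrent load,
            // so a failure means the evaluation path genuinely changed.
            if (actual > baseline) {
                throw new AssertionError("evaluator-call count regressed: "
                    + actual + " > " + baseline);
            }
        }
    }

The baseline would be checked in next to the test and deliberately
updated whenever a change to the algorithm is accepted, so any other
increase shows up at check-in time.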

Anyone have any other ideas for how we would introduce simple, effective
performance-regression testing into our development process?

Julian



