[Mondrian] Matches operator

Julian Hyde jhyde at pentaho.com
Tue May 24 15:46:05 EDT 2011

(Taking a private developers discussion online. Hopefully interested readers
can infer the context from the discussion and
Here are the design principles that relate to the Matches operator and how
we speed up Analyzer's filter functionality. They boil down to (a)
consistency and correctness across all database platforms, (b) performance
whenever possible, and (c) testing to ensure (a) and (b).
1. We should specify what Matches does. In particular, what is the precise
regular expression language? (POSIX, Java or something else.) Document the
behavior in mdx.html, since it is an extension function.
2. Matches should behave the same on all databases. We should write unit
tests in FunctionTest to validate that.
3. When possible, we should push the Matches expression down to the
database. That won't always be possible, especially if the database has a
different regexp syntax. In that case, mondrian should evaluate Matches
natively (i.e. not in SQL).
4. Dialects should make every effort to recognize and push down the regexps
that Analyzer generates. I can't say what that regexp would be, because we
haven't decided #1. Supposing we decided that it was going to be a Java
regexp; then Analyzer would generate "(?i).*foo.*" if the user searches for
"foo". The Oracle dialect could translate this to " upper(column) like
'%FOO%' ". (The literal must be upper-cased too, otherwise the comparison
against upper(column) could never succeed.)
5. Have a dialect test that ensures that if the dialect claims that it can
translate a given regexp to a SQL expression then that SQL expression gives
the right results. (A dialect that always returns 'null' -- meaning that it
can't translate the regexp to SQL -- will trivially pass this test.)
6. "expr Matches non-constant-expression" should work, but should not
necessarily be fast.
7. "not (expr1 matches expr2)" should be pushed down if possible. I don't
see any need to implement "expr1 not matches expr2" syntax.
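
To make principles 3 and 4 concrete, here is a minimal sketch in Java. It
assumes that #1 is decided in favor of Java regexps and that Matches means
"the whole string matches". The class and method names (MatchesPushDown,
toSql, matchesNatively) are hypothetical and are not part of Mondrian's
Dialect API. The translator recognizes only the one shape Analyzer would
generate and returns null for everything else, so the caller can fall back
to native evaluation. The column name is assumed to be a trusted identifier.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not Mondrian's actual API. */
final class MatchesPushDown {
    // Recognizes exactly "(?i).*literal.*", where the literal holds only
    // ASCII letters, digits and spaces (no regexp or LIKE metacharacters).
    private static final Pattern CONTAINS_IGNORE_CASE =
        Pattern.compile("\\(\\?i\\)\\.\\*([A-Za-z0-9 ]+)\\.\\*");

    /**
     * Returns a SQL predicate equivalent to the regexp, or null if this
     * sketch cannot translate it (principle 5: null means "evaluate natively").
     */
    static String toSql(String column, String javaRegex) {
        Matcher m = CONTAINS_IGNORE_CASE.matcher(javaRegex);
        if (!m.matches()) {
            return null;
        }
        // Upper-case the literal so it can match upper(column).
        String literal = m.group(1).toUpperCase(Locale.ROOT);
        return "upper(" + column + ") like '%" + literal + "%'";
    }

    /** Native fallback: evaluate the regexp in Java rather than in SQL. */
    static boolean matchesNatively(String value, String javaRegex) {
        return Pattern.compile(javaRegex).matcher(value).matches();
    }
}
```

With this sketch, toSql("name", "(?i).*foo.*") yields
"upper(name) like '%FOO%'", while toSql("name", "fo+") yields null. Edge
cases remain where the two evaluations may disagree (for example, values
containing line terminators, which "." does not match by default in Java but
"%" does in LIKE); the dialect test in principle 5 is where such differences
should surface.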
I know that you are scrambling to produce an initial version of the fix, and
it won't meet the criteria above. That's fine. But I feel that we should
circle back and implement the above principles.