[Mondrian] Cardinality Estimation

Julian Hyde jhyde at pentaho.com
Wed Sep 26 00:03:28 EDT 2012


On Sep 25, 2012, at 9:58 PM, Roland Bouman <rbouman at pentaho.com> wrote:

> I just found this article:
> 
> http://blog.notdot.net/2012/09/Dam-Cool-Algorithms-Cardinality-Estimation
> 
> seemed like it might interest you w/re mondrian statistics gathering. I hope it's useful.


I love the HyperLogLog algorithm, and have been trying to find a way to get it into Mondrian. Not only can it estimate stats based on a small sample, you can save the "sketches" it produces and allows you to roll-up approximate distinct counts. (That would be a separate feature request: for an "approximate-distinct-count" aggregate function.)

However:

(a) Mondrian can't afford to be counting cardinalities, even via sampling, at run time. We should have a utility that counts all attributes and writes them into your schema file.

(b) Ideally, the RDBMS should be doing this, not Mondrian. I guess Mondrian could do HyperLogLog as a fall-back, and for any database that had fast cardinality estimation we could override a method in JdbcStatisticsProvider.

Julian


More information about the Mondrian mailing list