[Mondrian] Cardinality Estimation
jhyde at pentaho.com
Wed Sep 26 00:03:28 EDT 2012
On Sep 25, 2012, at 9:58 PM, Roland Bouman <rbouman at pentaho.com> wrote:
> I just found this article:
> seemed like it might interest you w/re mondrian statistics gathering. I hope it's useful.
I love the HyperLogLog algorithm, and have been trying to find a way to get it into Mondrian. Not only can it estimate stats based on a small sample, you can save the "sketches" it produces and allows you to roll-up approximate distinct counts. (That would be a separate feature request: for an "approximate-distinct-count" aggregate function.)
(a) Mondrian can't afford to be counting cardinalities, even via sampling, at run time. We should have a utility that counts all attributes and writes them into your schema file.
(b) Ideally, the RDBMS should be doing this, not Mondrian. I guess Mondrian could do HyperLogLog as a fall-back, and for any database that had fast cardinality estimation we could override a method in JdbcStatisticsProvider.
More information about the Mondrian