[Mondrian] Mondrian + Hadoop Hive Research

Julian Hyde jhyde at pentaho.com
Mon Aug 23 12:18:25 EDT 2010


Ah... you logged it as a PDI bug. That's why I didn't see it!
 
A few comments:
 
For the record, PDI etc. may need prepared statements, but mondrian doesn't.
(People complain about that.)
 
The problem of Hive requiring SQL-92 JOIN syntax and mondrian not generating
it is real. Mondrian generates 'FROM a, b' because it is the lowest common
denominator -- every database (except Hive, apparently) supports it -- and
because it never needs outer join. It shouldn't be that hard to cause
mondrian to generate 'FROM a JOIN b ON c' rather than 'FROM a, b WHERE c'.
Search for calls to SqlQuery.addFrom() followed by SqlQuery.addWhere(),
and maybe convert them to SqlQuery.addJoin().
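
To make the difference concrete, here is a minimal Java sketch of the two
FROM-clause styles. The JoinSketch class, its methods, and the table and
column names are hypothetical, for illustration only; this is not
mondrian's SqlQuery API, just one way the same join information could be
rendered in either form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JoinSketch {
    /** One joined table plus the condition linking it to earlier tables. */
    static final class Join {
        final String table;
        final String condition;

        Join(String table, String condition) {
            this.table = table;
            this.condition = condition;
        }
    }

    /** Lowest-common-denominator style: FROM a, b WHERE c. */
    static String commaStyle(String select, String first, List<Join> joins) {
        StringBuilder from = new StringBuilder(first);
        List<String> conditions = new ArrayList<>();
        for (Join j : joins) {
            from.append(", ").append(j.table);
            conditions.add(j.condition);
        }
        String sql = "SELECT " + select + " FROM " + from;
        if (!conditions.isEmpty()) {
            sql += " WHERE " + String.join(" AND ", conditions);
        }
        return sql;
    }

    /** SQL-92 style: FROM a JOIN b ON c. */
    static String joinStyle(String select, String first, List<Join> joins) {
        StringBuilder sql =
            new StringBuilder("SELECT " + select + " FROM " + first);
        for (Join j : joins) {
            sql.append(" JOIN ").append(j.table)
               .append(" ON ").append(j.condition);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Hypothetical star-schema tables, not taken from any real schema.
        List<Join> joins = Arrays.asList(
            new Join("store", "sales.store_id = store.store_id"));
        System.out.println(commaStyle("*", "sales", joins));
        System.out.println(joinStyle("*", "sales", joins));
    }
}
```

Both methods take the same inputs, so the change is confined to how the
join condition is attached: to the WHERE clause, or to the table it joins.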
 
Most of the changes in this patch are hackery, meant to scope the problem
rather than fix it, and should not be checked in. The changes to
JdbcDialectImpl are useful, but should be made in a new HiveDialect class,
not in the base class.
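
As a rough illustration of keeping Hive-specific behavior out of the base
class, the following self-contained Java sketch uses stand-in classes.
BaseDialect, HiveDialectSketch and the allowsCommaJoin() flag are all
hypothetical names invented for this example; they are not mondrian's
JdbcDialectImpl or its actual methods.

```java
public class DialectSketch {
    /** Stand-in for a shared base dialect (NOT mondrian's JdbcDialectImpl). */
    static class BaseDialect {
        /** Hypothetical flag: can FROM list several comma-separated tables? */
        boolean allowsCommaJoin() {
            return true;
        }
    }

    /** Hive-specific behavior lives in its own subclass. */
    static class HiveDialectSketch extends BaseDialect {
        @Override
        boolean allowsCommaJoin() {
            return false;
        }
    }

    /** Callers ask the dialect instead of checking for a database name. */
    static String fromClause(BaseDialect d, String a, String b, String cond) {
        return d.allowsCommaJoin()
            ? "FROM " + a + ", " + b + " WHERE " + cond
            : "FROM " + a + " JOIN " + b + " ON " + cond;
    }

    public static void main(String[] args) {
        System.out.println(
            fromClause(new BaseDialect(), "a", "b", "a.id = b.a_id"));
        System.out.println(
            fromClause(new HiveDialectSketch(), "a", "b", "a.id = b.a_id"));
    }
}
```

With this shape the base class stays untouched for every other database,
and only the subclass records what is different about Hive.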
 
Julian
 
 


  _____  

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On
Behalf Of Jordan Ganoff
Sent: Monday, August 23, 2010 8:30 AM
To: calum at millersoft.ltd.uk
Cc: mondrian at pentaho.org
Subject: [Mondrian] Mondrian + Hadoop Hive Research


Calum,

The PDI team recently did some research into using Apache's Hive database
with Mondrian.  You tweeted "Thinking of adding Hive dialect to Mondrian,
too slow for add-hoc queries but would enable mdx reports against hadoop"
at 11:21 AM on Aug 22nd. This is a heads-up about some of the issues you'll
encounter, along with the JIRA case containing patches that get Mondrian
working with Hive at a rudimentary level.  Research done by James Dixon,
Sean Flatley and me found the following issues when integrating Hive with
Mondrian 3.1.6.13364.  We used our custom-built Hive JDBC Driver[1] for
testing:

- Prepared Statements are mostly unimplemented in the JDBC driver from
Apache in versions 0.5.0 and trunk (0.7.0).  We are working on implementing
more functionality as needed to get Report Designer and Metadata Editor
working well.  Our changes are being contributed back to the official Hive
project but until they are accepted we're building our own version[1][2].

- Join syntax generated by Mondrian is not compatible with Hive.  In short,
Hive does not support multiple items in the from clause.[3]

See the related JIRA case with attached patches to get Mondrian 3.1.6.13364
to work with Hive and associated test results:
http://jira.pentaho.com/browse/PDI-4355

[1]: http://ci.pentaho.com/view/Data%20Integration/job/apache-hive-0.5.0/
[2]: http://forums.pentaho.com/showthread.php?77826-Hive-amp-Hadoop
[3]: http://wiki.apache.org/hadoop/Hive/LanguageManual/Joins

Hope this helps,


Jordan Ganoff
Software Engineer

Pentaho: The Commercial Open Source Alternative for Business Intelligence
5950 Hazeltine National Drive, Suite 340 . Orlando, FL 32822, USA
+1 407 812-OPEN (6736) . 407 517-6206 . 321 848-8207
Get your free download today at http://www.pentaho.com
<http://www.pentaho.com/download> . 	
