[Mondrian] Testing a Hive dialect

John Sichi jsichi at gmail.com
Mon Feb 14 16:59:12 EST 2011


Speaking as the Hive committer who worked on that HBase integration, I can
tell you that it will only increase the latency since going through HBase
adds quite a bit of query execution overhead.  The intention of the
integration is to decrease the latency on the write side (so that you can
fire in row-level updates instead of only batch loads), but it comes at a
price on the read side.  And all of the rest of the query execution overhead
(map/reduce) remains the same.

On the bright side, we do have some people working on Hive indexing
improvements, including GROUP BY rewrites and a bitmap index implementation.

Regarding HIVE-1922, I reviewed it and it needs some more work.

JVS

2011/2/14 Calum Miller <calum at millersoft.ltd.uk>

> Hi Fu,
>
> I discovered this link http://wiki.apache.org/hadoop/Hive/HBaseIntegrationon Hive and HBase integration and wondered if this is something you had
> reviewed? I'm thinking this integration may reduce the latency issues with
> Hive and improve Mondrian responsiveness.
>
> Calum
> On 14 Feb 2011, at 15:23, Fu Hongwei wrote:
>
> Hi,
> Thanks.
> There are still some failures. I will submit other changes when the test
> suite pass. Also, the HIVE patch seemed to be broken.  I will try to fix
> that too.
>
>
> 2011-02-14
> ------------------------------
> Fu Hongwei
> ------------------------------
> *发件人:* Julian Hyde
> *发送时间:* 2011-02-14  10:45:26
> *收件人:* 'Fu Hongwei'
> *抄送:* 'Mondrian developer mailing list'
> *主题:* RE: RE: RE: [Mondrian] Testing a Hive dialect
> I have checked in your patch as change 14118. Thank you for the
> contribution.
>
> I did some cleanup first (mainly to make the code comply with our coding
> conventions). Let me know if I broke anything.
>
> I'm doing some further cleanup to move the Hive-specific stuff (e.g.
> checking that the 'on' clause only contains 'x.a = y.b' or 'upper(x.a) =
> upper(y.b)') into the Hive dialect. My goal is to enable FROM-JOIN-ON
> syntax for other dialects such as Oracle.
>
> I do not have a Hive instance to test against, so my apologies in advance
> if I break anything in this upcoming change. I hope that change 14118 will
> give others such as Calum enough to test and work against.
>
> Generally I like to have a few more contributions before I give newcomers
> committer access. For now, the best way to send patches is using the
> packChange utility (available in //open/util/bin in the eigenbase perforce
> repository). But I will also accept tar files or patch files as long as each
> file has a '$Id: $' header somewhere so that I can identify the version that
> you modified. Next time can you also run the
> //open/mondrian/bin/checkFile.sh script on your changes, so check for
> compliance with our coding guidelines.
>
> Julian
>
> ------------------------------
> *From:* Fu Hongwei [mailto:fuhongwei2006 at 163.com]
> *Sent:* Sunday, February 13, 2011 7:15 AM
> *To:* jhyde
> *Cc:* 'Mondrian developer mailing list'
> *Subject:* Re: RE: RE: [Mondrian] Testing a Hive dialect
>
> Thanks, please commit the files for me. Hive-1922 contains some stuff I
> consider easier to fix on the Hive side, but, yes, I think it's better if we
> have a non hive patch version.
>
> The failures are caused by some small problems. It's running beautifully
> now.
>
> I've fixed the join issue. You would also want to add a VM parameter when
> running the tests or it will report some errors.
>
> -javax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
>
> Can I have some kind of commit right? It looks like I have a lot of files
> to submit.
>
> BTW, could somebody tell me how to make a patch in perforce like in svn?
> Thanks.
> 2011-02-13
> ------------------------------
> Fu Hongwei
> ------------------------------
> *发件人:* Julian Hyde
> *发送时间:* 2011-02-12  01:59:33
> *收件人:* 'Mondrian developer mailing list'
> *抄送:*
> *主题:* RE: RE: [Mondrian] Testing a Hive dialect
> If it is difficult to fix HIVE-1922, we could possibly workaround issue by
> changing Dialect.generateOrderItem. (Every dialect tends to have different
> rules for how to generate an ORDER BY clause -- order by column name, order
> by ordinal, order by expression, order by ordinal only when applied to a set
> operation such as union, etc. -- so there's no harm having yet another
> behavior.)
>
> Someone also mentioned that Hive only supports ANSI join syntax 'FROM t1
> JOIN t2 ON t1.x = t2.y', whereas mondrian only generates 'FROM t1, t2 WHERE
> t1.x = t2.y'. Is this still an issue? We would need to fix either Hive or
> Mondrian's dialect. Mondrian's dialect it probably easier.
>
> Since other people have expressed an interest in a Hive dialect, it would
> be useful if you check in what you have right now, even though there are
> many failures. Send me the files and I will check them in.
>
> Julian
>
>
> ------------------------------
> *From:* mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
>  *On Behalf Of *Fu Hongwei
> *Sent:* Friday, February 11, 2011 3:53 AM
> *To:* 'Mondrian developer mailing list'
> *Subject:* Re: RE: [Mondrian] Testing a Hive dialect
>
> Hi Julian,
> Thanks for the prompt reply.
> The test is running on Hive 0.7.0, but a patch will be needed. I've started
> a jira on Hive and will submit it soon too.
>
> https://issues.apache.org/jira/browse/HIVE-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987548#action_12987548
>
> There are too many failures in the test suite.  I'm still working on it,
> but I will submit the DialectTest.java and HiveDialect.java, so people
> interested can work on it together.
> I think the point is first to make it running then speed it up.
> 2011-02-11
> ------------------------------
> Fu Hongwei
> ------------------------------
> *发件人:* Julian Hyde
> *发送时间:* 2011-02-11  00:19:26
> *收件人:* 'Mondrian developer mailing list'
> *抄送:*
> *主题:* RE: [Mondrian] Testing a Hive dialect
> I am well aware of the compromises with Hive. There is certainly an
> impedance mismatch between hadoop and the real-time analysis, and that is
> reflected in query response time. It is still useful  to have a Hive
> dialect, because as you say, Hive is improving all the time. And Pentaho is
> thinking about ways to bridge the impedance mismatch.
>
> Can please you attach your code to the jira case as a patch? I will submit
> it.
>
> Also please attach the output of the test suite, and describe the version
> of Hive you are running against. That will be a reference point for others
> who are working on Hive.
>
> Julian
>
> ------------------------------
> *From:* mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
>  *On Behalf Of *Fu Hongwei
> *Sent:* Thursday, February 10, 2011 1:29 AM
> *To:* mondrian
> *Subject:* [Mondrian] Testing a Hive dialect
>
> Hi,
> I'm new to the community. I don't know who to ask this for.
> I've written a Hive dialect for mondrian and been testing it.
> The DialectTest is passed, but it's still failing other tests.
>
> From what I can see it's still a long way to make an integration
> of Hive and Mondrian practical.
> 1. It's really slow. It takes 1 or 2 days to run through all the tests,
> a low end estimation. The latency is too big for most applications.
> It might take some major architectural change on the Hive side to solve
> this problem.
> 2. Hive ql is still at a rather immature stage, partly because it's not
> really intended to be a full featured relational database. There are some
>  bugs too, like the join behavior is incorrect at the current trunk.
>
> I know there is already a jira about it.
> http://jira.pentaho.com/browse/MONDRIAN-789
> How can I submit the code? Thanks
> 2011-02-10
> ------------------------------
> Fu Hongwei
>
> _______________________________________________
>
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
>
> Calum Miller
> Business Intelligence Specialist
> calum at millersoft.ltd.uk
> http://www.millersoft.ltd.uk
> Address: Stuart House, Eskmills, Station Road, Musselburgh, East Lothian,
> Scotland, EH21 7PB
> Location: http://www.eskmills.com
> Telephone: +44 (0) 131 208 0202
> Fax: +44 (0) 131 608 0037
> Mobile: +44 (0) 7973 249 621
> Skype: calummiller
>
>
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20110214/cbc7fe8e/attachment.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 11113 bytes
Desc: not available
Url : http://lists.pentaho.org/pipermail/mondrian/attachments/20110214/cbc7fe8e/attachment.png 


More information about the Mondrian mailing list