[Mondrian] Testing a Hive dialect

Calum Miller calum at millersoft.ltd.uk
Tue Feb 15 03:32:37 EST 2011


Hi Fu,John,

Many thanks for the feedback, much appreciated.

Regarding latency issues with star schemas in Hive, Mondrian can be configured to use just 1 big fat fact table requiring zero joins (the dimensions are derived using select distinct on the fact table columns). Such an approach may yield advantages on HBase once the GROUP BY optimisations have been added.

Regards

Calum
Sent from my iPad

On 15 Feb 2011, at 05:47, "fuhongwei141" <fuhongwei141 at gmail.com> wrote:

> Hi Calum, John,
> Thanks for the suggestions.
> A part of the latency problem is Mondrian do a lot of query on small tables that contains information for dimensions etc. In stead of integrating with HBase, I'm thinking maybe Hive can integrate with relational databases. Since hadoop already supports DBInputFormat and DBOutputFormat, and Hive is using relational database for metastore. This shouldn't be too challenging. I'm still doing some experients.
>  
> The HIVE-1922 patch 1 is just for people to run the foodmart test now. It's not finished yet. I'm still having problems with the having clauses. I will update it when finished.
>  
> Another plan I have with Hive is more support of ANSI SQL. Before that, I'm thinking of refatoring the SemanticAnalyzer. I know refactoring RowResolver with better OO is on the to do list. You know, all those  genXXXPlan methods. I think they are pretty good indicators to use a factory pattern.
> 2011-02-15
> fuhongwei141
> 发件人: John Sichi
> 发送时间: 2011-02-15  05:59:58
> 收件人: Mondrian developer mailing list
> 抄送:
> 主题: Re: [Mondrian] Testing a Hive dialect
> Speaking as the Hive committer who worked on that HBase integration, I can tell you that it will only increase the latency since going through HBase adds quite a bit of query execution overhead.  The intention of the integration is to decrease the latency on the write side (so that you can fire in row-level updates instead of only batch loads), but it comes at a price on the read side.  And all of the rest of the query execution overhead (map/reduce) remains the same.
> 
> On the bright side, we do have some people working on Hive indexing improvements, including GROUP BY rewrites and a bitmap index implementation.
> 
> Regarding HIVE-1922, I reviewed it and it needs some more work.
> 
> JVS
> 
> 2011/2/14 Calum Miller <calum at millersoft.ltd.uk>
> Hi Fu,
> 
> I discovered this link http://wiki.apache.org/hadoop/Hive/HBaseIntegration on Hive and HBase integration and wondered if this is something you had reviewed? I'm thinking this integration may reduce the latency issues with Hive and improve Mondrian responsiveness.
> 
> Calum
> On 14 Feb 2011, at 15:23, Fu Hongwei wrote:
> 
>> Hi,
>> Thanks.
>> There are still some failures. I will submit other changes when the test suite pass. Also, the HIVE patch seemed to be broken.  I will try to fix that too.
>>  
>>  
>> 2011-02-14
>> Fu Hongwei
>> 发件人: Julian Hyde
>> 发送时间: 2011-02-14  10:45:26
>> 收件人: 'Fu Hongwei'
>> 抄送: 'Mondrian developer mailing list'
>> 主题: RE: RE: RE: [Mondrian] Testing a Hive dialect
>> I have checked in your patch as change 14118. Thank you for the contribution.
>>  
>> I did some cleanup first (mainly to make the code comply with our coding conventions). Let me know if I broke anything.
>>  
>> I'm doing some further cleanup to move the Hive-specific stuff (e.g. checking that the 'on' clause only contains 'x.a = y.b' or 'upper(x.a) = upper(y.b)') into the Hive dialect. My goal is to enable FROM-JOIN-ON syntax for other dialects such as Oracle.
>>  
>> I do not have a Hive instance to test against, so my apologies in advance if I break anything in this upcoming change. I hope that change 14118 will give others such as Calum enough to test and work against.
>>  
>> Generally I like to have a few more contributions before I give newcomers committer access. For now, the best way to send patches is using the packChange utility (available in //open/util/bin in the eigenbase perforce repository). But I will also accept tar files or patch files as long as each file has a '$Id: $' header somewhere so that I can identify the version that you modified. Next time can you also run the //open/mondrian/bin/checkFile.sh script on your changes, so check for compliance with our coding guidelines.
>>  
>> Julian
>> 
>> From: Fu Hongwei [mailto:fuhongwei2006 at 163.com] 
>> Sent: Sunday, February 13, 2011 7:15 AM
>> To: jhyde
>> Cc: 'Mondrian developer mailing list'
>> Subject: Re: RE: RE: [Mondrian] Testing a Hive dialect
>> 
>> Thanks, please commit the files for me. Hive-1922 contains some stuff I consider easier to fix on the Hive side, but, yes, I think it's better if we have a non hive patch version.
>>  
>> The failures are caused by some small problems. It's running beautifully now.
>>  
>> I've fixed the join issue. You would also want to add a VM parameter when running the tests or it will report some errors.
>> -javax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
>>  
>> Can I have some kind of commit right? It looks like I have a lot of files to submit.
>>  
>> BTW, could somebody tell me how to make a patch in perforce like in svn? Thanks.
>> 2011-02-13
>> Fu Hongwei
>> 发件人: Julian Hyde
>> 发送时间: 2011-02-12  01:59:33
>> 收件人: 'Mondrian developer mailing list'
>> 抄送:
>> 主题: RE: RE: [Mondrian] Testing a Hive dialect
>> If it is difficult to fix HIVE-1922, we could possibly workaround issue by changing Dialect.generateOrderItem. (Every dialect tends to have different rules for how to generate an ORDER BY clause -- order by column name, order by ordinal, order by expression, order by ordinal only when applied to a set operation such as union, etc. -- so there's no harm having yet another behavior.)
>>  
>> Someone also mentioned that Hive only supports ANSI join syntax 'FROM t1 JOIN t2 ON t1.x = t2.y', whereas mondrian only generates 'FROM t1, t2 WHERE t1.x = t2.y'. Is this still an issue? We would need to fix either Hive or Mondrian's dialect. Mondrian's dialect it probably easier.
>>  
>> Since other people have expressed an interest in a Hive dialect, it would be useful if you check in what you have right now, even though there are many failures. Send me the files and I will check them in.
>>  
>> Julian
>>  
>> 
>> From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Fu Hongwei
>> Sent: Friday, February 11, 2011 3:53 AM
>> To: 'Mondrian developer mailing list'
>> Subject: Re: RE: [Mondrian] Testing a Hive dialect
>> 
>> Hi Julian,
>> Thanks for the prompt reply.
>> The test is running on Hive 0.7.0, but a patch will be needed. I've started a jira on Hive and will submit it soon too.
>> https://issues.apache.org/jira/browse/HIVE-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987548#action_12987548
>>  
>> There are too many failures in the test suite.  I'm still working on it, but I will submit the DialectTest.java and HiveDialect.java, so people interested can work on it together.
>> I think the point is first to make it running then speed it up.
>> 2011-02-11
>> Fu Hongwei
>> 发件人: Julian Hyde
>> 发送时间: 2011-02-11  00:19:26
>> 收件人: 'Mondrian developer mailing list'
>> 抄送:
>> 主题: RE: [Mondrian] Testing a Hive dialect
>> I am well aware of the compromises with Hive. There is certainly an impedance mismatch between hadoop and the real-time analysis, and that is reflected in query response time. It is still useful  to have a Hive dialect, because as you say, Hive is improving all the time. And Pentaho is thinking about ways to bridge the impedance mismatch.
>>  
>> Can please you attach your code to the jira case as a patch? I will submit it.
>>  
>> Also please attach the output of the test suite, and describe the version of Hive you are running against. That will be a reference point for others who are working on Hive.
>>  
>> Julian
>> 
>> From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Fu Hongwei
>> Sent: Thursday, February 10, 2011 1:29 AM
>> To: mondrian
>> Subject: [Mondrian] Testing a Hive dialect 
>> 
>> Hi,
>> I'm new to the community. I don't know who to ask this for.
>> I've written a Hive dialect for mondrian and been testing it.
>> The DialectTest is passed, but it's still failing other tests.
>>  
>> From what I can see it's still a long way to make an integration
>> of Hive and Mondrian practical.
>> 1. It's really slow. It takes 1 or 2 days to run through all the tests,
>> a low end estimation. The latency is too big for most applications.
>> It might take some major architectural change on the Hive side to solve
>> this problem.
>> 2. Hive ql is still at a rather immature stage, partly because it's not
>> really intended to be a full featured relational database. There are some
>>  bugs too, like the join behavior is incorrect at the current trunk.
>>  
>> I know there is already a jira about it.
>> http://jira.pentaho.com/browse/MONDRIAN-789 
>> How can I submit the code? Thanks
>> 2011-02-10
>> Fu Hongwei
>> _______________________________________________
>> 
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
> 
> Calum Miller
> Business Intelligence Specialist
> calum at millersoft.ltd.uk
> http://www.millersoft.ltd.uk
> Address: Stuart House, Eskmills, Station Road, Musselburgh, East Lothian, Scotland, EH21 7PB 
> Location: http://www.eskmills.com
> Telephone: +44 (0) 131 208 0202
> Fax: +44 (0) 131 608 0037
> Mobile: +44 (0) 7973 249 621
> Skype: calummiller
> 
> <Millersoft_Pentaho_Partner(02-15-12-26-01).png>
> 
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
> 
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20110215/adccb2d6/attachment.html 


More information about the Mondrian mailing list