[Mondrian] Efficient JSON representation for "XMLA" responses

Roland Bouman roland.bouman at gmail.com
Wed Sep 19 20:11:32 EDT 2012


Hi All,

I've been lurking but watching this discussion.
I just want to add an observation: there's two different problems that
are being mentioned:
1) XML/A requires multiple (too many) requests to obtain metadata
2) Large cellsets suffer from a too bulky XML representation.
These are separate problems, with separate solutions.

I concur with Julian that deep=true should solve most of 1) - the fact
that that JSON is very much like the data you get with XML/A should
IMO not be that much of a problem - at least not as far as size is
concerned. Regarding form/structure: personally I don't think the
server can ever provide "the" right representation - depending on the
application, the client should always be prepared to process that data
to re-represent it as they see fit.

I also concur with Julian that gzip compression should help at least a
factor 10, but likely 20 to alleviate problem 2. I don't think it will
ever come close to the 1% range (or even less) suggested by Michele.
That said, I would be interested in some elaboration on this matter.
Full XML/A cellsets convey not just the values but also cell
properties - is that included in that sub-1% reported by Michele?
If so, I'd be very interested in seeing some sample of the data
representation Michele's company is using (or would like to be using).

Kind regards,


Roland

On Thu, Sep 20, 2012 at 1:46 AM, Julian Hyde <jhyde at pentaho.com> wrote:
> I'd try enabling gzip on the HTTP requests. Should help a lot. (Never done
> it myself, but let me know how it goes.)
>
> Doesn't sound that easy to proxy one olap4j driver's metadata objects over
> the wire. They're not (designed to be) serializable.
>
> Json result format with deep=true is worth a try. You can get the whole
> schema in one json response. Combined with gzip, should be fairly compact.
>
> If any of these approaches work out and there is a consensus (e.g. between
> Michele and Paul) that the olap4j XMLA driver should use optimized protocols
> (still based on the XMLA specification in a clear way) then I think we
> should move ahead and put them into the driver.
>
> By the way, now the XMLA server is a separate project, it should be easier
> to experiment.
>
> Julian
>
>
> On Sep 19, 2012, at 1:36 AM, Michele Rossi <michele.rossi at gmail.com> wrote:
>
> hi,
> we are also concerned about the poor performances of XMLA, as you say below
> it requires too many queries to obtain metadata and cell data and the
> representation itself is extremely inefficient.
>
> We have a real customer use case of a 20000 rows pivots which is rendered in
> Excel via the SimbaO2X plugin.
> Such pivot results in about 268Mb of XML exchanged between client and
> server.
> What's interesting is that the informative content of such xml is very
> limited and it can be compressed to 1.69Mb (0.6% of the original).
>
> Right at this moment I don't have the time to look at how you're using JSON
> in the new olap4j-xmlaserver but judging from the example provided below it
> looks like you're transferring xmla-like content, only serialised in JSON.
> I am thinking that something like that would only be useful if you control
> client and server (unit tests?) or if you manage to create a new standard.
> Or perhaps to create a new olap4j transport driver which could use whatever
> format we want to simply delegate calls over to a remote olap4j driver (we
> have done that here).
> Hope I am not missing something obvious here.
>
> What we've done in my company while building our own olap4j driver was to
> transfer metadata using a custom XSD schema which mirrors closely the Olap4j
> entities.
> This way we can obtain all the data required to deep-populate a
> org.olap4j.metadata.Schema in one call.
> We have used XSD+JAXB with XML serialization for simplicity but we are
> planning to stop using XML all together and use JSON to represent data
> instead.
>
> While our olap4j drivers implements all ResultSet based metadata calls such
> org.olap4j.OlapDatabaseMetaData.getCubes(String, String, String) we are not
> actively using them preferring instead deep metadata objects such as
> org.olap4j.metadata.Schema obtained as described above.
>
> And that was my two cents, hope it helps.
>
> thanks,
> Michele
>
>
> On 19 September 2012 00:46, Julian Hyde <jhyde at pentaho.com> wrote:
>>
>> You're right about round trips. I believe I've already addressed that,
>> with the <Deep>true</Deep> option. Search for "testMDCubesDeepJson" in that
>> file and you'll see what I mean.
>>
>> We can rip out some of the header elements that are all about SOAP and XML
>> namespaces. "row" can be replaced with something else ("CUBES" seems more
>> appropriate than "cube", if you look at the DeepJson example). The format
>> needs also needs to be able to return an exception, with standard XMLA
>> response codes.
>>
>> That said, people will put up with some crap in a protocol if it gives
>> them what they need (and reasonably efficiently). So I wouldn't bend over
>> backwards to make it absolutely the sexiest json ever.
>>
>> Take a look at the examples in test file and propose something concrete.
>>
>> Julian
>>
>>
>> On Sep 18, 2012, at 4:29 PM, Paul Stoellberger <p.stoellberger at gmail.com>
>> wrote:
>>
>> One of my biggest concerns over xmla-like communication is the overall
>> overhead in a) number http calls b) processing the results
>>
>> With this row (array) based results we will have to build in logic into
>> the client to be able to understand the logic behind catalogs, schemas,
>> cubes and how they correlate and turn those rows into objects instead of
>> returning objects directly, as we do in saiku today. In order to work with
>> this data I need those objects, and this just doesn't feel right to me.
>>
>> Furthermore this will have to be processed by the client. Now I know this
>> is probably not something very expensive to do but I would rather keep the
>> necessary logic and processing on the client side to an absolute minimum.
>> Doing the same transformations on server side code will always be faster and
>> therefore deliver a better user experience to the user. The less the browser
>> has to work, the better.
>>
>> But my major point of concern is the number of server calls needed to
>> retrieve the necessary metadata information.
>> I admit I don't know the details about all xmla calls but what I can tell
>> by looking at olap4j's xmla driver there are a lot of round trips needed to
>> retrieve all the information necessary. I have plans to understand those
>> calls better in order to improve the xmla driver. Maybe I'm wrong, so feel
>> free to convince me otherwise.
>>
>> Regarding standards, are there any other OLAP servers that return metadata
>> in a xmla like manner?
>> What was good for xml, doesn't have to be necessarily the best for todays
>> "ways of doing things".
>> People were able to consume xmla (like Andy / Roland proved successfully)
>> directly before, however we got great feedback from web developers who
>> confirmed how easy it is to work with our (metadata) API.
>> When I look at public API's like github's I see rich JSON objects and not
>> row based resultsets.
>> What are your arguments to keeping the structure completely the same but
>> just switching from xml to json?
>>
>> Metadata calls are pretty straightforward and there is not much you can do
>> differently, so I am happy to adapt to a better standard (although having 1
>> server return the data this way doesn't make it a standard).
>>
>> However, the best API and standards don't make good software. My biggest
>> concern is to make things work.
>> If it means I can do stuff more efficient (less http calls, less
>> processing needed) I prefer that over a "clean" API.
>>
>> But maybe in order to move forward:
>> Maybe there is a way how we can turn those xmla like results into
>> something nice thats still mappable to xmla results.
>> E.g. the JSON you pasted below:
>> If "row" becomes "cube" and this cube element (can) contain already
>> hierarchy and/or level information in the same result, but still keep the
>> xmla like result style that would make me already happy.
>>
>> Andy, Roland ... you two have probably the best insights on this topic so
>> it would be great if you could share your opinion with us as well!
>>
>> -Paul
>>
>>
>>
>> On Sep 19, 2012, at 12:44 AM, Julian Hyde wrote:
>>
>> Paul,
>>
>> On IRC yesterday you said there was a more efficient JSON representation
>> of XMLA responses than the one implemented in olap4j-xmlaserver (formerly
>> Mondrian's XMLA server).
>>
>> There are some examples in XmlaBasicTest [ see
>> https://raw.github.com/pentaho/mondrian/master/testsrc/main/mondrian/xmla/XmlaBasicTest.ref.xml
>> and search for "Json" ]. One is
>>
>> "cxmla:DiscoverResponse": {
>>   "xmlns:cxmla": "urn:schemas-microsoft-com:xml-analysis",
>>   "cxmla:return": {
>>     "root": {
>>       "xmlns": "urn:schemas-microsoft-com:xml-analysis:rowset",
>>       "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
>>       "xmlns:xsd": "http://www.w3.org/2001/XMLSchema",
>>       "xmlns:EX": "urn:schemas-microsoft-com:xml-analysis:exception",
>>       "row": [
>>         {
>>           "CATALOG_NAME": "FoodMart",
>>           "SCHEMA_NAME": "FoodMart",
>>           "CUBE_NAME": "HR",
>>           "CUBE_TYPE": "CUBE",
>>           "LAST_SCHEMA_UPDATE": "xxxx-xx-xxTxx:xx:xx",
>>           "IS_DRILLTHROUGH_ENABLED": true,
>>           "IS_WRITE_ENABLED": false,
>>           "IS_LINKABLE": false,
>>           "IS_SQL_ENABLED": false,
>>           "CUBE_CAPTION": "HR",
>>           "DESCRIPTION": "FoodMart Schema - HR Cube"
>>         },
>>         {
>>           "CATALOG_NAME": "FoodMart",
>>           "SCHEMA_NAME": "FoodMart",
>>           "CUBE_NAME": "Sales",
>>           "CUBE_TYPE": "CUBE",
>>           "LAST_SCHEMA_UPDATE": "xxxx-xx-xxTxx:xx:xx",
>>           "IS_DRILLTHROUGH_ENABLED": true,
>>           "IS_WRITE_ENABLED": false,
>>           "IS_LINKABLE": false,
>>           "IS_SQL_ENABLED": false,
>>           "CUBE_CAPTION": "Sales",
>>           "DESCRIPTION": "FoodMart Schema - Sales Cube"
>>         },
>>
>>                 ...
>>
>> and there's another that returns nested metadata (cubes, dimensions,
>> hierarchies, etc.) and another that returns a cell set.
>>
>> I'm open to making this more concise and "json-like" as long as there is a
>> clear mapping from the XMLA spec. (If we depart from the spec entirely we're
>> back in the wild west, where clients can only rely on their own server.) If
>> you have some ideas please propose them.
>>
>> Julian
>>
>>
>>
>> Julian Hyde
>> jhyde at pentaho.com
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
>
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>



-- 
Roland Bouman
blog: http://rpbouman.blogspot.com/
twitter: @rolandbouman
linkedin: http://www.linkedin.com/profile/view?id=5142800&trk=tab_pro

Author of "Pentaho Solutions" (Wiley, ISBN: 978-0-470-48432-6
http://tinyurl.com/lvxa88) and "Pentaho Kettle Solutions" (Wiley,
ISBN: 978-0-470-63517-9 http://tinyurl.com/33r7a8m)


More information about the Mondrian mailing list