[Mondrian] Efficient JSON representation for "XMLA" responses

Thu Sep 20 04:20:10 EDT 2012

hi Julian,
the question is, what out there can consume json/xmla messages?
We use SimbaO2X and I am pretty sure that it can't handle it.

Are we talking about cooking up a different standard?

Regarding the serialization of the Olap4j metadata objects: the way we have
done it is to create a very similar set of classes using an .xsd schema
which we then run through the JAXB binding compiler to create annotated /
xml serializable classes that look like the Olap4j ones but are not the
same thing.

We had to code up a serialization / deserialization thing to go from the
jaxb/xml classes to the olap4j classes and it was very painful because of
the complexity of the relationships between the olap4j classes.
One specific and little known feature of xml / jaxb that was critical in
succeeding was the IDREF. In a nutsheel pointers to other objects within
the same xml document.
As far as I know that's not available out of the box in json and this is
why currently we are stuck with xml.

Regarding the other question about the compressibility of xmla: I can
provide an example zipped using 7-zip, it's 1.7Mb and it expands to 275Mb.
It's an "Execute Response" message.
If it's ok to post 1.7Mb attachments to the list let me know and I will
send it out.

thanks,
Michele

On 20 September 2012 00:46, Julian Hyde <jhyde at pentaho.com> wrote:

> I'd try enabling gzip on the HTTP requests. Should help a lot. (Never done
> it myself, but let me know how it goes.)
>
> Doesn't sound that easy to proxy one olap4j driver's metadata objects over
> the wire. They're not (designed to be) serializable.
>
> Json result format with deep=true is worth a try. You can get the whole
> schema in one json response. Combined with gzip, should be fairly compact.
>
> If any of these approaches work out and there is a consensus (e.g. between
> Michele and Paul) that the olap4j XMLA driver should use optimized
> protocols (still based on the XMLA specification in a clear way) then I
> think we should move ahead and put them into the driver.
>
> By the way, now the XMLA server is a separate project, it should be easier
> to experiment.
>
>  Julian
>
>
> On Sep 19, 2012, at 1:36 AM, Michele Rossi <michele.rossi at gmail.com>
> wrote:
>
> hi,
> we are also concerned about the poor performances of XMLA, as you say
> below it requires too many queries to obtain metadata and cell data and the
> representation itself is extremely inefficient.
>
> We have a real customer use case of a 20000 rows pivots which is rendered
> in Excel via the SimbaO2X plugin.
> Such pivot results in about 268Mb of XML exchanged between client and
> server.
> What's interesting is that the informative content of such xml is very
> limited and it can be compressed to 1.69Mb (0.6% of the original).
>
> Right at this moment I don't have the time to look at how you're using
> JSON in the new olap4j-xmlaserver but judging from the example provided
> below it looks like you're transferring xmla-like content, only serialised
> in JSON.
> I am thinking that something like that would only be useful if you control
> client and server (unit tests?) or if you manage to create a new standard.
> Or perhaps to create a new olap4j transport driver which could use
> whatever format we want to simply delegate calls over to a remote olap4j
> driver (we have done that here).
> Hope I am not missing something obvious here.
>
> What we've done in my company while building our own olap4j driver was to
> transfer metadata using a custom XSD schema which mirrors closely the
> Olap4j entities.
> This way we can obtain all the data required to deep-populate
> a org.olap4j.metadata.Schema in one call.
> We have used XSD+JAXB with XML serialization for simplicity but we are
> planning to stop using XML all together and use JSON to represent data
> instead.
>
> While our olap4j drivers implements all ResultSet based metadata calls
> such org.olap4j.OlapDatabaseMetaData.getCubes(String, String, String) we
> are not actively using them preferring instead deep metadata objects such
> as org.olap4j.metadata.Schema obtained as described above.
>
> And that was my two cents, hope it helps.
>
> thanks,
> Michele
>
>
> On 19 September 2012 00:46, Julian Hyde <jhyde at pentaho.com> wrote:
>
>> You're right about round trips. I believe I've already addressed that,
>> with the <Deep>true</Deep> option. Search for "testMDCubesDeepJson" in that
>> file and you'll see what I mean.
>>
>> We can rip out some of the header elements that are all about SOAP and
>> XML namespaces. "row" can be replaced with something else ("CUBES" seems
>> more appropriate than "cube", if you look at the DeepJson example). The
>> format needs also needs to be able to return an exception, with standard
>> XMLA response codes.
>>
>> That said, people will put up with some crap in a protocol if it gives
>> them what they need (and reasonably efficiently). So I wouldn't bend over
>> backwards to make it absolutely the sexiest json ever.
>>
>> Take a look at the examples in test file and propose something concrete.
>>
>> Julian
>>
>>
>> On Sep 18, 2012, at 4:29 PM, Paul Stoellberger <p.stoellberger at gmail.com>
>> wrote:
>>
>> One of my biggest concerns over xmla-like communication is the overall
>> overhead in a) number http calls b) processing the results
>>
>> With this row (array) based results we will have to build in logic into
>> the client to be able to understand the logic behind catalogs, schemas,
>> cubes and how they correlate and turn those rows into objects instead of
>> returning objects directly, as we do in saiku today. In order to work with
>> this data I need those objects, and this just doesn't feel right to me.
>>
>> Furthermore this will have to be processed by the client. Now I know this
>> is probably not something very expensive to do but I would rather keep the
>> necessary logic and processing on the client side to an absolute minimum.
>> Doing the same transformations on server side code will always be faster
>> and therefore deliver a better user experience to the user. The less the
>> browser has to work, the better.
>>
>> But my major point of concern is the number of server calls needed to
>> retrieve the necessary metadata information.
>> I admit I don't know the details about all xmla calls but what I can tell
>> by looking at olap4j's xmla driver there are a lot of round trips needed to
>> retrieve all the information necessary. I have plans to understand those
>> calls better in order to improve the xmla driver. Maybe I'm wrong, so feel
>> free to convince me otherwise.
>>
>> Regarding standards, are there any other OLAP servers that return
>> metadata in a xmla like manner?
>> What was good for xml, doesn't have to be necessarily the best for todays
>> "ways of doing things".
>> People were able to consume xmla (like Andy / Roland proved successfully)
>> directly before, however we got great feedback from web developers who
>> confirmed how easy it is to work with our (metadata) API.
>> When I look at public API's like github's I see rich JSON objects and not
>> row based resultsets.
>> What are your arguments to keeping the structure completely the same but
>> just switching from xml to json?
>>
>> Metadata calls are pretty straightforward and there is not much you can
>> do differently, so I am happy to adapt to a better standard (although
>> having 1 server return the data this way doesn't make it a standard).
>>
>> However, the best API and standards don't make good software. My biggest
>> concern is to make things work.
>> If it means I can do stuff more efficient (less http calls, less
>> processing needed) I prefer that over a "clean" API.
>>
>> But maybe in order to move forward:
>> Maybe there is a way how we can turn those xmla like results into
>> something nice thats still mappable to xmla results.
>> E.g. the JSON you pasted below:
>> If "row" becomes "cube" and this cube element (can) contain already
>> hierarchy and/or level information in the same result, but still keep the
>> xmla like result style that would make me already happy.
>>
>> Andy, Roland ... you two have probably the best insights on this topic so
>> it would be great if you could share your opinion with us as well!
>>
>> -Paul
>>
>>
>>
>> On Sep 19, 2012, at 12:44 AM, Julian Hyde wrote:
>>
>> Paul,
>>
>> On IRC yesterday you said there was a more efficient JSON representation
>> of XMLA responses than the one implemented in olap4j-xmlaserver (formerly
>> Mondrian's XMLA server).
>>
>> There are some examples in XmlaBasicTest [ see
>> https://raw.github.com/pentaho/mondrian/master/testsrc/main/mondrian/xmla/XmlaBasicTest.ref.xml and
>> search for "Json" ]. One is
>>
>> "cxmla:DiscoverResponse": {
>>   "xmlns:cxmla": "urn:schemas-microsoft-com:xml-analysis",
>>   "cxmla:return": {
>>     "root": {
>>       "xmlns": "urn:schemas-microsoft-com:xml-analysis:rowset",
>>       "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
>>       "xmlns:xsd": "http://www.w3.org/2001/XMLSchema",
>>       "xmlns:EX": "urn:schemas-microsoft-com:xml-analysis:exception",
>>       "row": [
>>         {
>>           "CATALOG_NAME": "FoodMart",
>>           "SCHEMA_NAME": "FoodMart",
>>           "CUBE_NAME": "HR",
>>           "CUBE_TYPE": "CUBE",
>>           "LAST_SCHEMA_UPDATE": "xxxx-xx-xxTxx:xx:xx",
>>           "IS_DRILLTHROUGH_ENABLED": true,
>>           "IS_WRITE_ENABLED": false,
>>           "IS_LINKABLE": false,
>>           "IS_SQL_ENABLED": false,
>>           "CUBE_CAPTION": "HR",
>>           "DESCRIPTION": "FoodMart Schema - HR Cube"
>>         },
>>         {
>>           "CATALOG_NAME": "FoodMart",
>>           "SCHEMA_NAME": "FoodMart",
>>           "CUBE_NAME": "Sales",
>>           "CUBE_TYPE": "CUBE",
>>           "LAST_SCHEMA_UPDATE": "xxxx-xx-xxTxx:xx:xx",
>>           "IS_DRILLTHROUGH_ENABLED": true,
>>           "IS_WRITE_ENABLED": false,
>>           "IS_LINKABLE": false,
>>           "IS_SQL_ENABLED": false,
>>           "CUBE_CAPTION": "Sales",
>>           "DESCRIPTION": "FoodMart Schema - Sales Cube"
>>         },
>>
>>                 ...
>>
>> and there's another that returns nested metadata (cubes, dimensions,
>> hierarchies, etc.) and another that returns a cell set.
>>
>> I'm open to making this more concise and "json-like" as long as there is
>> a clear mapping from the XMLA spec. (If we depart from the spec entirely
>> we're back in the wild west, where clients can only rely on their own
>> server.) If you have some ideas please propose them.
>>
>> Julian
>>
>>
>>
>> Julian Hyde
>> jhyde at pentaho.com
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>
>>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
>
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20120920/0ae01c97/attachment-0001.html