[Mondrian] Efficient JSON representation for "XMLA" responses

Julian Hyde jhyde at pentaho.com
Wed Sep 19 19:46:38 EDT 2012


I'd try enabling gzip on the HTTP requests. Should help a lot. (Never done it myself, but let me know how it goes.)

Doesn't sound that easy to proxy one olap4j driver's metadata objects over the wire. They're not (designed to be) serializable.

Json result format with deep=true is worth a try. You can get the whole schema in one json response. Combined with gzip, should be fairly compact.

If any of these approaches work out and there is a consensus (e.g. between Michele and Paul) that the olap4j XMLA driver should use optimized protocols (still based on the XMLA specification in a clear way) then I think we should move ahead and put them into the driver.

By the way, now the XMLA server is a separate project, it should be easier to experiment.

Julian


On Sep 19, 2012, at 1:36 AM, Michele Rossi <michele.rossi at gmail.com<mailto:michele.rossi at gmail.com>> wrote:

hi,
we are also concerned about the poor performances of XMLA, as you say below it requires too many queries to obtain metadata and cell data and the representation itself is extremely inefficient.

We have a real customer use case of a 20000 rows pivots which is rendered in Excel via the SimbaO2X plugin.
Such pivot results in about 268Mb of XML exchanged between client and server.
What's interesting is that the informative content of such xml is very limited and it can be compressed to 1.69Mb (0.6% of the original).

Right at this moment I don't have the time to look at how you're using JSON in the new olap4j-xmlaserver but judging from the example provided below it looks like you're transferring xmla-like content, only serialised in JSON.
I am thinking that something like that would only be useful if you control client and server (unit tests?) or if you manage to create a new standard.
Or perhaps to create a new olap4j transport driver which could use whatever format we want to simply delegate calls over to a remote olap4j driver (we have done that here).
Hope I am not missing something obvious here.

What we've done in my company while building our own olap4j driver was to transfer metadata using a custom XSD schema which mirrors closely the Olap4j entities.
This way we can obtain all the data required to deep-populate a org.olap4j.metadata.Schema in one call.
We have used XSD+JAXB with XML serialization for simplicity but we are planning to stop using XML all together and use JSON to represent data instead.

While our olap4j drivers implements all ResultSet based metadata calls such org.olap4j.OlapDatabaseMetaData.getCubes(String, String, String) we are not actively using them preferring instead deep metadata objects such as org.olap4j.metadata.Schema obtained as described above.

And that was my two cents, hope it helps.

thanks,
Michele


On 19 September 2012 00:46, Julian Hyde <jhyde at pentaho.com<mailto:jhyde at pentaho.com>> wrote:
You're right about round trips. I believe I've already addressed that, with the <Deep>true</Deep> option. Search for "testMDCubesDeepJson" in that file and you'll see what I mean.

We can rip out some of the header elements that are all about SOAP and XML namespaces. "row" can be replaced with something else ("CUBES" seems more appropriate than "cube", if you look at the DeepJson example). The format needs also needs to be able to return an exception, with standard XMLA response codes.

That said, people will put up with some crap in a protocol if it gives them what they need (and reasonably efficiently). So I wouldn't bend over backwards to make it absolutely the sexiest json ever.

Take a look at the examples in test file and propose something concrete.

Julian


On Sep 18, 2012, at 4:29 PM, Paul Stoellberger <p.stoellberger at gmail.com<mailto:p.stoellberger at gmail.com>> wrote:

One of my biggest concerns over xmla-like communication is the overall overhead in a) number http calls b) processing the results

With this row (array) based results we will have to build in logic into the client to be able to understand the logic behind catalogs, schemas, cubes and how they correlate and turn those rows into objects instead of returning objects directly, as we do in saiku today. In order to work with this data I need those objects, and this just doesn't feel right to me.

Furthermore this will have to be processed by the client. Now I know this is probably not something very expensive to do but I would rather keep the necessary logic and processing on the client side to an absolute minimum. Doing the same transformations on server side code will always be faster and therefore deliver a better user experience to the user. The less the browser has to work, the better.

But my major point of concern is the number of server calls needed to retrieve the necessary metadata information.
I admit I don't know the details about all xmla calls but what I can tell by looking at olap4j's xmla driver there are a lot of round trips needed to retrieve all the information necessary. I have plans to understand those calls better in order to improve the xmla driver. Maybe I'm wrong, so feel free to convince me otherwise.

Regarding standards, are there any other OLAP servers that return metadata in a xmla like manner?
What was good for xml, doesn't have to be necessarily the best for todays "ways of doing things".
People were able to consume xmla (like Andy / Roland proved successfully) directly before, however we got great feedback from web developers who confirmed how easy it is to work with our (metadata) API.
When I look at public API's like github's I see rich JSON objects and not row based resultsets.
What are your arguments to keeping the structure completely the same but just switching from xml to json?

Metadata calls are pretty straightforward and there is not much you can do differently, so I am happy to adapt to a better standard (although having 1 server return the data this way doesn't make it a standard).

However, the best API and standards don't make good software. My biggest concern is to make things work.
If it means I can do stuff more efficient (less http calls, less processing needed) I prefer that over a "clean" API.

But maybe in order to move forward:
Maybe there is a way how we can turn those xmla like results into something nice thats still mappable to xmla results.
E.g. the JSON you pasted below:
If "row" becomes "cube" and this cube element (can) contain already hierarchy and/or level information in the same result, but still keep the xmla like result style that would make me already happy.

Andy, Roland ... you two have probably the best insights on this topic so it would be great if you could share your opinion with us as well!

-Paul



On Sep 19, 2012, at 12:44 AM, Julian Hyde wrote:

Paul,

On IRC yesterday you said there was a more efficient JSON representation of XMLA responses than the one implemented in olap4j-xmlaserver (formerly Mondrian's XMLA server).

There are some examples in XmlaBasicTest [ see https://raw.github.com/pentaho/mondrian/master/testsrc/main/mondrian/xmla/XmlaBasicTest.ref.xml and search for "Json" ]. One is

"cxmla:DiscoverResponse": {
  "xmlns:cxmla": "urn:schemas-microsoft-com:xml-analysis",
  "cxmla:return": {
    "root": {
      "xmlns": "urn:schemas-microsoft-com:xml-analysis:rowset",
      "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
      "xmlns:xsd": "http://www.w3.org/2001/XMLSchema",
      "xmlns:EX": "urn:schemas-microsoft-com:xml-analysis:exception",
      "row": [
        {
          "CATALOG_NAME": "FoodMart",
          "SCHEMA_NAME": "FoodMart",
          "CUBE_NAME": "HR",
          "CUBE_TYPE": "CUBE",
          "LAST_SCHEMA_UPDATE": "xxxx-xx-xxTxx:xx:xx",
          "IS_DRILLTHROUGH_ENABLED": true,
          "IS_WRITE_ENABLED": false,
          "IS_LINKABLE": false,
          "IS_SQL_ENABLED": false,
          "CUBE_CAPTION": "HR",
          "DESCRIPTION": "FoodMart Schema - HR Cube"
        },
        {
          "CATALOG_NAME": "FoodMart",
          "SCHEMA_NAME": "FoodMart",
          "CUBE_NAME": "Sales",
          "CUBE_TYPE": "CUBE",
          "LAST_SCHEMA_UPDATE": "xxxx-xx-xxTxx:xx:xx",
          "IS_DRILLTHROUGH_ENABLED": true,
          "IS_WRITE_ENABLED": false,
          "IS_LINKABLE": false,
          "IS_SQL_ENABLED": false,
          "CUBE_CAPTION": "Sales",
          "DESCRIPTION": "FoodMart Schema - Sales Cube"
        },

                ...

and there's another that returns nested metadata (cubes, dimensions, hierarchies, etc.) and another that returns a cell set.

I'm open to making this more concise and "json-like" as long as there is a clear mapping from the XMLA spec. (If we depart from the spec entirely we're back in the wild west, where clients can only rely on their own server.) If you have some ideas please propose them.

Julian



Julian Hyde
jhyde at pentaho.com<mailto:jhyde at pentaho.com>






_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org<mailto:Mondrian at pentaho.org>
http://lists.pentaho.org/mailman/listinfo/mondrian


_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org<mailto:Mondrian at pentaho.org>
http://lists.pentaho.org/mailman/listinfo/mondrian

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20120919/dafc5114/attachment.html 


More information about the Mondrian mailing list