Thursday, January 26, 2017

When SQL meets NoSQL, you get the best of both worlds!

At the heart of Darwino is an advanced, portable JSON document store, implemented on top of any relational database. I'm often being asked the following question "why did you implement that on top of an RDBMS?". Behind the scene, the real question is: "why are you not using MongoDB or another nosql database?"
Well, I'm generally answering it with multiple arguments:
  • It leverages all the RDBMS well known capabilities: transactions, data integrity, security, backups, performance, reporting, analytics...
  • Nowadays, RDBMS handle JSON natively, thus providing high performance access to any data in this format. 
  • Portability: it deploys on existing infrastructure, whenever it is cloud or on premises. Run your app on IBM Bluemix on top of DB2, or MS Azure on top of SQL Server... And on-premises, I don't know any organization that does not already have an RDBMS validated by the IT department. 
But there is now another big reason: queries! This is another step in leveraging the native JSON support available in all major RDBMS.

From the beginning, Darwino comes with a JSON based, MongoDB like, query language. It fully abstracts the relational database details, by converting the JSON query into SQL for the target database. It hides the relational model and behaves exactly the same on every single relational database. Plus it adds IBM Domino like capabilities, including category records, response documents, full text search... Of course, it honors the document based security. So it is very powerful. You can see it in action here.

But it also has its own set of limitations. In particular, because it hides the relational model, it cannot take advantage of it. I'm thinking about joins, subqueries, unions... Remember, it is modeled over MongoDB and IBM Domino, which do not support these capabilities.

That's the reason why are introducing JSQL, which stands for JSON SQL. It complements the existing query language. In a nutshell, this is SQL targeting JSON document collections. Think a document collection as relational table, where each document is a row. The columns become JSON paths on the documents.

The beauty of implementing a NoSQL store on top of a RDBMS, is that you don't have to implement the SQL query engine yourself, but you can rely on the well proven underlying database. I don't know any NoSQL database that is just approaching what a mature SQL database can do in term of query. With behind it, decades of research and query optimization techniques that will be hard to re-implement! And finally, if a vendor does not give you entire satisfaction, then you can move to the next one. Frankly, isn't that more trustable than any proprietary NoSQL database, even open source?

Ok, back to JSQL. Let's start with a simple example. Suppose that you have a collection of JSON documents named 'Pinball'. Each pinball document has a few fields, like these:
  { name: 'Revenge of Mars', manufacturer: 'Bally', ....}

A JSQL query to list all the pinball machines in the database would be:
 SELECT $.manufacturer as manufacturer, $.name as name 
FROM pinball
ORDER BY $.manufacturer, $.name

Easy, isn't it? The syntax '$.a' is actually a JSON path that extracts the field 'a' from the JSON document. It can obviously be more complex, like '$.x.y.z', to extract hierarchical data.
Note that you can also get the whole JSON document with the simple '$' JSON Path:
  SELECT $ as doc from pinball

Under the hood, Darwino parses the original JSQL query and generates the final query for the target RDBMS. For example, here is how the query above is converted to Postgresql:
  WITH TB1 AS 
    (SELECT *
    FROM playground_DOC
    WHERE STOREID='pinball'
            AND INSTID='')
  SELECT
    jsonb_extract_path_text(JSON,
'manufacturer')::text AS manufacturer,
    jsonb_extract_path_text(JSON,'name')::text AS name
  FROM TB1
  ORDER BY
    jsonb_extract_path_text(JSON,'manufacturer')::text,
    jsonb_extract_path_text(JSON,'name')::text

The JSQL queries can be as complex as the database supports: it includes clauses like WHERE, ORDER BY, GROUP BY/HAVING, any JOIN, subqueries, functions, aggregation, UNION... Again, anything supported by the underlying database can be used. Also, during the conversion step, Darwino hides as much as possible the SQL differences between the database vendors. It even works on mobile devices on top of SQLite!

Here is, another example: a query joining 3 document collections: pinballs, owners and a relation (owns) between the pinballs and the owners:
SELECT O.$.firstName firstname, 
  O.$.lastName lastname,
  P.$.brand brand,
  P.$.name name
FROM owners O
LEFT OUTER JOIN owns R ON R.$.owner=O._unid
LEFT OUTER JOIN pinball P ON R.$.ipdb=P._unid
ORDER BY firstname, lastname
(The field syntax _xyz allows access to the document meta-data, stored outside of the JSON document. It includes the document unid, the creation date...)

Another one? What about a subquery to find the most expensive pinball:
  SELECT P.$.name name, P.$.manufacturer manufacturer,        P."$.value"::number "value"
FROM pinball P,
(SELECT MAX("$.value"::number) val FROM pinball) MT
WHERE P."$.value"::number=MT.val
Live Example
(Because there is no JSON schema, any JSON path is assumed to be a string by default. Darwino supports the :: cast operator to specify other data types, like ::number)

It, of course, preserves the document based security (a.k.a. readers/editors). Look at this example, and you'll see that the generated SQL is decorated with the proper security condition on the reader fields.

Thinking even further, we can provide a JDBC driver that will allow any JDBC client, like report engines, to connect to the database, while preserving the whole security model! For the record, I already created such a remote JDBC driver for DomSQL, so it would be easy reuse this piece of code.

This feature is still under development but can be previewed in the Darwino 2.0 code stream, and live from the Playground. Hope you guys like it, and see the value of having an RDBMS behind the scene! Any feedback is more than welcome.

BTW, I'm a fan of Pinball machines, and so are my friends at WebGate. :-) I can show you my pinball database, augmented with AI, at Connect 2017. See you at booth #630 in the showcase.


0 comments:

Post a Comment