Saturday, March 11, 2017

Darwino as the IBM Domino reporting Graal(*)

Reports, dashboards, data analytics... have been the conundrum of IBM Notes/Domino since the beginning. Its proprietary data structure, the absence of standard APIs and its deficient query capability make it very difficult. This has been ranked as one of the top need for any for business applications. I know several business partners who created great Domino solutions but struggling with poor reporting capabilities.

Of course some attempts were made to fix it: LEI, DB2NSF,.. all incomplete and anyway defunct. I personally tried a new approach with DomSQL, which, to be frank, I was proud of :-) But this was a first shot, more a POC than a product, although I know several customers using it in production. Unfortunately, despite the community interest, IBM didn't want to move it forward, so it is barely in the state where I left it. On the positive side, it gave me a great low level SQLite experience which is now leveraged in Darwino's mobile database! Learn, learn, learn, you'll always reuse it.

So we are back 2 decades ago, with no solution to this common business problem. Well, you anticipated, Darwino can help! It is actually the first and immediate benefit than our customers are getting when they start with Darwino.

One of the Darwino capability is data replication. It can replicate, two ways and real real time, any Domino data into its own document store. From a Domino standpoint, Darwino is behaving as another Domino server, with high fidelity replication. But Darwino document store is a JSON based store, sitting on top of an RDBMS. As modern RDBMS now understand JSON natively, this is a great fit. The whole replicated data set is then query-able by any tool that understands RDBMS. With no limitation but the relational database itself. Yes, you can join, aggregate, union, calculate, select, sub select... And, contrary to LEI, this is a real incremental replication that can be triggered when a data event occur on the domino server, keeping your data up-to-date at any time. No more nightly/weekly batch data copy, if you know what I mean.

From a technical standpoint, Darwino stores a JSON version of the domino document within a column in a relational table. It is very flexible as it does not require a custom table schema per fdata set, and there is no data loss (e.g. a field you forgot to define a column for...). Moreover, if the default field conversion between the two database is 1-1, this is actually fully customizable. The Darwino replication process can transform the data during the replication phase. This is achieved using a Groovy based DSL, which allows the following:

  • The fields type can be normalized (e.g. make sure that the field in the JSON document is of the expected data type). Common issue with Notes inconsistent data type across documents, right?
  • Domino patterns/workaround can also be normalized. For example, a set a multi value fields describing a table can be transform a true JSON array
  • Computation can be done ahead of time to make the reporting easier (queries, calculated columns...)
  • ... well, your imagination
If you feel that directly reporting on the JSON column within the RDBMS is uneasy, then you can create RDBMS views (standard or materialized), to then act on a table like structure. This is what our partner, CMS, chose to do. Going even further, they are creating these relational views out of the Domino view design elements. Introduction to their product is available here. This is a great use of Darwino.

In a previous post, I spoke about JSQL, which is a easy way for querying JSON data using SQL. Thinking forward, we can provide a JDBC driver to expose JSQL and allow any JDBC compliant client to access the data. From a technical standpoint, we know how to do it: the above mentioned DomSQL features one we can adapt and reuse. This is not in our current plan, but I'll be interested to know if you have any interest in this.

As a summary, using Darwino as a Domino report tool enabler is a fast way to fulfill some crucial, missing Domino capability. It almost instantly makes the existing Domino applications more interesting to the business users. Of course, this is just a subset of what Darwino can bring to the table, but this is a quick, high value one. Please, give it a shot or contact us so we can help you.

BTW, "Graal" is the French spelling for Holy Grail




Friday, March 3, 2017

Schemaless GraphQL

FaceBook officially introduced a few months ago a new technology called GraphQL. Well, rather than really being new, FaceBook made public and open source their internal graph query engine. It starts to be widely used for exposing APIs. For example, IBM Watson Worskpace makes use of it. I also heard that IBM Connections will also use it.

In a nutshell, it allows powerful, tailored queries including navigation between the data sources, in a single query. As a result, it minimizes the number of requests sent by the client to the server, which is a performance boost. Moreover, it proposes a single query format regardless of the data source. Basically, it means that you can query data from an RDBMS or an LDAP directory almost identically, while getting a consistent response format. All good!

To make the queries discoverable, it uses a schema to define the query parameters, the JSON result as well as the graph traversal links.  If this is appealing, it forces the server to implement these schemas before the queries can be emitted by the client. The whole data and navigation models have to be known and implemented up front. Hum, not only this slows down application prototyping, but it makes it hard to link micro-services that don't know about each other.  It also does not work with truly dynamic services, like a JSON document store, where the document schema can be unknown.  Crap, this is exactly what Darwino offers. So the question is: can we still use GraphQL for dynamic services, where data  schema can be static, dynamic or a mix of both? I saw a few posts on the subject but no solution so far.

Fortunately, GraphQL has the notion of an optional field alias. An alias renames a field, so it can appear multiple times in the same root. As a field can also have parameters, we can "tweak" the engine to implement dynamic field accessors. How? Simply by inverting the roles: the alias becomes the field name while the field is a dynamic, typed, JSON path evaluation on the source document.

Ok, let me give you an example. Suppose that we have the following document as a source:
{
  name: "MyProject",
  budget: {
    amount: 10000,
    currency: "USD"
  }
}
The dynamic query to extract the project name and the budget will be like:
{
  doc: Document(unid:"projectid", database:"project") {
    _unid,
    name: string(path:"$.name"),
    budget: number(path:"$.budget.amount")
  }
}
'string' and 'number' (as well as others, like boolean...) are typed, pseudo field names that are actually evaluating JSON path queries on the source document. And because the path is passed as an argument, it can be fully dynamic! We are using JSON path here, but any interpreted language could be used, including JavaScript or others. Also note that the Document accessor is also implemented as a field with parameters. This triggers the actual load of the JSON document from the database.
How does that sound? You can find a real example dealing with the JSON store in the Darwino playground:

But this is not enough, as we also need to implement the links to navigate the graph. For example, we need to navigate to a JSON document by id, that id being a field in the current object. To make this possible, we introduce here the notion of computed parameter, using the ${....} syntax (remember JSF?).
The pseudo query bellow assumes that you have tasks documents that can be accessed using the parent project id:
{
  doc: Document(unid:"projectid", database:"project") {
    _unid,
    name: string(path:"$.name"),
    budget: number(path:"$.budget.amount")
    tasks: Documents(parent:"${_unid}", database:"tasks") {
     taskname: string(path:"$.name"),
    }
  }
}
The drawback of this solution is that every property becomes a string, to accept a formula. Another solution would duplicate all the fields and have, for example, parent$ as a computed version of parent. As the whole library is under development, I'm not sure yet what's best, so any feedback is welcome!
Note that Documents returns a collection of documents, as opposed to Document that only returns one.

Of course, this is a simple introduction to a schemaless implementation of GraphQL. There are more details, but it expands the GraphQL technology by making it more dynamic, while the retaining the static parts. Look, for example, at this snippets mixing static and dynamic attributes:
The early Darwino 2.0 code includes this capability in its Java based implementation, released as open source on darwino.org. Any GraphQL object can implement a specific interface that will give it the dynamic capability (json path...) on top of its static fields. If you go back to the previous example, you'll see that the document UNID is a static field specific to the Darwino JSON document.

There are more to discover, but hold on: if our session is accepted at Engage 2017, Paul Withers and I will give more details on the implementation, along with an IBM Notes/Domino implementation!