Friday, March 3, 2017

Schemaless GraphQL

FaceBook officially introduced a few months ago a new technology called GraphQL. Well, rather than really being new, FaceBook made public and open source their internal graph query engine. It starts to be widely used for exposing APIs. For example, IBM Watson Worskpace makes use of it. I also heard that IBM Connections will also use it.

In a nutshell, it allows powerful, tailored queries including navigation between the data sources, in a single query. As a result, it minimizes the number of requests sent by the client to the server, which is a performance boost. Moreover, it proposes a single query format regardless of the data source. Basically, it means that you can query data from an RDBMS or an LDAP directory almost identically, while getting a consistent response format. All good!

To make the queries discoverable, it uses a schema to define the query parameters, the JSON result as well as the graph traversal links.  If this is appealing, it forces the server to implement these schemas before the queries can be emitted by the client. The whole data and navigation models have to be known and implemented up front. Hum, not only this slows down application prototyping, but it makes it hard to link micro-services that don't know about each other.  It also does not work with truly dynamic services, like a JSON document store, where the document schema can be unknown.  Crap, this is exactly what Darwino offers. So the question is: can we still use GraphQL for dynamic services, where data  schema can be static, dynamic or a mix of both? I saw a few posts on the subject but no solution so far.

Fortunately, GraphQL has the notion of an optional field alias. An alias renames a field, so it can appear multiple times in the same root. As a field can also have parameters, we can "tweak" the engine to implement dynamic field accessors. How? Simply by inverting the roles: the alias becomes the field name while the field is a dynamic, typed, JSON path evaluation on the source document.

Ok, let me give you an example. Suppose that we have the following document as a source:
{
  name: "MyProject",
  budget: {
    amount: 10000,
    currency: "USD"
  }
}
The dynamic query to extract the project name and the budget will be like:
{
  doc: Document(unid:"projectid", database:"project") {
    _unid,
    name: string(path:"$.name"),
    budget: number(path:"$.budget.amount")
  }
}
'string' and 'number' (as well as others, like boolean...) are typed, pseudo field names that are actually evaluating JSON path queries on the source document. And because the path is passed as an argument, it can be fully dynamic! We are using JSON path here, but any interpreted language could be used, including JavaScript or others. Also note that the Document accessor is also implemented as a field with parameters. This triggers the actual load of the JSON document from the database.
How does that sound? You can find a real example dealing with the JSON store in the Darwino playground:

But this is not enough, as we also need to implement the links to navigate the graph. For example, we need to navigate to a JSON document by id, that id being a field in the current object. To make this possible, we introduce here the notion of computed parameter, using the ${....} syntax (remember JSF?).
The pseudo query bellow assumes that you have tasks documents that can be accessed using the parent project id:
{
  doc: Document(unid:"projectid", database:"project") {
    _unid,
    name: string(path:"$.name"),
    budget: number(path:"$.budget.amount")
    tasks: Documents(parent:"${_unid}", database:"tasks") {
     taskname: string(path:"$.name"),
    }
  }
}
The drawback of this solution is that every property becomes a string, to accept a formula. Another solution would duplicate all the fields and have, for example, parent$ as a computed version of parent. As the whole library is under development, I'm not sure yet what's best, so any feedback is welcome!
Note that Documents returns a collection of documents, as opposed to Document that only returns one.

Of course, this is a simple introduction to a schemaless implementation of GraphQL. There are more details, but it expands the GraphQL technology by making it more dynamic, while the retaining the static parts. Look, for example, at this snippets mixing static and dynamic attributes:
The early Darwino 2.0 code includes this capability in its Java based implementation, released as open source on darwino.org. Any GraphQL object can implement a specific interface that will give it the dynamic capability (json path...) on top of its static fields. If you go back to the previous example, you'll see that the document UNID is a static field specific to the Darwino JSON document.

There are more to discover, but hold on: if our session is accepted at Engage 2017, Paul Withers and I will give more details on the implementation, along with an IBM Notes/Domino implementation!

0 comments:

Post a Comment