Saturday, March 11, 2017

Darwino as the IBM Domino reporting Graal(*)

Reports, dashboards, data analytics... have been the conundrum of IBM Notes/Domino since the beginning. Its proprietary data structure, the absence of standard APIs and its deficient query capability make reporting very difficult. Yet this has been ranked as one of the top needs for business applications. I know several business partners who created great Domino solutions but are struggling with poor reporting capabilities.

Of course, some attempts were made to fix it: LEI, DB2NSF... all incomplete and, anyway, now defunct. I personally tried a new approach with DomSQL, which, to be frank, I was proud of :-) But this was a first shot, more a POC than a product, although I know of several customers using it in production. Unfortunately, despite the community interest, IBM didn't want to move it forward, so it remains pretty much in the state where I left it. On the positive side, it gave me great low-level SQLite experience, which is now leveraged in Darwino's mobile database! Learn, learn, learn, you'll always reuse it.

So here we are, two decades later, still with no solution to this common business problem. Well, as you anticipated, Darwino can help! It is actually the first and most immediate benefit that our customers get when they start with Darwino.

One of Darwino's capabilities is data replication. It can replicate, two-way and in real time, any Domino data into its own document store. From a Domino standpoint, Darwino behaves like another Domino server, with high-fidelity replication. But the Darwino document store is a JSON-based store sitting on top of an RDBMS. As modern RDBMS now understand JSON natively, this is a great fit. The whole replicated data set is then queryable by any tool that understands an RDBMS, with no limitation but the relational database itself. Yes, you can join, aggregate, union, calculate, select, sub-select... And, contrary to LEI, this is true incremental replication that can be triggered when a data event occurs on the Domino server, keeping your data up to date at all times. No more nightly/weekly batch data copies, if you know what I mean.

From a technical standpoint, Darwino stores a JSON version of the Domino document within a column in a relational table. This is very flexible, as it does not require a custom table schema per data set, and there is no data loss (e.g. a field you forgot to define a column for...). Moreover, while the default field conversion between the two databases is one-to-one, it is fully customizable: the Darwino replication process can transform the data during the replication phase. This is achieved using a Groovy based DSL, which allows the following (see the sketch after this list):

  • Field types can be normalized (e.g. making sure that a field in the JSON document is of the expected data type). A common issue with Notes' inconsistent data types across documents, right?
  • Domino patterns/workarounds can also be normalized. For example, a set of multi-value fields describing a table can be transformed into a true JSON array
  • Computations can be done ahead of time to make reporting easier (queries, calculated columns...)
  • ... well, your imagination
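To make this concrete, here is a minimal sketch, in plain Java rather than the actual Groovy DSL, of the kind of normalization such a transformation can perform. The field names (Task_Name, Task_Owner) and the helper itself are purely illustrative; Darwino's real DSL API may look quite different.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: the real transformation is written in the
// Groovy DSL, whose API may look quite different. This folds parallel
// multi-value fields (each acting as a table "column") into a true JSON array.
public class TaskNormalizer {
    @SuppressWarnings("unchecked")
    public static void normalize(Map<String, Object> doc) {
        // e.g. {Task_Name:["Design","Build"], Task_Owner:["Ann","Bob"]}
        List<Object> names = (List<Object>) doc.remove("Task_Name");
        List<Object> owners = (List<Object>) doc.remove("Task_Owner");
        if (names == null || owners == null) {
            return; // nothing to normalize in this document
        }
        List<Map<String, Object>> tasks = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            Map<String, Object> task = new HashMap<>();
            task.put("name", names.get(i));
            task.put("owner", i < owners.size() ? owners.get(i) : null);
            tasks.add(task);
        }
        doc.put("tasks", tasks); // tasks:[{name:"Design",owner:"Ann"}, ...]
    }
}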
If you feel that reporting directly on the JSON column within the RDBMS is awkward, you can create RDBMS views (standard or materialized) and then work with a table-like structure. This is what our partner, CMS, chose to do. Going even further, they are creating these relational views out of the Domino view design elements. An introduction to their product is available here. This is a great use of Darwino.
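As an illustration of the view approach, here is a minimal sketch, assuming a PostgreSQL backend where each replicated document sits in a JSONB column named JSON, keyed by a STOREID column; the table, view, and JSON field names are illustrative, not Darwino's actual schema.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// A minimal sketch, assuming a PostgreSQL backend where each replicated
// document sits in a JSONB column named JSON, keyed by a STOREID column.
// The table, view, and JSON field names are illustrative.
public class CreateReportingView {
    public static void main(String[] args) throws Exception {
        String ddl =
            "CREATE OR REPLACE VIEW projects_report AS " +
            "SELECT jsonb_extract_path_text(JSON, 'name')   AS name, " +
            "       jsonb_extract_path_text(JSON, 'status') AS status " +
            "FROM darwino_doc " +
            "WHERE STOREID = 'projects'";
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/darwino", "user", "password");
             Statement st = con.createStatement()) {
            st.execute(ddl); // any reporting tool can now query projects_report
        }
    }
}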

In a previous post, I spoke about JSQL, which is an easy way to query JSON data using SQL. Thinking forward, we could provide a JDBC driver to expose JSQL and allow any JDBC-compliant client to access the data. From a technical standpoint, we know how to do it: the above-mentioned DomSQL already features one that we can adapt and reuse. This is not in our current plan, but I'd be interested to know if you have any interest in this.

In summary, using Darwino as a Domino reporting enabler is a fast way to fulfill some crucial, missing Domino capabilities. It almost instantly makes existing Domino applications more interesting to business users. Of course, this is just a subset of what Darwino can bring to the table, but it is a quick, high-value one. Please give it a shot or contact us so we can help you.

BTW, "Graal" is the French spelling for Holy Grail




Friday, March 3, 2017

Schemaless GraphQL

A few months ago, Facebook officially introduced a new technology called GraphQL. Well, rather than really being new, Facebook made its internal graph query engine public and open source. It is starting to be widely used for exposing APIs. For example, IBM Watson Workspace makes use of it, and I heard that IBM Connections will also use it.

In a nutshell, it allows powerful, tailored queries, including navigation between data sources, in a single request. As a result, it minimizes the number of requests sent by the client to the server, which is a performance boost. Moreover, it proposes a single query format regardless of the data source. Basically, it means that you can query data from an RDBMS or an LDAP directory almost identically, while getting a consistent response format. All good!

To make the queries discoverable, it uses a schema to define the query parameters, the JSON result, as well as the graph traversal links. While this is appealing, it forces the server to implement these schemas before any query can be emitted by the client. The whole data and navigation models have to be known and implemented up front. Hum, not only does this slow down application prototyping, but it also makes it hard to link micro-services that don't know about each other. It also does not work with truly dynamic services, like a JSON document store, where the document schema can be unknown. Crap, this is exactly what Darwino offers. So the question is: can we still use GraphQL for dynamic services, where the data schema can be static, dynamic or a mix of both? I saw a few posts on the subject but no solution so far.

Fortunately, GraphQL has the notion of an optional field alias. An alias renames a field, so the same field can appear multiple times in the same root. As a field can also have parameters, we can "tweak" the engine to implement dynamic field accessors. How? Simply by inverting the roles: the alias becomes the field name, while the field is a dynamic, typed JSON path evaluation on the source document.

Ok, let me give you an example. Suppose that we have the following document as a source:
{
  name: "MyProject",
  budget: {
    amount: 10000,
    currency: "USD"
  }
}
The dynamic query to extract the project name and the budget would look like:
{
  doc: Document(unid:"projectid", database:"project") {
    _unid,
    name: string(path:"$.name"),
    budget: number(path:"$.budget.amount")
  }
}
'string' and 'number' (as well as others, like 'boolean'...) are typed pseudo-field names that actually evaluate JSON path queries on the source document. And because the path is passed as an argument, it can be fully dynamic! We are using JSON path here, but any interpreted language could be used, including JavaScript. Also note that the Document accessor is itself implemented as a field with parameters; this triggers the actual load of the JSON document from the database.
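For the curious, here is a rough sketch of how such a typed pseudo-field could be wired up, using the graphql-java library and Jayway JsonPath. This is a hypothetical illustration of the technique, not Darwino's actual implementation:

import com.jayway.jsonpath.JsonPath;
import graphql.Scalars;
import graphql.schema.GraphQLArgument;
import graphql.schema.GraphQLFieldDefinition;

// Hypothetical sketch: a typed pseudo-field named "string" whose data fetcher
// evaluates the JSON path passed as an argument against the source document.
// The client-side alias then provides the output key in the response.
public class DynamicFields {
    public static GraphQLFieldDefinition stringField() {
        return GraphQLFieldDefinition.newFieldDefinition()
            .name("string")
            .type(Scalars.GraphQLString)
            .argument(GraphQLArgument.newArgument()
                .name("path")
                .type(Scalars.GraphQLString))
            .dataFetcher(env -> {
                Object sourceDoc = env.getSource();    // the JSON document, e.g. a Map
                String path = env.getArgument("path"); // e.g. "$.name"
                return JsonPath.read(sourceDoc, path); // dynamic evaluation
            })
            .build();
    }
}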
How does that sound? You can find a real example dealing with the JSON store in the Darwino playground.

But this is not enough, as we also need to implement the links to navigate the graph. For example, we need to navigate to a JSON document by id, that id being a field in the current object. To make this possible, we introduce the notion of a computed parameter, using the ${...} syntax (remember JSF?).
The pseudo-query below assumes that you have task documents that can be accessed using the parent project id:
{
  doc: Document(unid:"projectid", database:"project") {
    _unid,
    name: string(path:"$.name"),
    budget: number(path:"$.budget.amount")
    tasks: Documents(parent:"${_unid}", database:"tasks") {
     taskname: string(path:"$.name"),
    }
  }
}
The drawback of this solution is that every property becomes a string, so that it can accept a formula. Another solution would be to duplicate all the fields and have, for example, parent$ as a computed version of parent. As the whole library is under development, I'm not sure yet what's best, so any feedback is welcome!
Note that Documents returns a collection of documents, as opposed to Document that only returns one.
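Here is a hypothetical Java sketch of how such a computed parameter could be resolved before an accessor like Documents is executed; the method and its behavior are illustrative, not the actual implementation:

import java.util.Map;

// Hypothetical sketch: before executing an accessor such as Documents(parent:...),
// each argument is checked for the ${...} syntax and, when present, is resolved
// against the current source document instead of being used as a literal.
public class ComputedParams {
    public static Object resolve(Object argValue, Map<String, Object> currentDoc) {
        if (argValue instanceof String) {
            String s = (String) argValue;
            if (s.startsWith("${") && s.endsWith("}")) {
                String fieldName = s.substring(2, s.length() - 1); // e.g. "_unid"
                return currentDoc.get(fieldName); // value read from the parent document
            }
        }
        return argValue; // plain literal parameter, used as-is
    }
}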

Of course, this is just a simple introduction to a schemaless implementation of GraphQL. There are more details, but it extends the GraphQL technology by making it more dynamic, while retaining the static parts, so static and dynamic attributes can be freely mixed in the same query.
The early Darwino 2.0 code includes this capability in its Java-based implementation, released as open source on darwino.org. Any GraphQL object can implement a specific interface that gives it the dynamic capability (JSON path...) on top of its static fields. If you go back to the previous example, you'll see that the document UNID is a static field specific to the Darwino JSON document.

There is more to discover, but hold on: if our session is accepted at Engage 2017, Paul Withers and I will give more details on the implementation, along with an IBM Notes/Domino implementation!

Thursday, February 16, 2017

Get your apps integrated with IBM Connections, cloud and on-premises!

I've been using this blog to share some of the techniques we use in ProjExec to get tightly integrated with the Connections platform. I got a lot of feedback from developers who wanted to know more, so I'm taking it a step further: Jesse Gallagher and I will describe these techniques in a breakout session @Connect 2017!

DEV-1430 : IBM Connections Integration: Exploring the Long List of Options
Program : Development, Design and Tools
Topic : Enterprise collaboration
Session Type : Breakout Session
Date/Time : Wed, 22-Feb, 10:00 AM-10:45 AM
Location : Moscone West, Level 2 - Room 2006
Presenter(s) : Philippe Riand, Triloggroup; Jesse Gallagher, I Know Some Guys, LLC

Basically, we'll show, for both cloud and on-premises Connections:

  • how to get your user authenticated by Connections, using OAuth or SSO
  • how to call services on behalf of the current user
  • how to get your apps in the IBM Connections menu
  • how to integrate the navbar, including on-premises!
  • how to share code common to community apps (cloud) and iWidgets (on-premises)
Beyond live code and demos, we will release a reusable library as open source, with all the necessary code to get your own Java-based application integrated effortlessly. It will be available on darwino.org, a branch of OpenNTF.

To make all this possible, IBM helped by delivering a long-awaited ask. Wait, wait, wait, I'm not going to reveal it here, but an IBMer will be on stage to explain. Wait again, I should not talk about this either; people have to come to the session!

See you @Connect, and don't miss this session if you want to be an IBM Connections developer hero!


Thursday, January 26, 2017

When SQL meets NoSQL, you get the best of both worlds!

At the heart of Darwino is an advanced, portable JSON document store, implemented on top of any relational database. I'm often asked the following question: "why did you implement that on top of an RDBMS?" Behind the scenes, the real question is: "why are you not using MongoDB or another NoSQL database?"
Well, I generally answer it with multiple arguments:
  • It leverages all the RDBMS well known capabilities: transactions, data integrity, security, backups, performance, reporting, analytics...
  • Nowadays, RDBMS handle JSON natively, thus providing high-performance access to any data in this format.
  • Portability: it deploys on existing infrastructure, whether in the cloud or on-premises. Run your app on IBM Bluemix on top of DB2, or on MS Azure on top of SQL Server... And on-premises, I don't know of any organization that does not already have an RDBMS validated by the IT department.
But there is now another big reason: queries! This is another step in leveraging the native JSON support available in all major RDBMS.

From the beginning, Darwino has come with a JSON-based, MongoDB-like query language. It fully abstracts the relational database details by converting the JSON query into SQL for the target database. It hides the relational model and behaves exactly the same on every single relational database. Plus, it adds IBM Domino-like capabilities, including category records, response documents, full-text search... Of course, it honors the document-based security. So it is very powerful. You can see it in action here.

But it also has its own set of limitations. In particular, because it hides the relational model, it cannot take advantage of it. I'm thinking of joins, subqueries, unions... Remember, it is modeled on MongoDB and IBM Domino, which do not support these capabilities.

That's the reason why we are introducing JSQL, which stands for JSON SQL. It complements the existing query language. In a nutshell, it is SQL targeting JSON document collections. Think of a document collection as a relational table, where each document is a row. The columns become JSON paths applied to the documents.

The beauty of implementing a NoSQL store on top of an RDBMS is that you don't have to implement the SQL query engine yourself; you can rely on the well-proven underlying database. I don't know of any NoSQL database that even approaches what a mature SQL database can do in terms of queries, backed by decades of research and query optimization techniques that would be hard to re-implement! And finally, if a vendor does not give you entire satisfaction, you can move to the next one. Frankly, isn't that more trustworthy than any proprietary NoSQL database, even an open source one?

Ok, back to JSQL. Let's start with a simple example. Suppose that you have a collection of JSON documents named 'pinball'. Each pinball document has a few fields, like these:
  { name: 'Revenge of Mars', manufacturer: 'Bally', ...}

A JSQL query to list all the pinball machines in the database would be:
  SELECT $.manufacturer as manufacturer, $.name as name
  FROM pinball
  ORDER BY $.manufacturer, $.name

Easy, isn't it? The syntax '$.a' is actually a JSON path that extracts the field 'a' from the JSON document. It can obviously be more complex, like '$.x.y.z', to extract hierarchical data.
Note that you can also get the whole JSON document with the simple '$' JSON Path:
  SELECT $ as doc from pinball

Under the hood, Darwino parses the original JSQL query and generates the final query for the target RDBMS. For example, here is how the query above is converted for PostgreSQL:
  WITH TB1 AS (
    SELECT * FROM playground_DOC
    WHERE STOREID='pinball' AND INSTID='')
  SELECT
    jsonb_extract_path_text(JSON,'manufacturer')::text AS manufacturer,
    jsonb_extract_path_text(JSON,'name')::text AS name
  FROM TB1
  ORDER BY
    jsonb_extract_path_text(JSON,'manufacturer')::text,
    jsonb_extract_path_text(JSON,'name')::text

JSQL queries can be as complex as the database supports: they can include clauses like WHERE, ORDER BY, GROUP BY/HAVING, any JOIN, subqueries, functions, aggregations, UNION... Again, anything supported by the underlying database can be used. Also, during the conversion step, Darwino hides as much as possible the SQL differences between database vendors. It even works on mobile devices on top of SQLite!

Here is another example: a query joining three document collections: pinballs, owners, and a relation (owns) between the pinballs and the owners:
SELECT O.$.firstName firstname, 
  O.$.lastName lastname,
  P.$.brand brand,
  P.$.name name
FROM owners O
LEFT OUTER JOIN owns R ON R.$.owner=O._unid
LEFT OUTER JOIN pinball P ON R.$.ipdb=P._unid
ORDER BY firstname, lastname
(The field syntax _xyz allows access to the document metadata, stored outside of the JSON document. It includes the document UNID, the creation date...)

Another one? What about a subquery to find the most expensive pinball:
  SELECT P.$.name name,
    P.$.manufacturer manufacturer,
    P."$.value"::number "value"
  FROM pinball P,
    (SELECT MAX("$.value"::number) val FROM pinball) MT
  WHERE P."$.value"::number=MT.val
(A live example is available in the playground.)
(Because there is no JSON schema, any JSON path is assumed to be a string by default. Darwino supports the :: cast operator to specify other data types, like ::number.)

It, of course, preserves the document-based security (a.k.a. readers/editors). Look at this example, and you'll see that the generated SQL is decorated with the proper security condition on the reader fields.

Thinking even further, we could provide a JDBC driver that would allow any JDBC client, like a report engine, to connect to the database while preserving the whole security model! For the record, I already created such a remote JDBC driver for DomSQL, so it would be easy to reuse this piece of code. Below is a sketch of what client code could look like.
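Here is a minimal client-side sketch; the JDBC URL scheme is made up for illustration, since this driver does not exist yet:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical client-side sketch: the JDBC URL scheme is made up for
// illustration, since this driver does not exist yet.
public class JsqlClient {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:darwino:jsql://server/playground", "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT $.manufacturer as manufacturer, $.name as name FROM pinball")) {
            while (rs.next()) {
                System.out.println(rs.getString("manufacturer")
                        + " - " + rs.getString("name"));
            }
        }
    }
}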

This feature is still under development but can be previewed in the Darwino 2.0 code stream, and live from the Playground. Hope you guys like it, and see the value of having an RDBMS behind the scenes! Any feedback is more than welcome.

BTW, I'm a fan of pinball machines, and so are my friends at WebGate. :-) I can show you my pinball database, augmented with AI, at Connect 2017. See you at booth #630 in the showcase.


Wednesday, January 25, 2017

ReactJS or AngularJS? What about something else?

So far, ProjExec has been a really good citizen in the IBM/ICS world, as we tried to reuse the core Connections stack as much as we could (Dojo, OneUI, ...). But these technologies are starting to age, while browser technologies have evolved a lot in the past few years: what required a whole bunch of JavaScript using Dojo/jQuery can now be squeezed into a few lines using newer libraries! It is time to change gears.

We started to look at which technology would best fit our needs. The main requirements are:
  • Make the developers productive - easy to integrate new developers
  • Have decent performance when the application grows - ProjExec is large!
  • Can integrate with the existing code, and be used incrementally within ProjExec
  • The tool chain should integrate well with what we currently use (Maven, Eclipse...)
  • Can be debugged easily, with meaningful logs
  • Have an ecosystem with ready-to-use libraries (bootstrap, mobile, ...)
We started to use AngularJS 1.x two years ago for some new modules. The initial steps were pretty easy, although it became more difficult as the complexity increased. And anyway, we know that AngularJS 1 is behind us. We cannot decently bet our future on it.

The natural next move was to use Angular 2, as it fixes most of the issues found in 1.x. I would say this is true, but the learning curve grew significantly. That includes some technology choices (e.g. Observables vs Promises), TypeScript, the tool chain required to get an application properly packaged, and the heaviness of the resulting code... I spent time playing with it for Darwino (we'll release some examples), but I do not see the ProjExec developers being quickly productive with it. This has been further confirmed by one of my friends, a GDE (Google Developer Expert, the equivalent of an IBM Champion): his one-week training class is barely sufficient to get developers comfortable. Also, getting it integrated into a venerable existing application raises some challenges.
OK, we might use it later with Ionic 2 for a dedicated mobile UI, but for now, let's see if something easier can be used on the existing web application.

The next candidate on the list is Facebook's ReactJS. Clearly, there is a buzz around this technology. It has very interesting concepts, and it creates high-performance applications. But the initial learning curve is even steeper than Angular's, particularly for Java developers. Moreover, it is pretty low-level, which provides great flexibility, but you then have to choose the proper companion libraries.
We'll keep this one on the back burner; let's see if something else better suits our needs.

Finally, we discovered that another library got a lot of traction last year: it is called Vue.js (https://vuejs.org/). After a first look at it, it feels like what I hoped Angular 2 would be! OK, I bought two books from Packt ($5 each), got the source code from GitHub and started my deep dive. It features exactly what we need:
  • Easy learning curve, particularly when you have some Angular 1 background. In any case, easier than Angular or React
  • You don't need a whole set of tools/CLI to be in place up front. Just include the single JS file and you can get started. It turns out that we'll finally use Webpack, but this was not a requirement at the beginning.
  • It fixes many of the problems/inconsistencies found in AngularJS 1.x (bind vs values, separation of concerns, single-file components...)
  • It easily integrates within existing applications, regardless of what they already use (Dojo, jQuery, ...), even if they are not designed as SPAs
  • It performs as well as ReactJS. Moreover, if you're a JSX fan, it is available as well
  • It is very lightweight and non-intrusive
  • Developer productivity feels superior to ReactJS
  • A good set of tools/libraries/docs is available https://github.com/vuejs/awesome-vue
  • See how it compares to the other libraries: https://vuejs.org/v2/guide/comparison.html
We decided to go with it. The early feedback I got from the development team is very positive. We'll see in the longer run if it keeps its promises, but so far so good.

I'd be interested to hear about anybody else's experience. Please feel free to comment.


Tuesday, January 3, 2017

Why AngularJS sounds familiar to XPages developers...

When I started to look at AngularJS a few years ago, I surprisingly found myself quickly comfortable with this technology. One of the reasons is that many of its concepts are shared with XPages. Of course, there are fundamental differences, the most obvious being that AngularJS is a pure client technology while XPages, based on JSF, is a server-side one. But still, they share a lot! If you know XPages, your experience understanding AngularJS should be similar to mine. I'm basing my experience on AngularJS 1.x, although it also applies to 2.0, which is even closer to Java programmers with the use of TypeScript and new concepts familiar to Java developers (classes, ...). But this is a different topic.

Let me dive into the similarities between the two frameworks:

1- The Document Object Model (DOM) as the page content
It is obvious that Angular works on top of a DOM, but XPages also works on a DOM. While AngularJS leverages the browser HTML DOM, XPages manages a hierarchy of JSF components. Both are hierarchical, with parent-child relationships, and represent the content of the page.

2- Contextual data
Angular calls this a scope, which is data assigned to a DOM element and its children. While this concept does not exist in the JSF spec, XPages provides it through "complex properties" like data sources, or even the "context" object that lets developers add data to a component and its children.

3- Scripts and formulas
Obviously, Angular uses JavaScript as its scripting language, similarly to XPages, plus a JavaScript-based formula language for expression evaluation. XPages also uses the JSF EL for basic expressions.

4- Data binding
Angular does data binding either through attributes on the DOM elements, or expressions within curly braces {{...}}. Well, XPages does kind of the same thing with value binding expressions, ${...} or #{...}.

5- Directives or components
Angular does a great job of extending your browser HTML with directives. In short, you can create your own tags (or even attributes one can use on existing tags). XPages does the same with custom controls, adding new tags to the XML describing the page.

6- Managed beans
Angular uses services, while JSF has managed beans, to externalize the business logic outside of the page. In both cases, you access the object by its name, using dependency injection for Angular, or faces-config.xml for XPages.

Of course, there are real differences. One of them is the processing model: XPages runs well-known lifecycle phases, while Angular uses an event-based model, with a queue of events. But still, as an XPages developer, you'll feel at home pretty quickly when writing an application.

How about Angular 2.0? Well, we are currently experimenting with it, and I'll share my findings in a little while.