Friday, February 22, 2008

The end of RDBMS?

In this article (in French), Olivier Rafal (Le Monde Informatique) mentions a study by M. Stonebraker (father of Ingres and Postgres) in which he introduces a new kind of database, based on column storage rather than row storage. This technology is already used in some dedicated OLAP database engines. According to the study, this new technology could be more efficient than the relational model. Ultimately, M. Stonebraker's point is that there is a need for different DBMS technologies: not every project fits an RDBMS. Here is a link to his blog: the Database Column.

A few comments about this:

  • The relational model is not known for its efficiency. It is flexible, it is convenient, it is proven, but it is not efficient.

  • It seems the database market is more open these days, with new ideas coming from actors like db4o, Amazon, Google, for example.

    • NB: it is not certain they will be able to move the elephants (DB2, SQL Server, Oracle, MySQL).

  • The real problem today is not the database itself but the data access.

    • SOA and Web 2.0 are generating new requirements.

    • Standards like JDO, JPA and SDO have established the notion of abstracting the data source.

  • The debate about DBMS and data access is always religious and hot. It is all about passion!

That said, it would not be a big deal, from a purely technical perspective, to improve the efficiency of relational engines.

First of all, we would need to add an automatic universal identifier (UID) to each row in every table. Such an identifier already exists internally, but it is not used. For instance, each row in an Oracle table has a ROWID, but this value carries additional information that changes over time. What is required is a unique, stable ID for each row in any table in a set of related database instances. This UID must be a logical ID, not a physical one. This is different from auto-incremented columns, which deliver unique IDs only within a single table. This new UID provides several benefits:

  • Automatic, transparent PK: no need to manually add a PK definition to each table.

  • DBAs and database designers can no longer create PKs based on business values (which is a serious design mistake).

  • Good support for relationships between entities.

NB: the notion of LOID (logical OID) within the Versant ODBMS could serve as a good example of this UID.
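To make the idea more concrete, here is a minimal sketch of such a UID in Python. It is only an illustration under assumptions: the `Row` and `Table` classes and the `new_uid` helper are hypothetical, and using a random UUID is my own choice, not how Versant's LOID or any existing engine actually works. The point is simply that the identifier is assigned automatically, carries no physical location, and is unique across all tables.

```python
import uuid
from dataclasses import dataclass, field


def new_uid() -> str:
    """Return a logical identifier that encodes no storage location (assumption: a random UUID)."""
    return uuid.uuid4().hex


@dataclass
class Row:
    values: dict
    # Assigned once at creation and never changed afterwards.
    uid: str = field(default_factory=new_uid)


class Table:
    """Hypothetical in-memory table that keys every row by its automatic UID."""

    def __init__(self, name: str):
        self.name = name
        self.rows = {}  # uid -> Row

    def insert(self, **values) -> str:
        row = Row(values)
        self.rows[row.uid] = row
        return row.uid


dept = Table("DEPT")
emp = Table("EMP")

d1 = dept.insert(name="Research")
e1 = emp.insert(name="Alice")
e2 = emp.insert(name="Bob")

# Unlike per-table auto-increment counters, the identifiers do not clash across tables.
print(len({d1, e1, e2}))
```

Nobody had to declare a PK here: the identifier comes with the row, which is the "automatic, transparent PK" benefit listed above.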

Second, we need to enrich the notion of relationship between entities. In current relational models, this is done through foreign keys, which is not efficient. Relationships could instead rely on the new UID. The idea is that instead of defining a column DEPT_ID of type String, with an associated FK on the PK of the DEPT table, why not directly create a DEPT_ID column of type DEPARTMENT? Not only does it better represent the real world, it also simplifies alignment with object-oriented programming languages and makes data retrieval much more efficient (as we can navigate from the DEPT_ID to the DEPT without any SQL query). We could also enrich this definition in order to distinguish between 1-1, 1-N and N-1 relationships, which are poorly managed today in most RDBMS engines.
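The difference between the two styles can be sketched in a few lines of Python. This is an illustration only: the `Department` and `Employee` classes, the `dept_table` dictionary and the key "D10" are hypothetical names, and plain in-memory objects stand in for what an engine would do internally.

```python
from dataclasses import dataclass


@dataclass
class Department:
    name: str


@dataclass
class Employee:
    name: str
    dept: Department  # typed reference instead of a String DEPT_ID plus an FK


# Foreign-key style: the employee stores a key, and a lookup (the "query") resolves it.
dept_table = {"D10": Department("Research")}
emp_fk = {"name": "Alice", "dept_id": "D10"}
dept_via_fk = dept_table[emp_fk["dept_id"]]

# Reference style: the column holds the department itself, so navigation
# is a plain attribute access with no lookup by key.
alice = Employee("Alice", dept_table["D10"])
dept_via_ref = alice.dept

print(dept_via_ref is dept_via_fk)
```

Both paths end up at the same department, but the second one needs no key and no join, which is also how object-oriented code would naturally express it.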

These new features could perfectly well be added to current engines without impacting existing PKs and FKs, so that current data models can coexist with new ones.
