Friday, February 29, 2008

Automatic Fetch Groups

I saw this article about AutoFetch on TheServerSide.

It is nice to see some people starting to understand the limitations of well-known ORM solutions in terms of fetching. Xcalia has had a nice solution for this for years, through the notion of "business use-cases".

Through this notion, you can associate runtime behaviours (including fetch groups) with sections of your code. Each section is then executed according to the current context definition.

Here is some pseudo-code:
UseCase.start("MinimalEmployee");
Collection employees = getEmployees(pm, lastName);
UseCase.end("MinimalEmployee");


UseCase.start("CompleteEmployee");
employees = getEmployees(pm, lastName);
UseCase.end("CompleteEmployee");

In the first case, the getEmployees() method retrieves only the employees' names, while in the second case it retrieves all employee information, including related objects and collections.

Use cases can be defined statically in properties files, but they can also be modified at runtime through the JMX interface. This mechanism is not limited to queries and can also be applied during transparent navigation between objects.
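To make the idea a bit more concrete, here is a minimal Java sketch of how an active use case could select a fetch group. This is not the Xcalia API: the UseCase class below, its currentFetchGroup() method and the field names are assumptions made up for this illustration, and the definitions are hard-coded where a real product would read them from properties files.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical demo; none of these classes belong to a real product.
class UseCaseDemo {
    public static void main(String[] args) {
        UseCase.start("MinimalEmployee");
        System.out.println("Minimal:  " + UseCase.currentFetchGroup());
        UseCase.end("MinimalEmployee");

        UseCase.start("CompleteEmployee");
        System.out.println("Complete: " + UseCase.currentFetchGroup());
        UseCase.end("CompleteEmployee");
    }
}

final class UseCase {
    // Assumed static definitions: use case name -> fields to fetch.
    private static final Map<String, List<String>> FETCH_GROUPS = Map.of(
            "MinimalEmployee", List.of("name"),
            "CompleteEmployee", List.of("name", "address", "department", "projects"));

    // One stack of active use cases per thread, so sections can be nested.
    private static final ThreadLocal<Deque<String>> ACTIVE =
            ThreadLocal.withInitial(ArrayDeque::new);

    private UseCase() {}

    static void start(String name) {
        if (!FETCH_GROUPS.containsKey(name)) {
            throw new IllegalArgumentException("Unknown use case: " + name);
        }
        ACTIVE.get().push(name);
    }

    static void end(String name) {
        Deque<String> stack = ACTIVE.get();
        if (stack.isEmpty() || !stack.peek().equals(name)) {
            throw new IllegalStateException("Use case not active: " + name);
        }
        stack.pop();
    }

    // A data access layer would call this to decide which fields to load.
    static List<String> currentFetchGroup() {
        String name = ACTIVE.get().peek();
        return name == null ? List.of() : FETCH_GROUPS.get(name);
    }
}
```

In this sketch a method like getEmployees() would simply ask UseCase.currentFetchGroup() which fields to load, so the same query code behaves differently depending on the surrounding section.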

It seems the AutoFetch author claims to be able to automate the definition of dynamic fetch plans. The details should be available in this paper.

Thursday, February 28, 2008

JPA for SimpleDB

I'm glad to see people starting to understand the benefits of extending mapping technologies to non-relational data sources. Seen on TSS: a Java Persistence API (JPA) implementation for Amazon SimpleDB.

Xcalia has been a pioneer in this area with Castor (an ORM-OXM open source product). Then later, TopLink announced support for XML and JCA data sources, while JPOX now supports db4o.

This industry is going in the right direction. That said, the real challenge comes when you want to address service-oriented data sources. In this area Xcalia is still ahead of the competition.

Friday, February 22, 2008

ODBMS Conference

March 13th-14th in Berlin: ICOODB 2008.

I would love to see how ODBMS are changing, but I'll be in the US that week.

The end of RDBMS?

In this article in French, Olivier Rafal (Le Monde Informatique) mentions a study from M. Stonebraker (father of Ingres and Postgres) in which he introduces a new kind of database, based on column storage rather than row storage. This technology is already used in some dedicated OLAP database engines. According to the study, this new technology could be more efficient than the relational model. Ultimately, M. Stonebraker wants to say that there is a need for different DBMS technologies, because not all projects fit an RDBMS. Here is a link to his blog: the Database Column.
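For readers who have never looked at the difference, here is a rough Java sketch contrasting the two layouts. It only illustrates the general idea of row versus column storage and is not taken from the study; the EmployeeRow, RowStore and ColumnStore names and the sample data are assumptions for this example.

```java
import java.util.List;

// Hypothetical demo of row storage versus column storage.
class StorageLayoutDemo {
    public static void main(String[] args) {
        List<EmployeeRow> rows = List.of(
                new EmployeeRow("Ann", 31, 5000L),
                new EmployeeRow("Bob", 45, 7000L));
        ColumnStore columns = new ColumnStore(rows);
        System.out.println(RowStore.totalSalary(rows));
        System.out.println(columns.totalSalary());
    }
}

// Row storage: all the fields of one record are kept together.
final class EmployeeRow {
    final String name;
    final int age;
    final long salary;

    EmployeeRow(String name, int age, long salary) {
        this.name = name;
        this.age = age;
        this.salary = salary;
    }
}

final class RowStore {
    private RowStore() {}

    // Summing one field still walks over every complete record.
    static long totalSalary(List<EmployeeRow> rows) {
        long total = 0;
        for (EmployeeRow row : rows) {
            total += row.salary;
        }
        return total;
    }
}

// Column storage: all the values of one field are kept together.
final class ColumnStore {
    final String[] names;
    final int[] ages;
    final long[] salaries;

    ColumnStore(List<EmployeeRow> rows) {
        names = new String[rows.size()];
        ages = new int[rows.size()];
        salaries = new long[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            names[i] = rows.get(i).name;
            ages[i] = rows.get(i).age;
            salaries[i] = rows.get(i).salary;
        }
    }

    // An aggregate only touches the single array it needs.
    long totalSalary() {
        long total = 0;
        for (long salary : salaries) {
            total += salary;
        }
        return total;
    }
}
```

Both methods return the same total; the difference is how much unrelated data has to be read to get there, which is why column layouts are popular for OLAP-style aggregates.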

A few comments about this:

  • The relational model is not known for its efficiency. It is flexible, it is convenient, it is proven, but it is not efficient.

  • It seems the database market is more open these days, with new ideas coming from players such as db4o, Amazon and Google.

    • NB: it is not certain that they will be able to move the elephants (DB2, SQL Server, Oracle, MySQL).



  • The real problem today is not the database itself but the data access.

    • SOA and Web 2.0 are generating new requirements.

    • Standards like JDO, JPA and SDO have established the notion of abstracting the data source.



  • The debate about DBMS and data access is always religious and hot. It is all about passion!


That said, it would not be a big deal, from a purely technical perspective, to improve the efficiency of relational engines.

First of all, we would need to add an automatic universal identifier to each row in tables. This UID already exists internally, but it is not used. For instance, each row in an Oracle table has a ROWID, but it contains additional information that changes over time. What is required is a unique, stable ID for each row in any table in a set of related database instances. This UID must be a logical ID, not a physical one. This is different from auto-incremented columns, which deliver IDs that are unique only within a single table. This new UID provides several benefits:

  • Automatic, transparent PK: no need to manually add a PK definition to each table.

  • No way for DBAs or database designers to create PKs based on business values (which is a serious design mistake).

  • Good support for relationships between entities.


NB: the notion of LOID (logical OID) within the Versant ODBMS could serve as a good example of this UID.

Second, we need to enrich the notion of relationship between entities. In current relational models, this is done through foreign keys, which is not efficient. This could rely on the new UID. Instead of defining a column DEPT_ID of type String, with an associated FK on the DEPT table PK, why not directly create a DEPT_ID column of type DEPARTMENT? Not only does it better represent the real world, it also simplifies alignment with object-oriented programming languages and makes retrieving data much more efficient (as we can navigate from the DEPT_ID to the DEPT without any SQL query). We could also enrich this definition in order to distinguish between 1-1, 1-N and N-1 relationships, which is poorly managed today in most RDBMS engines.

These new features could perfectly well be added to current engines without impacting existing PKs and FKs, so that current data models can coexist with new ones.
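The two proposals can be sketched in a few lines of Java. This is only an in-memory illustration of the idea, not how Versant or any relational engine implements it; the Database, Row, Department and Employee classes are assumptions invented for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo of engine-assigned logical UIDs and typed relationships.
class LogicalIdDemo {
    public static void main(String[] args) {
        Database db = new Database();
        Department dept = db.insert(new Department("R&D"));
        Employee emp = db.insert(new Employee("Smith", dept));

        // Navigation through the typed column, without any query.
        System.out.println(emp.department.name);
        // Lookup by logical UID works for any table.
        System.out.println(db.find(dept.uid) == dept);
    }
}

// Every row, whatever its table, carries a UID assigned by the engine.
abstract class Row {
    long uid = -1; // never set by the application
}

final class Department extends Row {
    final String name;

    Department(String name) {
        this.name = name;
    }
}

final class Employee extends Row {
    final String lastName;
    // A "column of type DEPARTMENT" instead of a String DEPT_ID plus a FK.
    final Department department;

    Employee(String lastName, Department department) {
        this.lastName = lastName;
        this.department = department;
    }
}

final class Database {
    // A single counter shared by all tables, unlike per-table auto-increment.
    private final AtomicLong nextUid = new AtomicLong(1);
    private final Map<Long, Row> rowsByUid = new HashMap<>();

    <T extends Row> T insert(T row) {
        row.uid = nextUid.getAndIncrement();
        rowsByUid.put(row.uid, row);
        return row;
    }

    Row find(long uid) {
        return rowsByUid.get(uid);
    }
}
```

The application never chooses a key, the UID is unique across tables rather than within one, and following emp.department replaces the join on DEPT_ID.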

Thursday, February 21, 2008

Is Hibernate the best choice?

Two recent threads about this question:

To me, the question of which ORM is the best is outdated. The problem with ORM is the 'O' and the 'R'.

A modern mapping solution cannot be limited to RDBMS. It has to support XML files and streams, Web Services, mainframe transactions and other legacy or exotic data sources as well.

In the same way, it cannot be limited to POJOs. It has to support business languages and BPM tools such as workflow engines, SCA, JBI, BPEL and rules engines. These are not low-level programming languages like Java or C#, and they can only rely on Web services to manipulate data coming from heterogeneous data sources (within the enterprise or available on the Web). That's why the question of Data Services is becoming hot, and ORM is just one small subset of it.

Compared to Hibernate, JPOX is starting to expand in that direction:

  • Support for several standards (JDO and JPA),

  • Support for multiple data sources (RDBMS and db4o).


It is good news to see them going in the right direction, but they don't support SOA-related standards like SDO and DAS, and they don't support service-oriented data sources (Web services, mainframe transactions, stored procedures, packaged applications...).

Tuesday, February 19, 2008

TSS Symposium persistence sessions

Persistence: Checkpoints


With Track Host: Patrick Linskey, co-leader, EJB3 and JDO spec teams

Managing data is not considered to be "sexy," but it is essential. Our Checkpoints track looks at common ORM problems and solutions, as well as alternatives to ORM and persistence strategies. Browse all persistence sessions or select a title from below:

Maybe persistence is not "sexy", but it is "hot". What are the sexiest technologies in IT?

Detailed page...

Monday, February 18, 2008

GORM tutorial

A new tutorial about GORM (Groovy ORM), seen on IBM's developerWorks.

I'm repeating myself (see my previous posts) but I really think ORM should be available at the Groovy level, not only at the Grails level. It would also have been much more interesting to interface with JPA instead of Hibernate.

I'm a little bit doubtful about the mapping constraints stuff. Even if I like Groovy and NakedObjects, I'm not sure that cluttering the source code of business classes is something you really want on real, large projects. Some will argue that having everything in the code (annotations...) is more readable. This can work on trivial demos using a top-down approach, with simple mapping options and a single RDBMS as the data source. It won't work on a more complex project because classes will quickly become unreadable. Having everything in a single file is only convenient when everything is designed and managed by a single developer. When there are specialized roles within the development teams, one person will focus on the business logic, another on the mapping, and so on. And data access rules can be stored in and retrieved from a repository.

Monday, February 11, 2008

Database as a Service

In the coming months we'll probably see more and more acronyms ending with "aaS", which stands for "as a Service".

This article from InfoQ mentions Longjump's Database as a Service offer (DaaS, not our DAS). They just host and administer a MySQL database for you, and they offer REST or SOAP APIs to manipulate it. As the article states, the future of such offers is not clear.

What will be the next XaaS?

Tuesday, February 5, 2008

ODBMS not dead

TheServerSide mentions here that ODMG3 will now be hosted at www.odbms.org.

It seems that object databases are gaining some traction in the market, probably due to the good job done by db4o. Versant (my former employer) is also not doing too badly, at least from the financial point of view. I'll soon have some news about ObjectStore.

ODBMS have been a failure in the past. My personal view on why:

  • Some technical reasons:

    • Some ODBMS were not designed as real database engines; they were more like stores for memory pages. This design did a lot of harm to ODBMS in general, and most developers still think ODBMS cannot scale, just by design. Fortunately, some ODBMS are not designed like that and are real database engines, exactly like Oracle for instance.

    • Lack of a well-known query language like SQL. ODMG3's OQL has never really been seriously implemented by ODBMS vendors (except perhaps O2).

    • Lack of an ecosystem around ODBMS. For instance, major reporting tools never invested in accessing ODBMS.



  • Some economic reasons:

    • ODBMS started in 1989, at the exact time when RDBMS were trying to impose themselves on the database market. It was not the right time to propose a new database model, and relational vendors were very aggressive against ODBMS (I think the word FUD was invented at that time).

    • At that time, object programming was not generally used, except for some technical applications. The database market is traditionally led by management and business applications.

    • Most ODBMS vendors raised money from the market after IPOs around 1995. At that time the IT industry was between the Client-Server and Internet phases. Nothing really happened at that time. The money raised was spent on useless marketing plans, and most ODBMS vendors just missed the Internet bubble in 1997, with no money left.




So, most ODBMS tried to reinvent themselves either as caching technologies (Versant, Gemstone, etc.) or as XML databases (ObjectStore).

I think that because of Java, C# and now Ruby and Groovy, the world has changed and might be ready for a new database model. Mapping technologies are now standardized. Some companies like Xcalia already evangelize around mapping between business models and non-relational data sources. Standards for mapping and data access like JPA2, JDO2 and SDO2 tend to commoditize the storage layer. With the requirements of huge commercial sites like Amazon, eBay and others, combined with the emergence of services, people tend to better understand the limitations of the relational model.

Let's see what will happen in the coming months. ODBMS can have a role to play. But there is also a new kind of database coming from internet players like Amazon or Google.

I'm still convinced that Data Access will become more critical than the storage itself. Developers want to manipulate their business models as easily as possible. And business models tend to become more and more complex every day. Flexibility and agility are keywords. It is now widely accepted that traditional extended features of databases like data integrity, security or stored procedures should be managed through services, outside the database. This could be an opportunity for the best-designed ODBMS, those with a reliable and fast engine.
