Monday, December 15, 2008

EMC Atmos

Let's try to be more positive about new data sources for the cloud, after the harsh reader comment quoted in my previous post. Last week I had the opportunity to discover Atmos, a new Cloud Optimized Storage (COS) offering from EMC.

What I particularly liked is the fact that it is based on an object-oriented distributed file system. Microsoft wanted to build such a file system a few years ago and eventually gave up; EMC did it. I still need to dig into the details of the product, but I really think the possibilities opened by a true OOFS are almost impossible to imagine right now.

Performance of new databases

Interesting performance comparison from OakLeaf. They compare an Amazon EC2 application deployed with different storage options (SQL Server, SimpleDB and Azure Table Service).

I reproduce the results table below, in case you have no time to read the article:

A reader's comment:

A major piece of work by OakLeaf … has confirmed my suspicions about how inappropriate all the cloud name/value entity store technologies are for serious SaaS developers.
The Google AppEngine Datastore, Amazon’s SimpleDB and Windows Azure have chronic performance problems relative to conventional database throughput. Ultimately, the inherent inefficiencies of these storage options will hit hourly cloud renters in the pocket.

Semantic SOA

Seen in this post on InfoQ: OASIS has released a new version of its Reference Model for SOA.

It is widely accepted that if we want SOA to be more policy-driven, we must stop composing services together manually. This can mostly be achieved through extended metadata (I mean more than WSDL). WSDL was a nice beginning, but:
  • It is limited to Web services, and not all services are Web services: existing mainframe applications, transactions, green screens, APIs of packaged applications and stateless session EJBs are also services.
  • Relationships between entities are missing.
  • It says nothing about the behavior of services.

The whole Semantic Web is an ambitious set of projects. However, when it comes to data access in SOA, we have to deal with the same kind of issues. Accessing a data services layer is more complicated than accessing a database, because it publishes a business interface instead of a technical interface. Therefore, if we want to dynamically compose data services together, as opposed to hard-coding the combinations, we need some kind of semantic metadata about their data manipulation behavior.
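
As a very rough illustration of what "semantic metadata about data manipulation behavior" could look like, here is a minimal Java sketch. The annotation and the operations are hypothetical, not an existing standard; the point is only that the behavioral facts become machine-readable, so a platform could reason about them when composing services.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical annotation: declares what an operation does to the data,
// beyond what a WSDL signature can express.
@Retention(RetentionPolicy.RUNTIME)
@interface DataBehavior {
    String entity();      // business entity manipulated
    boolean readOnly();   // does the operation mutate state?
    String granularity(); // "fine" or "coarse"
}

// A data service whose data manipulation semantics are declared explicitly.
interface CustomerDataService {
    @DataBehavior(entity = "Customer", readOnly = true, granularity = "fine")
    String findCustomerName(String customerId);

    @DataBehavior(entity = "Customer", readOnly = false, granularity = "coarse")
    void mergeCustomerProfile(String customerId, String profileXml);
}
```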

That dynamic aspect is fundamental for modern Data Services Platforms and is quite often missing in first generation technologies.

The first step of SOA was all about designing, deploying and consuming services. The second step is all about dynamically designing, dynamically deploying and dynamically consuming services. We cannot spend all our time manually connecting thousands of services together.

The Information Perspective of SOA Design

Seen in this post on InfoQ. I like to see that most people agree on the need to reintroduce the data aspect into SOA. You should read the full article from IBM, but they basically insist on three points:
  • Define data semantics.
  • Canonical modeling.
  • Data quality.

The first two points are at the heart of any Data Services Platform.

Defining data semantics is very important and should be done with extended metadata. The richer the metadata, the further we can decouple data sources from data consumers. And this is a strong requirement if you want to be more "policy-driven" and less hard-coded in your data access strategies.

Canonical modeling is sometimes seen as a burden by database purists. However, it is required as soon as you need to federate heterogeneous data sources, which is the common case in SOA. On top of that, canonical models offer a more business-friendly view of data. The question remains about the scope of canonical modeling. Very large business models, at the enterprise level, are notoriously difficult to design and maintain. A DSP must offer a way to select the best granularity for canonical models. One model for all applications is a myth (at least today), but conversely, one model per application does not deliver on SOA's promise of reuse. On this front it is also interesting to track what vertical standardization efforts (SID, HL7, ACORD, SWIFT, etc.) could bring to the table. We now see users starting to use the Telco SID model outside of the Telco market, for instance (from a 40,000 ft perspective, a customer is a customer).
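
The value of a canonical model can be shown with a deliberately trivial Java sketch: two heterogeneous sources (a hypothetical CRM and a hypothetical billing system) are mapped to one business-friendly Customer shape, so consumers never depend on either source schema.

```java
// Canonical Customer model, independent of any physical data source.
class Customer {
    final String id;
    final String displayName;

    Customer(String id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }
}

// Each adapter maps one heterogeneous source into the canonical model;
// consumers of Customer never see the source-specific shapes.
class CrmAdapter {
    Customer toCanonical(String crmId, String firstName, String lastName) {
        return new Customer("crm:" + crmId, firstName + " " + lastName);
    }
}

class BillingAdapter {
    Customer toCanonical(long accountNumber, String fullName) {
        return new Customer("billing:" + accountNumber, fullName);
    }
}
```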

The third point is a market by itself (MDM) but should also be accessible from within a Data Services Platform. The relationships between MDM and DSP technologies are multiple: a DSP can be used as a synchronization layer for MDM products; a DSP can support reference data as a new kind of data source, capturing its specific meaning (reference data can be a link to the real data, or maintain a link with it); a DSP could use data cleansing services to improve data quality. These are just examples.

All in all, this is all going in the right direction.

Monday, December 8, 2008

Criticism of Java persistence

The recent Criticism of Java Persistence thread on TheServerSide shows us once again that data access is still a very sensitive, emotional and almost religious topic.
I have said what I had to say in the thread itself, so no need to duplicate this here.

Many people have wrong ideas about persistence in general, and many people have their own views about how to manage it (while, conversely, very few people publicly defend an opinion about how an operating system should manage memory pages, for instance). There is definitely something special about data access and persistence.

NB: I really think ODBMS vendors should stop communicating around the "the best mapping is no mapping" motto. First, it is not true, and second, it does not help them.

Friday, December 5, 2008

ODMG's not dead?

It seems OMG will host an Object Database Standard Definition Scope meeting in Santa Clara next week.

Versant acquired db4o yesterday

See the news from the Versant site and from the db4o blog.

Well, that is the enterprise ODBMS buying the embedded ODBMS. I don't know what it means for the ODBMS community. Does it mean that the db4o business model eventually didn't work as expected, despite the good image of the product and the company on the market? I also don't clearly see what it means for the FastObjects part of Versant (inherited from Poet)...

I know both products quite well, and I know both teams share some genuine common values around quality and performance. Even if I now work for a company that owns ObjectStore, I wanted to send them a sincere "good luck".

Tuesday, December 2, 2008

LINQ-like initiatives for Java

Wednesday, November 26, 2008

OJM

Object to JDBC Mapping: OJM.
When I first saw that new TLA, it struck me as a pure oxymoron.

Reading the CPO web site just confirmed this. There are so many strange and sometimes inaccurate statements on that page; the author seems unaware of what modern persistence is.
I really hate to be harsh like that, but is there really a need, in 2008, for yet another JDBC abstraction?

Different problems, different databases

The big promise of relational databases was to have a single, unique technology for all our data storage needs. The main idea was to separate data from the applications that manipulate it.
Having data models too tightly coupled with application models was indeed recognized in the 1970s as one of the main obstacles to IT flexibility (for instance, producing new reports for business users required going back through a full development cycle).

But at the same time, software design made significant progress by recommending the encapsulation of data (state) behind methods (behavior), clearly going in the opposite direction. This created a lot of stress and noise in the software industry, and eventually led to the emergence of persistence technologies.

But what does decoupling data from applications really mean? It mostly consists in removing explicit directional relationships from database schemas, so that data views can later be recombined in any way. When you think about it, this just means that relationships were poorly represented in programming languages, and that is still true of most modern languages, including Java and C#. To be honest, relationships are also poorly represented in relational models, and nasty foreign keys do not change anything about that.
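
To make the Java point concrete: a reference is unidirectional, so a two-way relationship is really two independent pointers that the programmer must keep consistent by hand. A minimal sketch with hypothetical classes:

```java
import java.util.ArrayList;
import java.util.List;

class Department {
    final List<Employee> members = new ArrayList<Employee>();
}

class Employee {
    Department department;

    // The "relationship" is just two independent pointers; keeping them
    // consistent is entirely the programmer's responsibility.
    void moveTo(Department newDepartment) {
        if (department != null) {
            department.members.remove(this);
        }
        department = newDepartment;
        newDepartment.members.add(this);
    }
}
```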

The fact is that the next really big revolution in the IT world could well be the first comprehensive support for the notion of relationships.

Object database vendors failed to establish the ODBMS even though it was the most relevant choice for Java, at least from a technical point of view. There are many reasons for that:
  • Some early ODBMS implementations were very bad in terms of database administration, ad-hoc queries and overall performance. They were not really databases but rather storage mechanisms for in-memory object pages.
  • An ecosystem (reporting tools, etc.) never really formed around ODBMSs.
  • ODBMSs appeared in the late 1980s, exactly when RDBMSs were about to gain momentum on the market; it was not the right time to introduce a new database technology.
  • The major ODBMS vendors then raised money through IPOs around 1995, in a very quiet period with no opportunity for expansion and no real need for cash, so most of that money was wasted.
  • In 1998, the ODBMS vendors missed the Internet wave, mostly because of the XML mania of that time.
  • They then surrendered and tried to reposition themselves as caches (Versant, GemStone) or XML storage (ObjectStore).

Fortunately, the XML database market never really emerged, despite the huge XML hype, probably because everybody understands that XML is a good exchange format but a very bad (too verbose) storage format. The big problem with XML remains that it tends to impose hierarchical models, which are, to some extent, a regression for our industry.

You can easily put an object or XML layer on top of any kind of storage, including relational (see IBM pureXML, for instance). Probably the best approach would be to have neutral and efficient storage engines, with multiple interfaces around them. That could be a kind of relational storage with the notion of relationship efficiently supported.

The Internet, SOA, WOA, Web 2.0, mashups, etc. all favor a style where business functionalities become independent and thus get their own storage (they can no longer share a common database, since they are physically distributed).

Some vendors now seem to be pushing the idea that even this low-level storage layer should have different foundations depending on the kind of problem addressed. That's why the columnar model (storage organized primarily by columns instead of rows, as in Vertica) and the key-value model are growing quickly these days.
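
The row/column distinction is easy to picture in code. In a row store each record is kept together, which suits reading whole records; in a column store each attribute is kept together, which suits scanning one attribute across millions of records. A toy Java sketch of the two layouts:

```java
// Row-oriented: one object per record; reading a full record is cheap.
class TradeRow {
    String symbol;
    double price;
    int quantity;
}

// Column-oriented: one array per attribute; scanning or aggregating a single
// attribute over millions of records touches only that array.
class TradeColumns {
    String[] symbols;
    double[] prices;
    int[] quantities;

    double averagePrice() {
        double sum = 0;
        for (double p : prices) {
            sum += p;
        }
        return sum / prices.length;
    }
}
```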

They will certainly not replace RDBMSs any time soon, as some are already claiming, but they may establish themselves in specific situations. It seems to me the era of the omnipresent relational model is starting to decline, even if it will remain with us for decades.

Data services directly impact the database world because:

  • We have more and more data sources to access, even from a simple business application.
  • The notion of transaction is changing.
  • We have to support asynchronous data access more and more frequently.
  • It becomes not only possible but also mandatory to access any kind of data source, not only relational ones.
  • Databases are progressively being commoditized, and their advanced features will move to intermediate mediation layers.
  • It then becomes possible to choose the best database technology for a given need, at any time.

New databases

Following my previous post about the future of databases, I've seen these recent posts on TheServerSide:
  • An interview on CouchDB, the Apache project for a document database written in Erlang with an HTTP/REST/JSON API, already mentioned in this blog (a minimal sketch of that API follows this list). There is an interesting point of view about BigTable at the end. A big, scalable, persistent map is certainly interesting in some very specific cases, but in the end it is not really a database.
  • Scalaris, a scalable, transactional data store for Web 2.0 services: yet another distributed, persistent key-value system. It shares many design ideas with BigTable while, like CouchDB, being written in Erlang. The big addition seems to be better support for transactions. See OnScale for more information (videos, slides...).
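
To make the "HTTP/REST/JSON API" point concrete, here is a minimal Java sketch (JDK only) of storing a document in CouchDB by PUTting JSON to a document URL. The host, database name and document content are hypothetical.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class CouchDbSketch {
    public static void main(String[] args) throws Exception {
        // PUT a JSON document; assumes a local CouchDB with a "contacts" database.
        URL url = new URL("http://localhost:5984/contacts/jdupont");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");

        String doc = "{\"name\":\"Jean Dupont\",\"city\":\"Paris\"}";
        OutputStream out = conn.getOutputStream();
        out.write(doc.getBytes("UTF-8"));
        out.close();

        System.out.println("HTTP status: " + conn.getResponseCode()); // 201 when created
    }
}
```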

Tuesday, November 25, 2008

The Future of Databases

Last week in San Jose, during the Data Services World event, I participated in several discussions (private journalist briefings, public panels) about the future of databases in cloud computing. There are people who quite seriously think today that RDBMSs will soon disappear because of the Cloud.

I have seen the same topic discussed in various places at the same time, including an interesting point of view from Martin Fowler.

New database technologies can be roughly divided into two groups:
  • New kinds of database engines, tuned by design for the Cloud, like Vertica, CouchDB, SimpleDB and similar products that I've already mentioned in this blog.
  • New deployment and access models for RDBMSs, known as Database-as-a-Service: the database is remotely hosted and administered, but you still access it through SQL over HTTP or SOAP/REST.

Having worked for an ODBMS vendor in the past, I know how difficult it is to convince CIOs, project managers, architects, developers and DBAs to move away from RDBMSs. There is a kind of religion around relational theory. My take is that RDBMSs are here to stay, as mainframes did (they never disappeared, contrary to what many "experts" predicted). New technologies never replace good old ones; they just complement them.

Anyway, there are tangible impacts of SOA and the Cloud on the database market:

  • We will access more kinds of data sources in the future: not only RDBMSs and services, but also new kinds of databases. Heterogeneity will continue to grow.
  • We will access more data sources in the future. Most applications used to rely on a single database; they will now access multiple data sources. We are switching from data access to data integration (I tend to prefer the term adaptive mediation of information). Integration has to be done at the business level, not at the SQL or XML one.
  • Many advanced features of database engines (security, fault tolerance, stored procedures...) will progressively move to an intermediate integration layer. Databases, including relational databases, will go back to being simple and efficient storage technologies.
  • Data integration will become more important than the database itself; databases will be commoditized. Each application development team will be able to select the best database technology for its needs.
  • Accessing non-database data sources will require extended metadata. The relational world is simple because SQL provides a convenient, technical API to access data at the atomic level (a cell at the intersection of a row and a column). Everything is implicit, in terms of metadata, access patterns, etc. Conversely, accessing a service-oriented data source requires explicitly describing its data model and its data manipulation semantics. Services can be either fine-grained or coarse-grained, and you need to capture that. Data access has its own contribution to make to the Semantic Web.
  • When thousands of data sources become available as data services (mainframe screens, APIs of packaged applications...), we will need tools to combine them automatically at runtime. Manual, hard-coded or even visual composition of data services is only an option when dealing with a few of them. Dynamic composition of data services (e.g., aggregation of fine-grained data services into larger, coarse-grained ones as required by ever-changing business functionalities) is imposed by truly agile IT. Otherwise "agile" will turn into "fragile"!
  • Ad-hoc data mashups will require the availability of the right data services at the right time. This can only be achieved by platforms able to dynamically create and publish new data services as they become required.
  • Access to unstructured data will grow. At the same time, unstructured data is on its way to structuring itself, or at least describing itself better; see the "Linked Data", OpenCalais and "Web of Data" efforts, for instance.
  • Accessing multiple data sources with different latencies will force us to deal with reactive data integration patterns. We will have to support asynchronous data access (see the sketch after this list), and we will need tools for that, because asynchronous and parallel programming are not natural to most developers and architects.
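
A minimal sketch of asynchronous data access with plain java.util.concurrent; the two query methods are hypothetical placeholders for calls to sources with different latencies.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncAccessSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Launch both queries in parallel instead of serializing the latencies.
        Future<String> local = pool.submit(new Callable<String>() {
            public String call() { return queryLocalDatabase(); }
        });
        Future<String> remote = pool.submit(new Callable<String>() {
            public String call() { return queryRemoteDataService(); }
        });

        // Block only when both results are actually needed.
        System.out.println(local.get() + " / " + remote.get());
        pool.shutdown();
    }

    static String queryLocalDatabase()     { return "local result"; }  // placeholder
    static String queryRemoteDataService() { return "remote result"; } // placeholder
}
```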

As Martin Fowler concludes, data services platforms enable the promises of SOA by really favoring small business functionalities that have their own storage, instead of sharing data in huge centralized databases.

Monday, November 10, 2008

Business Objects Data Services

I've recently read several white papers about the new Data Services offer from Business Objects.

It seems to me that it is basically a renaming of their former ETL and Data Quality products.

It is not fundamentally surprising to see ETL vendors moving to Data Services, as EII vendors did before them.
Broadly speaking, an ETL is a tool to move data from DB1 to DB2; more exactly, it extracts data from DB1, transforms it somewhere (huge debate here) and then loads it into DB2.
Now suppose you replace the third step with "publishing data": you then have an "ETP", or even a Data Services Platform if you publish the resulting views as Web services.
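
Here is a deliberately minimal Java sketch of that extract-transform-publish reading. The interfaces are hypothetical and stand in for whatever a real product provides; the only point is that the "load" step is replaced by a "publish" step.

```java
import java.util.List;

// Extract-Transform-Publish: same first two steps as ETL, but the result
// is exposed as a service instead of being loaded into a target database.
interface Extractor   { List<String[]> extract(); }                      // read rows from DB1
interface Transformer { List<String[]> transform(List<String[]> rows); } // reshape them
interface Publisher   { void publishAsService(String name, List<String[]> view); }

class EtpPipeline {
    void run(Extractor e, Transformer t, Publisher p) {
        p.publishAsService("CustomerView", t.transform(e.extract()));
    }
}
```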

Well, that probably still targets read-only, non-real-time data integration, but at least it demonstrates that Data Services are gaining momentum on the market.

Saturday, November 8, 2008

SOA social

I found the article mentioned in the previous post on this portal -> SOA Social.
You'll find interesting resources over there.

The case for coordinated EDM and SOA

Article by Keith Worfolk in SOA World about the benefits of coordinated strategies for Enterprise Data Management and SOA.

Needless to say, a Data Services Platform should be the beating heart of these coordinated strategies.

I cannot agree more with the first best practice described by the author:
"...When thinking about services, don't forget to consider the data.
Systematically designing a service model is like designing a data model. For either, its impact should be considered long term, and the level of normalization of designed components, services, or data is considered a sign of quality and maturity.

Figure 6 shows service-data normalization from immature to mature organizations:

  • "Wild West": Non-existent or ad hoc and uncoordinated normalization
  • Ownership/Stewardship: Service designs built on data designs
  • Encapsulation: Service and data designs coordinated in development/maintenance initiatives; either may drive the other as long as they are coordinated
  • Object: One and the same service/data designs. Normalized designs are within EIA designs; service implementations take data ownership to another level where master data value is known only in service designs/implementations.
Most organizations pursuing services-data normalization have progressed to ownership/stewardship levels, yet need to reach encapsulation before realizing major benefits in efficiencies, maintenance costs, and asset business value.

The highest level of service-data normalization, object, may not make sense for some organizations, especially where master data or business services change frequently. Depending on their stability, the more possible an object level may be. However, cost/benefit analysis may make encapsulation preferred for some organizations.

Transitioning to advanced service-data normalization is a process of increasing organizational maturity toward coordinated EDM-SOA strategies..."

More on LINQ to SQL

Some additional comments about the possible end of LINQ to SQL in Julia Lerman's blog.

Friday, November 7, 2008

Data Mashups: Enabling Ad-Hoc Composite, Headless, Information Services

ZapThink just released a research paper about Data Mashups.
Extending the notion of mashups, data mashups will decouple data integration from heavy development cycles. But as ZapThink's Ron Schmelzer writes, this requires having a strong Data Services Layer in place.

I fully agree with the following statements from the research:
"...the IT organization must give Service consumers the tools and methods they need to be able to successfully compose those Services with low cost and risk..."

And this is exactly why dynamic data services are so important. Statically defined, hard-coded (or visually composed) data services may meet the requirements of statically defined service-oriented processes, but the reactive enterprise needs more flexibility. It is important to have the relevant data services available in real time when data mashups require ad-hoc data. A good Data Services Platform must support this kind of runtime generation and deployment of ad-hoc data services.

Later the author writes:
"...One of the important benefits of a Data Services layer is that it enables loose coupling between the applications using the Data Services and the underlying data source providers. Loose coupling enables data architects to modify, combine, relocate, or even remove underlying data sources from the Data Services layer without requiring changes to the interfaces that the Data Services expose. As a result, IT can retain control over the structure of data while providing relevant information to the applications that need it. Over time, this increased flexibility eases the maintenance of enterprise applications..."

In a world where most data sources will become service-oriented (even the databases themselves), it is important to be able to truly achieve decoupling between data services and data sources. In this particular case, that requires extended semantic metadata around data services, so that an advanced Data Services Platform can dynamically recompose them at runtime, as required by new data mashups.

Exalead CloudView

Exalead is repositioning its offering towards unstructured data integration in SOA and Cloud environments.
http://www.exalead.com/software/news/press-releases/2008/09-24.php
http://www.exalead.com/software/products/cloudview/

Steve Mills (IBM) on Information on Demand

It is always interesting to hear what Steve Mills (VP of IBM Software Group) has to say about data in general, and his comments on IBM's strategy regarding Information on Demand.

Here ->
http://searchdatamanagement.techtarget.com/generic/0,295582,sid91_gci1337742,00.html

It is always good to repeat that information is data brought to life: data with meaning and business value. In object-oriented terms, one would say information is the state part of an object.

Thursday, November 6, 2008

Entity Framework Futures

http://mschnlnine.vo.llnwd.net/d1/pdc08/WMV-HQ/TL20.wmv, by Tim Mallalieu at PDC 2008.

Interesting and entertaining at the same time.

Abstract
The next version of the Entity Framework adds scenarios in the areas of model driven development, domain driven development, simplicity, and integration. See a preview of production and prototype code for the next version of the Entity Framework as well as a candid discussion with members of the development team.

Wednesday, November 5, 2008

Windows Azure

Microsoft has recently announced its new services infrastructure to compete with other Cloud and SaaS offerings.

See, for instance, the millions of articles, news items and blog entries related to this product launch.

Let's see how data access will be addressed in this upcoming offer... First answers in Pablo Castro's blog: http://blogs.msdn.com/pablo/archive/2008/11/01/ado-net-data-services-in-windows-azure-pushing-scalability-to-the-next-level.aspx and http://blogs.msdn.com/pablo/archive/2008/10/28/now-you-know-it-s-windows-azure.aspx

Developing applications with Data Services

An interesting video of a Data Services session at the last Microsoft PDC.

Abstract:
TL07 Developing Applications Using Data Services
Presenter: Mike Flasko (Also see his blog).

In the near future, applications will be developed using a combination of custom application code and online building block services, including data-centric services. In this session we discuss advancements in the Microsoft development platform and online service interfaces to enable seamless interaction with data services both on-premises (e.g., ADO.NET Data Services Framework over on-premises SQL Server) and in the cloud (e.g., SQL Server Data Services). Learn how you can leverage existing know-how related to LINQ (Language Integrated Query), data access APIs, data-binding, and more when building applications using online data.

SOA Approach to integration

I recently read this post on TSS, where someone claims REST is object-oriented while SOAP is process-oriented. That's a funny way to compare the two approaches. It is not false, but SOAP can also be object-oriented if you want, and Data Services are all about that. The difference is that you can manage the level of granularity in data integration: you are not limited to encapsulating "atomic resources" (whatever that means) behind CRUD APIs.
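
A small hypothetical Java sketch of that granularity point: the same data can be exposed as fine-grained CRUD operations on an "atomic resource" or as one coarse-grained business operation, and a data services layer should let you choose between (or combine) the two.

```java
// Fine-grained, CRUD-style: one operation per resource and verb.
interface OrderCrudService {
    String readOrder(String orderId);
    void updateOrderStatus(String orderId, String status);
}

// Coarse-grained, business-level: one operation aggregating several
// resources (order, customer, shipment) behind a single contract.
interface OrderBusinessService {
    String getOrderSummary(String orderId);
}
```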

Data Services at Microsoft TechEd EMEA 2008

DataDirect will be exhibiting at Microsoft Tech·Ed EMEA 2008, taking place at Barcelona's Centre de Convencions Internacional, 10 – 14 November 2008.

In addition, Solutions Architect John de Longa will present “Frontiers in Data Access” on Tuesday, 11 November 2008 from 14:50 to 15:10 in theatre two, followed by a second presentation on Wednesday, 12 November 2008 at 15:20.

In his presentation, “Frontiers in Data Access”, John de Longa will offer technical insight and valuable advice for enterprise, system and data architects, as well as application developers and managers. He will discuss how to improve the scalability and flexibility of data access strategies.

“As more organisations implement service-oriented architectures they find themselves with a multitude of business services that need to access enterprise data – too often data access issues are overlooked until they become a problem,” explains John de Longa. “I’ll be exploring the concept of data services as an emerging approach for addressing data challenges in SOA.”
Data services enhance flexibility and simplify application development by providing a consistent mechanism for accessing, integrating and updating enterprise data, regardless of where it is stored.

Data Services World 2008 in San Jose

The second edition of Data Services World will be held in San Jose on November 20th, 2008. Once again, DataDirect will be the main sponsor of the event.
Rob Steward, our VP of Engineering, will present the New Frontier for Data Services, and I will participate in the power panel.
This is a great event for our technologies, and I will be happy to meet you there to discuss the trends in Data Services.

LINQ to Entities and LINQ to SQL

Microsoft has finally decided to focus on LINQ to Entities for the upcoming .NET 4.0.
See the ADO.NET blog and this entry.

This is a great decision, because their data access offering was more than confusing to users, with too many ways to access the same data and too much overlap between the different technologies.

The good news is that Microsoft recognizes the importance of having an intermediate Business Model to integrate data. This is a great milestone for the whole software industry!

Friday, October 10, 2008

New version of Granite Data Services

http://www.infoq.com/news/2008/10/GDS-110-release.
See also: http://www.graniteds.org and Documentation.

WSO2 Data Services

WSO2 recently introduced a new version of its Data Services. The big new improvement is the ability to aggregate data from multiple databases.
See http://www.ebizq.net/news/10368.html and http://blogs.zdnet.com/Gardner/?p=2741.
My personal opinion is that this tool is only useful for publishing a few stored procedures or queries over SOAP/REST. The limitations are significant:
  • It only supports relational DBMSs.
  • The integration model is quite limited (see their examples).
  • There is no intermediate, neutral business model in the middle.
  • No optimizations are possible.
  • An administrator has to manually create each query and then deploy the data service; this simply does not scale.
  • There is no client-side API to smartly manage the results of data service calls.
  • Curiously, they say nothing about updates and transactions.
See the discussion here: http://www.theserverside.com/news/thread.tss?thread_id=51002.
Anyway, it is good to have an open-source, entry-level product in the Data Services market.

Wednesday, September 3, 2008

Is database-as-a-service a good idea?

Seen this article by Jean-Jacques Dubray on InfoQ, related to this other article by Arnon Rotem, who explains why Database-as-a-Service (DBaaS) is a bad idea:

  • It circumvents the whole idea about "Services" - there's no business logic

  • It makes for CRUD resources/services

  • It is exposing internal database structure or data rather than a thought-out contract

  • It encourages bypassing real services and going straight to their data

  • It creates a blob service (the data source)

  • It encourages minuscule half-services (the multiple "interfaces" of said blob) that disregard few of the fallacies of distributed computing

  • It is just client-server in sheep's clothing


To me, DBaaS is just a new way to access a database, and it is far from Data Services (which are all about data integration, mapping, persistence and SOA). The most interesting benefit of a DBaaS is to offer a database that you don't have to administer. Conversely, it also raises some questions in terms of scalability and confidentiality.

The coming wave in Data Services

A good introduction to Data Services by John Goodson, VP and GM of DataDirect, from the last Data Services World event, held in June in NY.

YAODBMS: NeoDatis

Yet another ODBMS: NeoDatis.

I have no time to review its features, but it is funny to see all these new ODBMSs appearing.

Alternative to the Entity Framework?

NHibernate 2.0 has arrived. There is no support for LINQ yet, but it is on the roadmap for the next major release (2.1).

It seems this blog frequently covers the differences between NHibernate and the Entity Framework.

The need for fetch plans

As described in this article, there is a strong need for fetch plans in JPA. Some people, even within the JPA expert group, seem to think there is no need for a specific fetch plan API and that criteria APIs could cover it. I personally think that a fetch plan is not a criterion or a filter; it is related but different. Regarding this feature, data access technologies coming from the JDO world have an advantage, as fetch plans have been discussed within the JDO expert group for a long time.

Obviously, fetch plans are even more important when dealing with a disconnected data access model, as in Data Services. Some partial reconnection could be allowed when relationships are unknown during navigation, but current network technologies certainly cannot support full lazy loading over the Internet.
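
To make the distinction concrete, here is a minimal sketch using the standard JDO 2 API. The Customer and Address classes, the "detailFG" fetch group and the persistence-unit name are all hypothetical: the query filter selects which objects come back, while the fetch plan controls how much of each object graph is materialized before the result is detached.

```java
import java.util.List;
import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManager;
import javax.jdo.PersistenceManagerFactory;
import javax.jdo.Query;
import javax.jdo.annotations.FetchGroup;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;

@PersistenceCapable
class Address {
    String city;
}

// The fetch group declares, by name, which members belong to the "detail" view.
@PersistenceCapable
@FetchGroup(name = "detailFG", members = {@Persistent(name = "address")})
class Customer {
    String country;
    Address address;
}

public class FetchPlanSketch {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        PersistenceManagerFactory pmf =
                JDOHelper.getPersistenceManagerFactory("myUnit"); // hypothetical unit
        PersistenceManager pm = pmf.getPersistenceManager();
        try {
            // The filter decides WHICH objects are returned...
            Query query = pm.newQuery(Customer.class, "country == \"FR\"");

            // ...while the fetch plan decides HOW MUCH of each graph is loaded,
            // so a disconnected client never has to lazy-load over the wire.
            pm.getFetchPlan().addGroup("detailFG");

            List<Customer> customers = (List<Customer>) query.execute();
        } finally {
            pm.close();
        }
    }
}
```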

Tuesday, September 2, 2008

Versioning of objects

Versioning of persistent objects is a complex problem. People, mostly in the finance industry, have been looking for solutions for years (see this post for instance). It seems some solutions are now emerging, like Envers from Red Hat/JBoss.

It is not clear from their web site whether Envers supports JPA or only Hibernate; it seems to me it is limited to Hibernate.
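
For what it's worth, here is a minimal sketch against the org.hibernate.envers API as I understand it. The Invoice entity and the revision number are hypothetical, and note that the early JBoss-packaged releases used different package and annotation names.

```java
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import org.hibernate.envers.AuditReader;
import org.hibernate.envers.AuditReaderFactory;
import org.hibernate.envers.Audited;

// Marking an entity @Audited makes Envers record every change in audit tables.
@Entity
@Audited
class Invoice {
    @Id Long id;
    String status;
}

public class VersioningSketch {
    // Reads the state the invoice had at a given revision.
    static Invoice loadAtRevision(EntityManager em, Long invoiceId, Number revision) {
        AuditReader reader = AuditReaderFactory.get(em);
        return reader.find(Invoice.class, invoiceId, revision);
    }
}
```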

LiDO, Xcalia's old mapping technology, used to support this powerful feature in its version 2 (2003). We deprecated it when we decided to open up our data access engine to support any kind of data source, not only RDBMSs.

Now that ORM is well established and the basic problem is almost solved, it is time to add features with business added value, like versioning.

JDO Instruments v3

A new version has been released. This is an open source ODBMS, compliant with JDO.

http://www.theserverside.com/news/thread.tss?thread_id=50283.

http://www.jdoinstruments.org/

Saturday, August 23, 2008

Data Services blog is back

You haven't seen any activity on this blog since last June. I got married at the beginning of July, then was on vacation in the South of France, and recently moved to the US, to the Washington DC area.

I finally got Internet at home yesterday, and I'm ready to go on with all the exciting news and trends about Data Services and Data Access in general.

Stay tuned! Eric Samson.

Wednesday, June 25, 2008

SOA & data management: Understanding the data service layer

From Steve Karlovitz:

SOAs and data management: Understanding the data service layer

The Case for Enterprise Data Services in SOA

Seen on eBizq, this fundamental article from Oracle's Jeff Pollock: The Case for Enterprise Data Services in SOA.

Data Management for SOA

Seen on EDS blogs: Data Management for SOA.

They wrote somewhere: '... Jill Dyche asserts that "SOA Starts with Data". She advocates creating data services-creating data hubs as services that manage and provide access to master data. Starting with data services has an appeal to IT organizations that feel the need to adopt SOA ...'

This sounds like music to my ears.

I also like the conclusion: "Data management for SOA should be approached as requiring an enterprise logical data model, mechanisms for federation and sharing of data among relatively autonomous service units, and a data management plan that defines responsibilities, flows, master data stores, latency of updates, synchronization strategies and accountability for data integrity and protection. This plan must align with the organizational responsibilities of service units and their data needs, and it must ultimately support an integrated representation of the state of the enterprise-history, current state and future plans."

See also Jean-Jacques Dubray's reaction on InfoQ: Enterprise Data Management, the 3rd face of the SOA/BPM coin?

Enterprise Data Management on Wikipedia.

The Enterprise Data Management Council Web site.

Tuesday, June 24, 2008

Transactions on the Web

An interesting interview with Mark Little (JBoss).

Entity Framework v2 transparent design

They will tell you what they are thinking about, and you can even give your feedback ==> http://blogs.msdn.com/efdesign/default.aspx

Some ideas for V2:

  • Persistence Ignorance : We are looking at ways to introduce a full POCO solution for state management and interaction with the ObjectContext.

  • N-Tier Support : Today we support Data Contract serialization of entities or exposing entities via Astoria, in V2 we would like to expand to a DataSet like experience where one can remote graphs and changes to the graphs across the wire using standard WCF services.

  • Code-First : We want to enable a convention based, code only experience with EF wherein one may start with classes and opt-in to database generation and deployment. We expect that we would provide attributes and external mapping capabilities for people who wanted something beyond the convention based mapping.

  • TDD Scenarios: With the introduction of POCO classes some of the TDD scenarios get a lot easier, and we are looking at incorporating some other asks to better fill out the scenario, such as making our ObjectQuery<T> and other generated members of our context and classes virtual.

  • FK's : Today we support bi-directional relationships, and we are looking at introducing model concepts to facilitate the definition of FK like experiences in the model or in one's POCO classes.

  • Lazy Loading: Today we support explicit lazy loading (calling .Load), and we are looking at various options around LoadOptions as well as outright implicit lazy loading.

  • Query Tree Re-Writing: This allows framework developers to contextually, vertically and horizontally filter query results.

  • ...


See also Danny Simmons' blog, the ADO.NET blog and the advisory council.

RAM is the new disk!

Seen this interesting post on InfoQ, also relayed on Nati Shalom's blog (GigaSpaces).

This raises some comments:

  • No question: there is a need for in-memory databases.

    • RAM and network evolutions are changing the database space. And maybe the impact of the network evolution is even more important than that of RAM.

      • RAM disks have existed in operating systems for a long time.

      • USB keys are a kind of RAM-based disk.

      • The notion of keeping information alive once the box is stopped is important. Yes, in the end the disk could be used mostly for archiving rather than for storing.

    • Most current database technologies are cluttered with disk access APIs; this also includes db4o, HSQLDB and the like.

      • That said, contrary to what Nati said, some advanced database technologies (like Oracle and Versant, for instance) are able to bypass the OS's stream-oriented disk APIs and manage the disk space directly.

      • Having caches in database engines will not solve the problem; this reminds me of the first white papers from TimesTen, 12 years ago.

  • New technologies won't replace existing ones; they complement them.

    • 15 years ago, some were predicting the death of mainframes... they are still predominant.

    • Disk technologies can still be improved, see Cameron Purdy's comment for instance.

    • Disks have seen the most impressive progression of all computer components over the last 20 years.

  • In-memory data grids (IMDGs) won't eliminate the need for ORM (or universal mapping, when extended to non-relational data stores and non-object consumers).

    • They just put it in a different place, in an intermediate box.

This all leads us to the notion of a Data Services Platform, which includes a cache but is not limited to one. The Data Access Layer will become even more important than the database itself, which will become the storage layer.
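
As a toy illustration of that layering (nothing here is any product's API), a data services layer can put a read-through cache in front of whatever store actually holds the bytes, so the database quietly becomes the storage layer behind it:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy read-through cache: the data access layer answers from RAM when it can,
// and falls back to the storage layer (here a stub) only on a miss.
public class ReadThroughCache {
    private final ConcurrentMap<String, String> ram =
            new ConcurrentHashMap<String, String>();

    public String get(String key) {
        String value = ram.get(key);
        if (value == null) {
            value = loadFromStorageLayer(key); // the database as "storage layer"
            ram.putIfAbsent(key, value);
        }
        return value;
    }

    // Stub standing in for the actual database/disk access.
    private String loadFromStorageLayer(String key) {
        return "value-of-" + key;
    }
}
```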

Monday, June 23, 2008

Why old DB optimizers cannot deal with the cloud

First article about query optimizers: http://www.databasecolumn.com/2008/06/designing-systems-for-the-grid.html. I hope the following articles will give us much more information! This optimization problem is really interesting. We know it is an NP-complete kind of problem. And it is even more interesting when dealing with multiple data sources. That's one of the challenges of a modern Data Services Platform.
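
To give a feel for why this is hard, here is a back-of-the-envelope sketch (a toy illustration, not taken from any real optimizer): even the classical single-database subproblem of ordering N joins blows up combinatorially, and federating several data sources multiplies the choices further.

```java
import java.math.BigInteger;

// Counts left-deep join orderings for n tables: n! candidate plans.
// A cost-based optimizer must prune this space; with multiple data
// sources, each join also picks an execution site, inflating it further.
public class JoinOrderCount {
    public static void main(String[] args) {
        for (int n = 2; n <= 12; n++) {
            System.out.println(n + " tables -> " + factorial(n) + " left-deep orders");
        }
    }

    private static BigInteger factorial(int n) {
        BigInteger f = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            f = f.multiply(BigInteger.valueOf(i));
        }
        return f;
    }
}
```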

EDM tooling

Seen on the ADO.NET blog, this series of articles about the tooling for the EDM. I especially like the fact that the tools are available as wizards within Visual Studio, but also as command-line scripts and APIs. That's pretty cool, convenient and complete.

Wednesday, June 18, 2008

Perst 3.0

In the renewal of the ODBMS market, it seems embedded applications are one of the best niches. McObject announces Perst version 3.0, with support for both Java and .NET (with the LINQ query language).

http://www.infoq.com/news/2008/06/persist-v3