NoSQL the Sequel, now with SQL - the revenge of DocumentDB
Sorry, but I can't help myself when a pun like this comes up. A couple of weeks ago Microsoft released previews of two new databases on Azure, called Azure Search and Azure DocumentDB. These two services make the storage story on Azure much more complete for the needs of many a software developer. At least in my experience, they close a significant gap, where I have often gone "outside Azure" for something that would seem obvious for them to offer.
Providing feedback
I have mostly been looking at DocumentDB, because it is more general purpose and because it is completely new, whereas Azure Search is built on top of Elasticsearch. Having played a bit with the preview, I am pretty excited to see how DocumentDB will progress. Certainly more than Ayende, who was quick to jump on the bashwagon - but of course he would hardly be doing his job if he didn't. As I wrote on Twitter, I am excited about the idea, although I also feel that it has quite a way to go before I would use it for production software.
As it turns out, that tweet was read by the team building DocumentDB, so shortly after I was contacted, and I have since been writing to them with my two cents on the subject. Suffice it to say that I like their openness and most of the answers I am getting about where they are going. I was especially happy to hear that the preview version is limited with regard to request sizes, and that performance is a key area of focus, so they will be looking at the feedback they get on this.
From this experience I encourage everyone to give the team feedback. It can become a very good database, but as with a lot of projects, the team needs feedback to get there.
What is it with naming?
One of the interesting new ideas in DocumentDB is that although it is a document database in the NoSQL family, it is built to be queried with SQL. In other words, it speaks the world's first NoSQL SQL dialect. In my mind this makes perfect sense, because the problem NoSQL tries to solve has never been with SQL the language - but with SQL, the relational database. So once again we have underlined how bad a term NoSQL is, and it can join the ranks of HTML5 and Single Page Applications, which are both mostly about rich JavaScript applications, and seldom actually one page. Well, I digress, but it is funny how naming keeps being the hardest part of software development.
The very basics
With SQL as the base query language, LINQ is obviously also available for querying through the API. This is the case already, even though it is early days for the API. That quickly becomes obvious, since basic things like accessing a collection require a bit of querying and filtering. An example of a basic service prepared to work with DocumentDB would for now look something like this.
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

public class CarService
{
    private readonly string _databaseName;

    public CarService(string databaseName)
    {
        _databaseName = databaseName;
    }

    // Replace the placeholders with the endpoint and key of your own account.
    private static DocumentClient GetClient()
    {
        return new DocumentClient(new Uri("<EndPointUrl>"), "<AuthorizationKey>");
    }

    // Looks up the database and the collection by id, creating either one if it is missing.
    private async Task<DocumentCollection> GetCollection(DocumentClient client, string collectionName)
    {
        var databases = client.CreateDatabaseQuery().Where(db => db.Id == _databaseName).ToArray();
        Database database = databases.Any()
            ? databases.First()
            : await client.CreateDatabaseAsync(new Database { Id = _databaseName });

        var collections = client.CreateDocumentCollectionQuery(database.SelfLink)
            .Where(col => col.Id == collectionName).ToArray();
        return collections.Any()
            ? collections.First()
            : await client.CreateDocumentCollectionAsync(database.SelfLink, new DocumentCollection { Id = collectionName });
    }
}
Not too bad, but something I expect to be hidden away in the API itself later on. The basic cases of querying and creating documents from there are pretty simple, and could be something as familiar as the following.
// Returns every Car document in the collection.
public async Task<IEnumerable<Car>> Find(string collectionName)
{
    using (var client = GetClient())
    {
        var documentCollection = await GetCollection(client, collectionName);
        return client.CreateDocumentQuery<Car>(documentCollection.SelfLink).ToArray();
    }
}

// Stores the car as a new document and copies the resulting id back onto it.
public async Task<Car> Save(string collectionName, Car car)
{
    using (var client = GetClient())
    {
        var documentCollection = await GetCollection(client, collectionName);
        var result = await client.CreateDocumentAsync(documentCollection.SelfLink, car);
        car.Id = result.Resource.Id;
        return car;
    }
}
As this shows, the C# API handles null Ids OK. An oddity is that the same is not the case for the JavaScript API on the server. The JavaScript API is used for implementing stored procedures and triggers, so most likely you will end up using both APIs for any real work. On the server, however, you will have to provide a unique id yourself. I was told that this will be improved and that the APIs will be aligned.
At this point we are free to implement more methods and use LINQ as we know and love it. The most commonly used parts work as they should, but there are limitations: pretty much anything like joins across collections is not possible. This is by design, to allow running on multiple nodes and for scalability reasons. That being said, it is one of the parts I hope will get better, even though it doesn't sound like it is on the roadmap.
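To make the LINQ and SQL options a bit more concrete, here is a minimal sketch of the same filter written both ways against a single collection. It is only an illustration under a few assumptions: the Make and Year properties on Car, the CarQueries class and its method names are hypothetical, the id mapping on Car is my guess at how the type would be declared, and the code assumes the preview DocumentDB client library (with its Json.NET dependency) is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
using Newtonsoft.Json;

// Hypothetical document type; only the Id property appears in the examples above.
public class Car
{
    // Assumption: map the property to the lowercase "id" field of the stored document.
    [JsonProperty("id")]
    public string Id { get; set; }
    public string Make { get; set; }
    public int Year { get; set; }
}

// Hypothetical helper class, not part of the DocumentDB API.
public static class CarQueries
{
    // LINQ version: the predicate is translated to a query that runs in the database.
    public static IEnumerable<Car> FindByMakeLinq(DocumentClient client, string collectionLink, string make)
    {
        return client.CreateDocumentQuery<Car>(collectionLink)
            .Where(car => car.Make == make)
            .ToArray();
    }

    // SQL version: the same kind of filter, written directly in the SQL dialect.
    // Property names in the query must match the names in the stored JSON.
    public static IEnumerable<Car> FindVolvosSql(DocumentClient client, string collectionLink)
    {
        return client.CreateDocumentQuery<Car>(collectionLink,
                "SELECT * FROM cars c WHERE c.Make = 'Volvo'")
            .ToArray();
    }
}
```

Both methods take the SelfLink of a single collection, which matches the limitation described above: a query is scoped to one collection, so anything that needs data from two collections has to be combined on the client.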
The competition
In the end, when competing with MongoDB and RavenDB, I think that flexibility in querying will be key. Getting ahead of MongoDB in this area should be fairly easy, as querying is one of the more funky parts of MongoDB as soon as you are doing anything complex. With RavenDB it is not quite as easy, since it also has a LINQ API - however, that also has its limitations. So it will be interesting to see what they can do, and also what functionality will find its way to DocumentDB going forward.
When it comes to tooling, it sounds like they have a pretty good and comprehensive roadmap. This is another area where Microsoft will have an advantage, with them having the most direct access to the Visual Studio team. All in all I think there is plenty to look forward to - but I still feel like there is work to be done before I will be going to production with a DocumentDB in the belly of my solution.