Implementing faceted search with MongoDB

by DotNetNerd 8. December 2011 08:29

After my last post on the faceted search I was asked to elaborate on how it was implemented with MongoDB. So that is just what I will do with this post – giving me the chance to comment a bit on the good and the bad experiences. Even though it was a good overall experience, there will always be things to wish for – the day I say otherwise is the day I should stop being a developer.

Preconditions

As I wrote in the last post, the implementation uses KnockoutJS to orchestrate updates to the search through an MVVM-based approach. On the server side we needed a service to provide the searching capabilities, preferably returning data as JSON. The project already had an Ajax-enabled WCF service, so we chose to build on that. Had it been a greenfield project I might have looked to WCF Web API or Nancy, but that's a topic for another day. Originally a colleague pointed me towards Lucene, but I had previously looked at MongoDB, and the more we talked design, the more confident I felt that it was the better way to go.

Designing documents

An important part of working with a document database is of course how to structure the documents. This is not quite as easy as it sounds, because most of us are used to thinking in terms of normalized relational models in a regular SQL database. So I chose a top-down approach, starting with implementing the UI, so I would have the best possible idea of what I would need from my service, and whether KnockoutJS itself would have any special requirements.

What became apparent was that instead of having one shape of document modelling products with fields as well as hierarchical information, splitting this data into different documents would make queries a lot simpler. Later on this proved to be a good decision, because the LINQ implementation in NoRM turned out to be very limited.
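
As a rough illustration of what a flat search document could look like, the class below simply collects the field names that appear in the queries later in this post. The property types are assumptions made for the sketch, not the actual class from the project, and any identifier property the driver may require is left out.

```csharp
using System;

// Hypothetical shape of the search document. Field names are taken from
// the queries in this post; the types are assumed for illustration.
public class OutletSearchItem
{
    public string CatalogLanguage { get; set; }
    public string PriceGroup { get; set; }
    public string ItemType { get; set; }
    public string Region { get; set; }
    public double Price { get; set; }       // assumed numeric type
    public double Saving { get; set; }      // assumed numeric type
    public bool NoDefectOnItem { get; set; }
    public bool NoDefectOnPackaging { get; set; }
    public DateTime Date { get; set; }
}
```

Because every facet is a plain top-level field, each filter becomes a simple comparison on one property rather than a walk through nested or hierarchical data.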

Querying for facets

In this day and age everything seems to gravitate towards LINQ when it comes to queries, so it felt natural to use the LINQ API that NoRM provides. With the documents in place we could get information about facets pretty easily.

public IEnumerable<CheckBoxItem> FindRegions(string pricegroup)
{
    using (var db = Mongo.Create(_connectionString))
    {
        return
            db.GetCollection<OutletSearchItem>()
                .AsQueryable()
                .Where(i => i.PriceGroup == pricegroup)
                .Select(i => i.Region)
                // Materialize the regions here, so the steps below run in memory
                .ToList()
                .Distinct()
                // One unchecked checkbox per distinct region
                .Select(r => new CheckBoxItem {Name = r, Text = r, Checked = false});
    }
}

As this sample shows, even basic stuff like Distinct did not work as expected with NoRM, which is why ToList() is called first. In this case we were OK with applying the distinct filtering to in-memory objects, and with the amount of data we have, performance was still good.
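
The in-memory part of that query can be isolated into a small helper that has no dependency on the database. This is only a sketch: the CheckBoxItem class is re-declared here with the three properties used above, and the FacetHelper name, the empty-value filter and the sorting are additions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape, based on the properties used in the sample above.
public class CheckBoxItem
{
    public string Name { get; set; }
    public string Text { get; set; }
    public bool Checked { get; set; }
}

// Hypothetical helper, not part of the original project.
public static class FacetHelper
{
    // Takes region values that are already loaded into memory and
    // returns one unchecked checkbox per distinct, non-empty region.
    public static List<CheckBoxItem> ToRegionFacets(IEnumerable<string> regions)
    {
        return regions
            .Where(r => !string.IsNullOrEmpty(r))
            .Distinct()
            .OrderBy(r => r, StringComparer.Ordinal)
            .Select(r => new CheckBoxItem { Name = r, Text = r, Checked = false })
            .ToList();
    }
}
```

Keeping this step separate also makes it easy to test without a running MongoDB instance.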

Searching

At first I tried writing the search queries using LINQ, but it turned out not to be an option; the LINQ implementation simply is not good enough. So I took a look at the other search APIs and found that using an Expando object to define searches looked promising. In this case Expando is a type specific to the API, and has nothing to do with .NET's own ExpandoObject.

var expando = new Expando();
expando["CatalogLanguage"] = Q.Equals(query.CatalogLanguage);
if (query.ItemTypes.Length > 0) expando["ItemType"] = Q.In(query.ItemTypes);
if (query.Regions.Count > 0) expando["Region"] = Q.In(query.Regions.ToArray());
expando["Price"] = Q.GreaterOrEqual(query.PriceMin).And(Q.LessOrEqual(query.PriceMax));
if (query.ShowOnly.NoDefectOnItem) expando["NoDefectOnItem"] = query.ShowOnly.NoDefectOnItem;
if (query.ShowOnly.NoDefectOnPackaging) expando["NoDefectOnPackaging"] = query.ShowOnly.NoDefectOnPackaging;
if (query.News != "All") expando["Date"] = Q.GreaterThan(DateTime.Parse(query.News));

As you can see, this API lends itself well to building queries based on the user's selections for the different facets. The API is a little funky with its use of the static Q class, but it is still very readable and it performs very well.
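
To make the pattern itself explicit, here is a driver-independent sketch of the same idea: start with the mandatory criteria and only add a facet when the user has selected something. It models the filter as nested dictionaries using MongoDB's $in, $gte and $lte query operators. The FilterSketch class and its parameters are hypothetical, and I am not claiming this is exactly what NoRM sends over the wire.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that mimics the shape of a MongoDB query document.
public static class FilterSketch
{
    public static Dictionary<string, object> Build(
        string catalogLanguage, string[] itemTypes, double priceMin, double priceMax)
    {
        var filter = new Dictionary<string, object>();

        // Always applied: exact match on the catalog language.
        filter["CatalogLanguage"] = catalogLanguage;

        // Only applied when the user has picked at least one item type.
        if (itemTypes != null && itemTypes.Length > 0)
            filter["ItemType"] = new Dictionary<string, object> { { "$in", itemTypes } };

        // Always applied: inclusive price range.
        filter["Price"] = new Dictionary<string, object>
        {
            { "$gte", priceMin },
            { "$lte", priceMax }
        };

        return filter;
    }
}
```

An unselected facet never shows up in the filter at all, which is what keeps the query small when the user has only narrowed the search a little.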

Ordering

Lastly there was one more interesting aspect that needed to be handled: ordering of the results, which is chosen by the user. This can be done by passing an anonymous object to the Find method, which ultimately executes the query.

if (query.Sorting.Value == "Saving") order = new { Saving = OrderBy.Descending };
else if (query.Sorting.Value == "Price") order = new { Price = OrderBy.Ascending };
else if (query.Sorting.Value == "ItemType") order = new { ItemType = OrderBy.Ascending };
else if (query.Sorting.Value == "Region") order = new { Region = OrderBy.Ascending };

var collection = db.GetCollection<OutletSearchItem>();
OutletSearchItem[] result = collection.Find(expando, order, itemsPerPage, itemsToSkip).ToArray();

Looking at this, one can only wonder why the ordering cannot be defined using a string. It seems that the API developer could learn something from the KISS principle.
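
One way to keep the chain of if statements from growing is to hold the mapping from the user's choice to a field and direction in a lookup table. The sketch below is independent of NoRM: SortSpec and SortLookup are hypothetical names, and turning a SortSpec into the anonymous object that Find expects would still have to be done separately.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of a sort order: which field, which direction.
public class SortSpec
{
    public string Field { get; set; }
    public bool Descending { get; set; }
}

public static class SortLookup
{
    // Same four choices as in the if/else chain above.
    private static readonly Dictionary<string, SortSpec> Specs =
        new Dictionary<string, SortSpec>
        {
            { "Saving", new SortSpec { Field = "Saving", Descending = true } },
            { "Price", new SortSpec { Field = "Price", Descending = false } },
            { "ItemType", new SortSpec { Field = "ItemType", Descending = false } },
            { "Region", new SortSpec { Field = "Region", Descending = false } }
        };

    // Returns null when the value is missing or not a known sort key.
    public static SortSpec Resolve(string sortingValue)
    {
        SortSpec spec;
        if (sortingValue != null && Specs.TryGetValue(sortingValue, out spec))
            return spec;
        return null;
    }
}
```

Adding a new sort option then means adding one line to the table rather than another branch.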

Looking back

As I wrote in the last post I am proud of the solution we came up with, because it performs really well and I think the user experience is great. MongoDB is a really interesting product, and it was a good fit for this solution. The downsides have been the implementation of LINQ and the scattered documentation, but in spite of that it did not take long to implement.

To be fair only Microsoft have so far managed to implement LINQ as well as one could reasonably hope for. At least that is my experience, and what I think is the flipside of the shiny coin that LINQ is. This could indicate that the real problem might be how difficult it is to implement LINQ. I will stay out of that discussion, but at the end of the day I think LINQ provides a lot of value, and that the important thing is to not try and use that hammer for every nail that you come across.

Who am I?

My name is Christian Holm Diget, and I work as an independent consultant, in Denmark, where I write code, give advice on architecture and help with training. On the side I get to do a bit of speaking and help with miscellaneous community events.

Some of my primary focus areas are code quality, programming languages and using new technologies to provide value.