Dev Diary – Scaling search for an Azure WebSite

Adding real search capabilities to a custom web application is never easy. Search is a complex and deeply specialized area of development, and the tools available to us regular developers are monstrously complex.

With TicketDesk 2, I used the popular Lucene.net library to provide search. It is a port of Apache Lucene (Java), the core technology behind many popular search services, appliances, and search libraries on the market.

Once you’ve tackled the initial learning curve, Lucene.net isn’t all that difficult to leverage in a simple system like TicketDesk. It is freakishly fast, super flexible, and a powerful search solution. Not quite Google good, but close enough for most applications.

The problem with Lucene is that the design revolves around indexes stored on a traditional file system. There are 3rd party extensions that let you store the indexes in a database or in the cloud, but internally these all mimic the behaviors of a file system; that’s just how Lucene works.

You can have many components querying an index at the same time, but only one can write to an index at a time. Normally this single-writer limitation isn’t a huge problem. You code your application so it creates just one writer instance, then share it with any components that want to make an index update. As long as you keep things synchronous, it tends to work fine.
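The single-writer pattern described above can be sketched roughly as follows. This is a toy illustration in Python, not Lucene.net code: `InMemoryIndex` and `IndexWriter` are hypothetical stand-ins that only mimic the exclusive write lock, and every name here is an assumption made for the example.

```python
import threading


class InMemoryIndex:
    """Hypothetical stand-in for a search index (not Lucene's API)."""

    def __init__(self):
        self.documents = {}
        self._write_lock = threading.Lock()  # plays the role of the index write lock

    def open_writer(self):
        # Only one writer may hold the write lock; a second request is refused.
        if not self._write_lock.acquire(blocking=False):
            raise RuntimeError("index is locked by another writer")
        return IndexWriter(self)

    def release_write_lock(self):
        self._write_lock.release()


class IndexWriter:
    """The one writer instance that every component shares."""

    def __init__(self, index):
        self._index = index
        self._mutex = threading.Lock()  # serializes updates from sharing components

    def update(self, doc_id, text):
        with self._mutex:
            self._index.documents[doc_id] = text

    def close(self):
        self._index.release_write_lock()


index = InMemoryIndex()
shared_writer = index.open_writer()  # created once, then passed around

# Several components push updates through the same writer.
threads = [
    threading.Thread(target=shared_writer.update, args=(i, f"ticket {i}"))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A second writer cannot be opened while the first still holds the lock.
try:
    index.open_writer()
    second_writer_refused = False
except RuntimeError:
    second_writer_refused = True
```

Within one process this is easy to get right, because the lock and the writer live in the same memory space.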

And here lies the problem. TicketDesk 2.5 and 3.0 are designed to run at scale, and will ship ready for deployment to the cloud as an Azure WebSite. In this scenario, there can be several instances of the application running at the same time, each needing to write to a single, shared Lucene index.

I spent a full week trying to find a way around the single-writer problem. WebSites in Azure shouldn’t write to their filesystems. Anything written locally is volatile, and vanishes whenever Azure automatically moves the site to a different host. So, I started with the AzureDirectory library for Lucene, which lets you store the search index in Azure blob storage. This works well, and gives Lucene a stable place to store shared indexes in the cloud.

The second problem was keeping multiple web site instances from writing to the index at the same time. Even though the index is in blob storage, Lucene still demands an exclusive write lock. Each website can see when the index is locked by another writer, but there isn’t a way to know if the lock is legitimate, or an orphaned lock left behind when some other instance went down unexpectedly.

The only easy solution is to make sure there is a separate application to handle all index writes, and that there is only a single instance of that application running. You can scale the websites or other clients, just don’t scale the index writer application.

WebJobs were designed specifically for handling background work on behalf of Azure WebSites, so I started there. Each website would queue index updates to an Azure Storage Queue, then the WebJob could come along and process the queue in the background. But WebJobs scale with the websites, so if you have multiple website instances, you also have multiple WebJob instances. Hopefully, in the future Microsoft will give us the ability to scale WebJobs independently of the websites they service.
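A rough sketch of that queue hand-off, again in Python rather than against the real Azure SDK: `queue.Queue` stands in for the Azure Storage Queue, a plain dictionary stands in for the index, and the function names are made up for the illustration. The point it tries to show is that any number of producers is fine as long as exactly one consumer writes to the index.

```python
import queue
import threading

update_queue = queue.Queue()  # stand-in for an Azure Storage Queue
index = {}                    # stand-in for the shared search index
STOP = object()               # sentinel telling the writer to shut down


def web_instance(instance_id, ticket_ids):
    """A website instance: it never touches the index, it only queues updates."""
    for ticket_id in ticket_ids:
        update_queue.put((ticket_id, f"updated by instance {instance_id}"))


def index_writer():
    """The single background consumer; the only code that writes to the index."""
    while True:
        message = update_queue.get()
        if message is STOP:
            break
        ticket_id, text = message
        index[ticket_id] = text


writer_thread = threading.Thread(target=index_writer)
writer_thread.start()

# Three scaled-out website instances, each queuing updates for three tickets.
instances = [
    threading.Thread(target=web_instance, args=(n, range(n * 10, n * 10 + 3)))
    for n in range(3)
]
for t in instances:
    t.start()
for t in instances:
    t.join()

update_queue.put(STOP)
writer_thread.join()
```

If a second `index_writer` were started, as happens when the background job scales along with the site, the two consumers would be back to competing for the write lock.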

So the only remaining solution would be an old-fashioned worker role. They scale independently or, in this case, can be instructed not to scale. This works well, but I just don’t like the solution. Effectively, the worker role ends up being a half-assed, custom search server. It costs a decent amount of money to run a separate worker role instance, plus it complicates the deployment and management of the entire application.

Failing to find a way to continue in Azure with custom Lucene indexes without a centralized search server, I figured I’d just design TicketDesk to take advantage of the existing Azure-native solution: Azure Search. It is easy to code against (relatively), and there is a free tier that should be suitable for most smaller shops. For larger shops, the cost of a paid Azure Search tier is still reasonable when compared to the cost of a dedicated worker role.

So, out of the box, TicketDesk 2.5 will include at least two search providers: a Lucene provider for on-premise single instance setups, and native Azure Search for cloud deployments. I will eventually add an alternative for on-premise webfarms and non-Azure cloud VMs. In the meantime though, you could still scale in your own data center by using Azure Search remotely, or stick with Lucene and manually disable the search writer on all but one instance of the site in the webfarm.
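To give a feel for what swappable search providers might look like, here is a small Python sketch. It is not TicketDesk’s actual design: the `SearchProvider` interface, both provider classes, the `search_provider` setting, and `create_provider` are all hypothetical names, and the "hosted" provider is just an in-memory placeholder where calls to a real service would go.

```python
from __future__ import annotations

from typing import Protocol


class SearchProvider(Protocol):
    """Hypothetical interface that the application codes against."""

    def add_to_index(self, ticket_id: int, text: str) -> None: ...

    def search(self, term: str) -> list[int]: ...


class LocalIndexProvider:
    """Placeholder for a local, single-instance index (the Lucene-style option)."""

    def __init__(self) -> None:
        self._docs: dict[int, str] = {}

    def add_to_index(self, ticket_id: int, text: str) -> None:
        self._docs[ticket_id] = text.lower()

    def search(self, term: str) -> list[int]:
        needle = term.lower()
        return sorted(i for i, text in self._docs.items() if needle in text)


class HostedSearchProvider(LocalIndexProvider):
    """Placeholder for a hosted search service; a real one would make remote calls."""


def create_provider(settings: dict[str, str]) -> SearchProvider:
    # "search_provider" is an assumed setting name, used only in this sketch.
    if settings.get("search_provider") == "azure":
        return HostedSearchProvider()
    return LocalIndexProvider()


provider = create_provider({"search_provider": "azure"})
provider.add_to_index(1, "Printer is on fire")
provider.add_to_index(2, "Cannot log in")
matches = provider.search("printer")
```

The rest of the application only sees `SearchProvider`, so switching between a local index and a hosted service becomes a configuration choice.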

One additional note of interest: Azure Search is still in preview, and it doesn’t have an official client library for .NET yet. There are two 3rd party client libraries though: Reddog.Search and Azure Search Client Library. Both are free as NuGet packages, but only Reddog.Search has a public open source repository. Also, Reddog has a management portal you can run locally, or install as an Azure WebSite extension.

 
