Dec 03 2012

At the heart of every distributed system there is data distribution; writing small applications is fun and all, but it’s in the communication between these apps that lies most of the challenges with distributed architectures.

So here are three tricks that have been helping me get data distribution right:

Database as a facilitator

I’ve expressed many times before that databases are not APIs, and should not be used to share state between applications.

This doesn’t mean, however, that they can’t help facilitate the communication.

The pattern that I’ve been using and enjoying quite a lot lately is to use a database-based queue to run jobs that propagate data to other systems. This allows us to enjoy the benefits of ACID (pun intended) while still relying on well defined, versioned APIs to expose resources.

There are two important requirements for this to work, though:

  • Use the same database for business data and queue. Probably not the best idea for partitioned databases.
  • Share the connection between your queuing library and ORM, otherwise the ACID properties are lost. For Queue Classic you’ll want to look at this thread.

Enqueue ids, not data

Enqueue jobs to synchronize a particular record, not to send specific data.

More specifically, your job signature should look like notify_email_changed(user_id), not notify_email_changed(new_email). The latter fails to recognize the chaotic nature of a job queue: unless there’s a single process working on it, there’s just no guarantee around the order that jobs run - and things get worse when we consider failures.

Which brings us to the last trick:

Embrace idempotency

Between network failures, broken sockets and timeouts your background jobs are going to fail. To keep your distributed system consistent you’ll want to make sure that failed jobs are retried and, just as important - that the receiving end can be called again with the same data.

For this reason it’s crucial that all APIs involved in a distributed architecture are idempotent; support PUT and render 201s accordingly.

In conclusion

Assuming that these patterns work for you, they should not only allow you to get data distribution right, but also to avoid the complicated mechanisms that are usually applied to this kind of problem (distributed transactions comes to mind).

And if you’re into this kind of problems Heroku is hiring :}

blog comments powered by Disqus