Eventual consistency in REST APIs

Not picking on an API in particular, but…wait, yes I am. Octopus (an awesome product) has a proposed API on GitHub, and one of the things it describes is how to deal with the fact that the backend is built on top of Raven DB, where eventual consistency its default mode for index results.

In it, the proposal includes an “IsStale” flag on the collection, as well as on the query itself, so you could do something like:

GET /api/environments?nonstale=true

Or similar. This presents a rather weird choice to the end user of the API – consistency as a choice, not on mutating operations (PUT/POST/DELETE) but on idempotent GET operations. I presume that this will use the “WaitForNonStaleResults” behavior of RavenDB – but this isn’t really something I’d expect to directly expose to clients.

Without directly exposing our persistence mechanism to our clients (i.e., what if we switched to SQL Server? Redis as a write-through cache to MondoDB? and so on), we have a number of options of dealing with eventual consistency in our REST APIs.

Option 1: Do nothing

The easiest approach for our API is to simply not care. Non-stale results should only really appear when we’re dealing with queries and collections – if you’re dealing with stale resources from GET actions directly against a resource, you’ve really got a weird interaction model.

Looking at some other APIs, like Netflix or GitHub or Trello, you don’t really see any option for influencing the consistency choice on a read.

So we could just ignore it. This is what a lot of APIs do, accept the POST/PUT/DELETE, and make sure that my resource itself is affected correctly. If I do a DELETE /users/jbogard and then a GET /users/jbogard, I expect a 404. But if I query users or search them, then I might not expect those results to be reflected in that model. I can be explicit in this by having search be a completely separate set of resources/entry point (i.e. /search?entity=users&name=jbogard), so it’s completely explicit that search/query is different than interacting with resources.

This is similar to how a library with a card catalog might work. Do I synchronize the activity of checking out a book with checking the catalog? Or might someone have grabbed the book from when I checked the card catalog? Or do I have someone hold the card while checking out a book?

What I don’t do is allow a person looking for a book to either be disappointed OR have the choice of yelling out to everyone in the library, “DOES ANYONE HAVE THIS BOOK OR IS OTHERWISE WANTING TO CHECK THIS BOOK OUT”. I fix my interaction model and just be up front about it.

Option 2: Provide feedback on consistency results

Instead of just punting on whether or not the collection/query is stale, we might provide that as feedback. We could return dates of last update, things like “results as of mm/dd/yyyy”. We often do this on printable reports so that when people print a report (and it is now preeeeeetty much as stale as it gets), we can include a timestamp of when it was printed so that anyone looking at it is informed of how fresh the data is.

Just a tidbit of information to the end user to let them know if their results are up to date or not. If the client made a mutating operation, they can compare the date of this mutation with the date on the query results to make a simple decision to query again, or just move on as planned. Again, the power is in the client not to affect how we query, but what decisions they can make.

Option 3: Return a 202 ACCEPTED status

If we really want to model an asynchronous operation in our REST API, we can look to the HTTP status codes to communicate this explicitly to our clients. We do this in a couple of places where processing is too expensive in the context of a request, so we return a 202 to let clients know that “we got your request, but it’s not getting processed at this moment.” The description from the W3C reads:

The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place. There is no facility for re-sending a status code from an asynchronous operation such as this.

The 202 response is intentionally non-committal. Its purpose is to allow a server to accept a request for some other process (perhaps a batch-oriented process that is only run once per day) without requiring that the user agent’s connection to the server persist until the process is completed. The entity returned with this response SHOULD include an indication of the request’s current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled.

We can return a pointer to an indication of the current status, or some estimate of completion. But we’re being 100% up front that your request is processed asynchronously.

This is similar to any long-running transaction. You place an order at a fast-food restaurant, and are given a correlation identifier that represents your order. You can come back and ask the status of your order at any time. But the cashier certainly doesn’t cook our order while we’re standing in line!

Option 4: Write-through cache

If these operations are expected to succeed (or we verify that the transactional write has succeeded), we can do a simple trick that a lot of high-volume websites do. If we don’t have an AP-system like Dynamo (choosing availability and partitionability over consistency), we might choose consistency instead. To do so, we can cache index results, and write our updated value to that store along with our regular persistent store.

It’s not the most exciting of options, but if we know that writes are much less frequent than reads (and we’re not partitioning, i.e. we pick CA of CAP), then it’s not too far-fetched to write to both our cache and our document store.

Of course, if this is all to force us to bypass our consistency model of the database we’ve chosen, it’s a lot of work. But still, it’s an option nonetheless.

Option 5: Choose a database that matches our consistency needs

If we don’t like the consistency model that our database provides, or we feel like we want to allow clients to choose a consistency model, we might view this as a case where we’ve simply chosen the wrong model. If clients NEED consistency, why not pick something that gives that to them? Or don’t allow them to choose.

This would be like, on writes, allowing clients that interact with a database that uses a relational model to also indicate the isolation level on writes. That’s something that should really be encapsulated by the operation, and made with SLAs and contracts about the behavior.

Conclusion

We have a lot of options here, and all are valid in some scenarios. In each example, we’ve chosen a consistency model that matches our needs, rather than have a compromised consistency model exposed from our database of choice. Before Amazon put out their Dynamo paper, it’s not like us as users of the website knew that this storage system existed in the back end. It was encapsulated. The most we could do was “Ctrl+F5” to force a request back to their servers, but that still had no real guarantees.

Instead of letting our database leak its consistency model to our API, it’s better to choose a model that makes this consistency model explicit, or offer no guarantees at all. But as a rule, I as a consumer should not care that you’ve chosen Raven DB instead of SQL Server for your back end. It’s just none of my business.

Related Articles:

Post Footer automatically generated by Add Post Footer Plugin for wordpress.

About Jimmy Bogard

I'm a technical architect with Headspring in Austin, TX. I focus on DDD, distributed systems, and any other acronym-centric design/architecture/methodology. I created AutoMapper and am a co-author of the ASP.NET MVC in Action books.
This entry was posted in REST. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • http://jonathanoliver.com Jonathan Oliver

    I like option 2+3 the most: 202 on writes and “last modified” on reads. The last-modified information could even be returned as a response header (please use ISO 8601 for dates) rather than cluttering up the response body.

  • Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1357

  • AquaBirdConsult

    I’m a long time RavenDB user, and with the latest versions (2.0) of RavenDB the indexing is so fast I doubt you would be able to query fast enough to hit a stale index (although it is possible). It is even harder to hit a stale index if you are doing basic map indexes. Map/Reduce indexes is where you might hit stale results. Another approach to keep your data from getting stale is to leverage RavenDB’s Ids heavily, since loading a document using its Id is not bound to indexing.

    User’s don’t realize wether you are serving them stale data or not, so I would probably ignore it in the case of Octopus deploy. I wonder what version he is using?

    My choices would be Option 1 or Option 5. But I really do love RavenDB.

  • Pingback: Scott Banwart's Blog › Distributed Weekly 207

  • http://twitter.com/djidja8 Srdjan

    nice article Jimmy, as always.

    I would add one more option though:

    6) use modern databases that are consistent
    For example:
    HyperDex (http://hyperdex.org/consistency/)
    NuoDb (http://www.nuodb.com/explore/sql-cloud-database-how-it-works/)

    nothing wrong with that. :)

    • jbogard

      Isn’t that the same as option 5?

      • srdjan

        sure, but I stopped reading on 4… 8|

  • http://www.hopy1.com/ hopy

    This is what I’ve been looking for. Thank you!

  • Pingback: Eventual consistency in REST APIs | Jimmy Bogar...

  • paulstovell

    Hi Jimmy, I didn’t see your post about this until just now – thanks for posting it!

    In Option 2 you mention providing feedback on the consistency of results, which we plan to do (“IsStale=true”).

    What do you think of this alternative?

    “The Cache-Control: no-cache HTTP/1.0 header field is also intended for use in requests made by the client. It is a means for the browser to tell the server and any intermediate caches that it wants a fresh version of the resource”

    What if we thought about Raven’s inconsistent results as a cache, and used this header to specify whether we should WaitForNonStaleResults?