Ditching two-phased commits

I’ve had a love-hate relationship with two-phased commits during my years with messaging. Even if MSDTC was free to set up, it doesn’t come free in terms of throughput. Most people run into 2PC in messaging because because queueing systems and databases are two different resources, and therefore don’t participate in the same transaction. Ideally, I’d have all three participants either succeed or fail together:

image

Since the queues in this picture are different resources than the database, I need to involve a third party, or transaction manager, to coordinate transactions between these three resources.

DTC, when it works, works really well. It’s much, much easier to not care about the consequences of a lack of coordination. In fact, I’d recommend not caring until you actually do care, because ditching two-phased commits does require work. Luckily for us, there are a ton of resources on how to do exactly that!

Most of the time, literature around avoiding 2PC is concerned about an entirely different situation, where I have two separate databases:

image

We’re doing messaging, which means that it’s typically the consumer of the message that does something against other data stores. So even though we’re avoiding communicating with two databases, it’s still two resources, and thus a need to coordinate!

But again, that coordination comes with a cost. A fairly large cost, in some recent testing we found that overall throughput dropped 80%, or to put it another way, ditching DTC saw a 5X increase in throughput. Five fold!

For some systems, that throughput doesn’t matter much, but for those that have a reasonably high volume of messages, or sensitive SLAs, it’s worth investigating alternative approaches.

General rules of thumb

Like most messaging approaches, the ways of avoiding coordination are right in front of our faces. In Gregor Hohpe’s excellent paper on Starbucks, he points out any real-world system that values throughput over absolute consistency avoids distributed transactions. The basic ideas are:

  • Idempotency is king. Get this and you’re halfway home
  • Strategies for dealing with downstream effects is a business decision

Idempotency is absolutely required, but it’s not that hard to apply. For some operations, we can rely on natural idempotency. If I’m asked to turn on the light, receiving the request twice means the same outcome – the light is on! For state machine-like systems, idempotency is a bit easier.

For operations that aren’t naturally idempotent (launch the nuclear missile), we’ll need to get a little creative. If we can identify some correlating information from a request (The president called at 9:15 to launch the missile) or just assign some correlation information (The president has issued request #132 to launch the missile), we can simply keep a journal on the receiving side. If it’s expensive to keep a journal around, we can recycle/trim our journals if they get too big.

Downstream effects become more interesting. If throughput is a high concern, we can rely on compensating actions (customer didn’t have enough money, cancel the order) or more journaling. Instead of sending a message immediately, shouting out messages to downstream systems, we can instead just write down in the same persistent store as our other data another journal for outgoing messages.

Once our local DB transaction is complete, it’s just a matter of sending the messages we’ve written down to send out down the line, and crossing them off our list of “sent” messages. And since downstream systems can deal with at-least-once messages through our idempotency guarantees.

How I learned to stop worrying and ditch 2PC

In some current systems, we’re deciding on a service-by-service basis whether or not we want to enlist or not enlist in distributed transactions. It’s still annoying to try and build a system-wide solution (though the event sourcing guys have this more or less in the bag), so until then, I can just use business decisions to guide me one way or the other.

But it is time to let go and stop worrying so much. Honestly, unless your services have downstream side effects, you can safely turn off DTC if your work is idempotent. If you have downstream side effects, there’s a number of paths to choose from. While I’m not saying goodbye forever (still the best solution if it were absolutely free to use), it is time to shop around.

Related Articles:

Post Footer automatically generated by Add Post Footer Plugin for wordpress.

About Jimmy Bogard

I'm a technical architect with Headspring in Austin, TX. I focus on DDD, distributed systems, and any other acronym-centric design/architecture/methodology. I created AutoMapper and am a co-author of the ASP.NET MVC in Action books.
This entry was posted in Messaging, SOA. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1352

  • http://twitter.com/viktornordling Viktor Nordling

    I couldn’t make sense of this sentence: “And since downstream systems can deal with at-least-once messages through our idempotency guarantees.”

    Should it be, “And since downstream systems can deal with at-least-once messages through our idempotency guarantees, we can always retry the operation if there is a failure down the stream.”, or do I just need more coffee?

    • jbogard

      Nope, that’s right. Maybe I needed more coffee when I was writing this!

  • http://jonathanoliver.com Jonathan Oliver

    Welcome to the new world. Things are more wild out here in this untamed land, but you have the freedom to do anything you want since throwing off the shackles of tyranny.

  • Pingback: Ditching two-phased commits | Jimmy Bogard's Bl...

  • Dale Anderson

    A good article. I’ve always kinda known 2PC is not worth worrying about, but you gave me a bit of an “aha” moment, that idempotence is the key.

  • Pingback: Scott Banwart's Blog › Distributed Weekly 206

  • Mark Allen

    Great post. We leverage NServiceBus quite heavily, and while we bask in the convenience of 2PC exactly-once messaging, we anticipate that as we scale this is going to break down due to performance issues.

    So without DTC, how do you ensure that a message sent to a queue actually makes it? Is the Bus.Send basically an atomic operation, whereby if it succeeds then you are assured it has made it to the destination queue? I understand the notion of at-least-once messaging, but am nervous about the scenario where the sender thinks the message is delivered, but it actually never makes it to the intended target queue.

    • Mark Allen

      Ultimately, the question I’m trying to answer is without DTC, how are you assured that you are not losing messages, hence having to deal with none-or-at-least-once messaging rather than at-least-once messaging?

      • jbogard

        Most queueing technologies provide acknowledgements that a message has been accepted (handling is a separate deal). That’s why you have “at least once” supported by just about everyone.

  • http://www.yepi6.org/ yepi6

    Thanks, the article gave me much more useful information.

  • Pingback: (Un) Reliability in messaging: idempotency and de-duplication | Jimmy Bogard's Blog

  • Arturo Hernandez

    Suppose there is some kind of failure downstream, and it is not recoverable in a reasonable time. Even though the service got that at-least-once call now we have several systems that processed the transaction and one service that did not process it. I could see a system that processes transactions on it’s own until it recovers. even if it is a long time. Then the problem is incoming transactions that depend on the original.

  • http://au.linkedin.com/in/afifmohammed Afif Mohammed

    Great write up. I am wondering since the fluent configuration API has the option to configure the Bus to not use Distributed transactions, do you do (or recommend) that?

    • jbogard

      Yep, that’s what we do – although it is worth testing to see exactly what runtime behavior is. We found that when we turn off distributed transactions, and regular transactions that we lost retries.

  • Pingback: Taro beta | 水言木

  • Pingback: Good Reads 2013-10 | 水言木