Efficient transactional processing

Ayende had a post on how to handle race conditions and unbounded result sets, describing a problem where you need to perform transactional work against a set of entities. A bad solution would be:

var subscriptions = session.Query<Subscription>()
    .Where(s => s.NextCharge < DateTime.Now);

foreach (var sub in subscriptions)
    sub.ChargeAccount(); // takes ~1 sec each

We have a rule in our system: a query may return an entity if and only if it queries solely on that entity’s identifier. In other words, a query returns at most one entity at a time.

If we only need to display or search information, but not act upon it, we can use any number of projection options: AutoMapper, SQL views, SQL projections, LINQ projections, de-normalized view tables with CQRS, and so on.

However, if we need to ACT upon that information, we ALWAYS project into some sort of message object. Something like:

var subscriptions = session.Query<Subscription>()
    .Where(s => s.NextCharge < DateTime.Now)
    .Project().To<ChargeSubscriptionMessage>(); // message type name is illustrative

That Project().To is just the auto-projection to a DTO, done at the SQL level. The object returned is a message that includes only the identifier, to be sent off for processing by message handlers (in my case, on the NServiceBus bus, using MSMQ messages delivered to 1 to N concurrent handlers).

In each handler, I’ll load up the aggregate root entity by its identifier, using whatever specific fetching strategy I need for that ONE entity and that ONE operation. I can then scale processing as much as I need to, and decouple myself from the synchronous, serial processing the “foreach” loop saddles me with.
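A handler along these lines might look like the following sketch. It assumes NServiceBus’s `IHandleMessages<T>` and a constructor-injected NHibernate `ISession`; `ChargeSubscriptionMessage` and its `SubscriptionId` property are illustrative names, not taken from the post.

```csharp
// Sketch only: assumes NServiceBus and NHibernate are configured elsewhere.
public class ChargeSubscriptionHandler : IHandleMessages<ChargeSubscriptionMessage>
{
    private readonly ISession _session;

    public ChargeSubscriptionHandler(ISession session)
    {
        _session = session;
    }

    public void Handle(ChargeSubscriptionMessage message)
    {
        // Load exactly one aggregate root by its identifier, with whatever
        // fetching strategy this one operation needs.
        var subscription = _session.Get<Subscription>(message.SubscriptionId);

        // One entity-operation per transaction: a failure here fails (and
        // retries) only this single message, not the whole batch.
        subscription.ChargeAccount();
    }
}
```

Because each message is handled in its own transaction, the bus’s retry and error-queue mechanics take care of the “25Kth item fails” scenario without rolling back the other 49,999.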

Our rule of “Never ForEach on an Entity” lets us scale processing appropriately, handle failure conditions (like what happens if the query returns 50K items and the 25Kth one fails), and constrain transactions to very tight windows, processing only one entity-operation per transaction.

In the above case, I never even look at a parallel ForEach loop, because of the need to handle failures. The query identifies which entities need work done to them, and no more. Another win for message-based architectures!

About Jimmy Bogard

I'm a technical architect with Headspring in Austin, TX. I focus on DDD, distributed systems, and any other acronym-centric design/architecture/methodology. I created AutoMapper and am a co-author of the ASP.NET MVC in Action books.
  • Thomas

    If “ChargeAccountMessage” can be considered an atomic operation, you are right. If a bunch of “ChargeAccountMessage”s must be seen as one atomic operation, the solution you proposed is not correct.

    • Anonymous

      In that case, I’d have a saga coordinating the processing of all of those child operations, with compensating actions in case of failures.

      Or, if this is really a set-based operation and not transactional processing, a simple UPDATE statement would work too.
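      For the purely set-based case, a sketch of what “a simple UPDATE statement” might look like via plain ADO.NET. The table and column names, and the connection string, are illustrative assumptions; this only applies when the operation is pure data manipulation with no external calls.

      ```csharp
      // Hypothetical set-based alternative: one UPDATE instead of loading
      // entities one by one. Schema names are illustrative.
      using (var connection = new SqlConnection(connectionString))
      using (var command = new SqlCommand(
          @"UPDATE Subscriptions
            SET NextCharge = DATEADD(month, 1, NextCharge)
            WHERE NextCharge < @now", connection))
      {
          command.Parameters.AddWithValue("@now", DateTime.Now);
          connection.Open();
          command.ExecuteNonQuery();
      }
      ```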

      • Totally agreed on differentiation between clearly OLTP-scenarios and batch-processing of data.

  • I think Ayende’s point was that you need to act *as quickly as possible* before other people charge the user’s account. A message-based architecture is clearly superior to a ‘foreach’ but it might incur processing latency that you would like to avoid in situations such as these.

    • Anonymous

      So if there are 1M items to process, the last in line gets processed quite slowly. Messaging decouples that for me.

  • Carlos Ribas

    Sometimes comparing the overhead of the messaging system vs. the work unit cost makes it worthwhile to do in-process concurrency through some flavor of multi-threading. Handling failure does become complex, but often failure is expected to be very rare and thus the performance trade-off against that added complexity becomes worthwhile if the goal is truly focused around performance.

    This example case from Ayende (with a whopping 1 second per unit cost) is well-suited for your proposed solution, but certainly folks should not get the impression that using a message/bus architecture is a one-size-fits-all solution to problems in this space.

    • Anonymous

      No, you’re right. The other valid option I see is a set-based approach, doing this at the SQL level. But looping through more than one aggregate root to do work – I don’t see that as a valid option anymore.

      We might still do this synchronously, through in-process handlers, but still have the process generate messages.

      I don’t even think that it’s a performance question – separating into separate handlers reduces complexity, since I’m only dealing with one operation at a time against one aggregate root, and not having to worry about a unit of work with multiple entities getting updated at the same time.

      • Carlos Ribas

        Well, I was thinking along the lines of the original problem: it is calling out to some kind of credit card processing system. Set-based data manipulation can often be done at the database (especially in a system where projecting from SQL to DTOs and bypassing the entity’s behavior for reads is deemed safe/acceptable), but problems beyond data manipulation can’t be reduced to the SQL level.

        Definitely agree with the cleanliness of a unit-based approach. And doing that on top of a message architecture is typically a good idea because it gives you flexibility. But the original point was about performance, so I was just calling out that maybe an in-process threading approach (where each worker is still just doing one aggregate) would make more sense since with a high unit cost (1 second) the host machine would be doing a lot of waiting anyway…

        The part about multiple entities being updated at the same time is NH-related implementation detail, right?

        • Charlie

          We have implemented a solution like this that processes 25k card payments at the end of every month. We run a query against the database to get the 25k requests and push them into a queue. We then set the number of threads on our message handler to match the number of available channels on our 3rd party’s web service. We plan to introduce a distributor to load balance across two physical servers for fault-tolerance.

  • I actually dealt with a scenario like this recently. One transaction per aggregate is indeed the way we went, though it doesn’t need to be distributed. Our solution just involved a thread pool that does the processing for each aggregate in turn, each with its own transaction.

    • I should mention, in our case the number of aggregates we needed to process was suitable for a single server. But triggering processing via a new published message is a good approach since it treats the processing as a new use case, with a new request, transaction, etc.
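    The in-process, one-transaction-per-aggregate approach described above could be sketched like this, assuming `Parallel.ForEach` and `TransactionScope`; `LoadSubscriptionIds()` and `ProcessSubscription()` are hypothetical helpers standing in for the query and the per-aggregate work.

    ```csharp
    // Sketch: in-process pool, one transaction per aggregate.
    Parallel.ForEach(LoadSubscriptionIds(), id =>
    {
        using (var tx = new TransactionScope())
        {
            ProcessSubscription(id); // load by id, perform the one operation
            tx.Complete();           // commit this aggregate's transaction only
        }
    });
    ```

    Each worker still touches exactly one aggregate per transaction, so a failure rolls back only that aggregate’s work; the trade-off versus messaging is that retry and failure handling become your own code’s responsibility.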

  • Have you considered using sagas and timeouts for each of the subscriptions? Then there’d be no need for the mentioned query.