A year in review with CQRS

For the past year my team has been building and maintaining an application using CQRS with an Event Store as our persistence model.  I started gathering requirements for this project last January, and the team started development in February.  We deployed the first version in June.  That version was the bare minimum the end users needed to stop using Excel as their database and to replace their manual process.  After the initial deployment we kept adding the “Oops we really need this feature” items, along with usability enhancements to make the application more functional.  We are still working on this application today.  The business made a radical change in process, and now we’re changing the application to meet its new needs.

Given our history with this application and the phases we’ve gone through (initial development, maintenance / enhancements, and a complete change in process), I think it’s a good case study in the feasibility of CQRS-style applications.  We’ve made some good decisions and some, err.., not so good decisions that I learned quite a bit from.

A little about the Application

The application is the standard “enterprise workflow” application, where the end users’ work product goes through several stages and different people do different tasks in order to complete the work.  These types of state-based processes fit well with the CQRS model, because different events occur on the items in question, and those events determine what’s required and what action needs to take place.  Most screens were workflow-driven; only a few were “CRUD” type screens.

My definition of CQRS

Separate models for reads and writes

I think my definition heavily influenced how we built this application.  Notice there’s nothing there about messaging, MSMQ, distributed architecture, Event Sourcing, etc.  We did use some of these technologies and architectures in this app, but they are not required for CQRS.  It’s simply an acknowledgement that the write model looks and behaves differently from the read model.  This simple acknowledgement can have a profound impact on the way you build your application and on your ability to solve problems for your end users (and if you’re not doing that, well…)
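
To make that definition concrete, here is a minimal sketch in Python of one write model and one read model for a workflow item.  The names (`WorkItem`, `WorkItemSummary`, `to_summary`) are made up for illustration and are not from our code base; the only point is that the two classes have different shapes and different jobs.

```python
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    """Write model (hypothetical): owns the rules for changing state."""
    item_id: str
    stage: str = "new"
    history: list = field(default_factory=list)

    def advance(self, next_stage: str, user: str) -> None:
        # Business rule lives on the write side only.
        if self.stage == "done":
            raise ValueError("a completed item cannot change stage")
        self.history.append((self.stage, next_stage, user))
        self.stage = next_stage


@dataclass(frozen=True)
class WorkItemSummary:
    """Read model (hypothetical): a flat row shaped for one screen, no behavior."""
    item_id: str
    stage: str
    last_touched_by: str


def to_summary(item: WorkItem) -> WorkItemSummary:
    # Flatten the write model into exactly what the list screen needs.
    last_user = item.history[-1][2] if item.history else ""
    return WorkItemSummary(item.item_id, item.stage, last_user)


item = WorkItem("WI-1")
item.advance("review", "alice")
summary = to_summary(item)
```

Nothing here involves a queue, a bus, or an event store, which is the point of the definition.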

Not Everything has to fit into the CQRS Mold

I think one common mistake made while building applications with CQRS is forcing every piece of functionality through the CQRS square hole.  There were parts of our application that were very simple and did not require the complexity of CQRS.  That’s right, CQRS is complex. I said it.  Commands must be created, validated and executed.  Events must be fired and handled.  Usually the complexity is offset by the simplification of the domain models.  But if your domain models are already simple, then a simpler approach will yield better results.

Do Not Start With a Distributed Architecture

When most people see examples or presentations of CQRS, they usually include some sort of distributed architecture, where the commands and messages are put on some sort of queue, like MSMQ, RabbitMQ, or whatever.  There can be some great advantages to these techniques, mostly scalability.  But a distributed approach raises the complexity to a whole new level, and it is simply not necessary, at least at first.

For our application, we created a very simple in-process bus.  That is, the command and all of the event handlers it triggered executed on the same thread.  We used an IOC container to find all of the event handlers based upon the event class type.  It is a web application, so the threads were handled by the web server.  For a typical event, we had 4-6 event handlers, updating 4-5 different database tables.
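
As a rough illustration of the idea (not our actual implementation), here is a minimal in-process bus sketched in Python.  It swaps the IOC container lookup for a plain dictionary keyed by event class, and the event, handlers, and “tables” are all hypothetical stand-ins.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemApproved:  # hypothetical event
    item_id: str
    approved_by: str


class InProcessBus:
    """Dispatches each event synchronously to the handlers registered for its class."""

    def __init__(self):
        # Stand-in for the container lookup: event class -> list of handlers.
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        # Handlers run one after another on the caller's thread, so an
        # exception in any handler surfaces immediately in the request.
        for handler in self._handlers[type(event)]:
            handler(event)


# Two stand-in "tables", each updated by its own handler.
approval_report = {}
audit_log = []


def update_report(event):
    approval_report[event.item_id] = event.approved_by


def append_audit(event):
    audit_log.append(f"{event.item_id} approved by {event.approved_by}")


bus = InProcessBus()
bus.subscribe(ItemApproved, update_report)
bus.subscribe(ItemApproved, append_audit)
bus.publish(ItemApproved("WI-1", "alice"))
```

Because `publish` is just a loop over function calls, a failing handler shows up as an ordinary stack trace in the request that caused it.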

There were some real advantages to starting with an in-process model.  We were able to get faster feedback when errors came up, and it was easier for us to find and fix bugs.  This enabled us to build the application faster and get it into the end users’ hands quickly.

Once we were in production, we started recording metrics on how long requests took and where the performance issues were.  What we discovered was that the only real performance issues came when users performed batch operations, that is, the same operation on many records (set-based operations are nearly impossible with this architecture, or at least I haven’t figured out how to do them).  So once we determined which functions were commonly done in batch form, we moved them out of process with NServiceBus.  I think that’s one of the areas where this approach shines.  Moving these out of process, when necessary, is really simple.

Move work out of process only when necessary

Think Creatively about your Domain and View Models

One of the mistakes we made was the design of our domain models.  From a purely DDD / aggregate root / ubiquitous language perspective, there was really only one AR.  But having just one AR means it’s REALLY big.  In retrospect, we should have been more creative about splitting that AR into something different.  Honestly, while this application has some complicated logic, I don’t think that we needed to solve it with a Domain Driven approach.  However, the library we used (per client request) required us to do so.  I don’t have a solution for this yet; when I do I’ll let you know (hint: I think the Actor Model pattern Chris Patterson has been talking about is a better fit).

Event Sourcing … Meh

Event Sourcing has probably been the most misunderstood and miscommunicated implementation strategy surrounding CQRS.  Like other pieces of the architecture, it’s an implementation choice, not a requirement for building this type of application.  Honestly, I was never sold on the concept before this project, and on this project I don’t think we got any of the touted benefits from using it.  To be clear, I think one of the problems is our implementation of the Event Sourcing persistence.  We are currently storing a binary copy of the event, which means that the event lives in perpetuity and we can’t change its shape or get rid of it.  If it were serialized with something like Protocol Buffers I think we would have more flexibility, but I haven’t tested that yet so I can’t say for certain.
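
To show the kind of flexibility I have in mind, here is an untested-in-production sketch in Python.  It uses JSON rather than Protocol Buffers purely to keep the example self-contained, and the `ItemAssigned` event and its fields are invented for illustration.  The assumption is that a field-tagged format lets a newer event class read older payloads: missing fields fall back to defaults and unknown fields are ignored.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class ItemAssigned:  # hypothetical event
    item_id: str
    assignee: str
    priority: str = "normal"  # field added later; the default covers old events


def serialize(event) -> str:
    return json.dumps(asdict(event))


def deserialize(event_cls, payload: str):
    data = json.loads(payload)
    known = {f.name for f in fields(event_cls)}
    # Drop fields the current class no longer has; missing ones use defaults.
    return event_cls(**{k: v for k, v in data.items() if k in known})


# A payload written by an older version of the event: no "priority",
# plus a "legacy_flag" field that has since been removed.
old_payload = '{"item_id": "WI-1", "assignee": "alice", "legacy_flag": true}'
event = deserialize(ItemAssigned, old_payload)
```

Whether this works as smoothly against a real event store is exactly the part I have not verified.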

The biggest feature of ES is the ability to replay the events and rebuild the state of your view models.  This sounds really cool: you can create new views in your application (like new reports) and have them fully available with all data, as if they had been there from the beginning.  The only problem is that we’ve only done that once in the entire lifetime of the application.
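
For readers who haven’t seen a replay, here is a minimal sketch in Python of building a brand-new report from an existing event history.  The events and the “items per stage” report are hypothetical, and the event log is just an in-memory list standing in for the event store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemCreated:  # hypothetical event
    item_id: str


@dataclass(frozen=True)
class StageChanged:  # hypothetical event
    item_id: str
    stage: str


def project_items_per_stage(events):
    """Fold the full event history into a new 'items per stage' report."""
    current_stage = {}
    for event in events:
        if isinstance(event, ItemCreated):
            current_stage[event.item_id] = "new"
        elif isinstance(event, StageChanged):
            current_stage[event.item_id] = event.stage
    counts = {}
    for stage in current_stage.values():
        counts[stage] = counts.get(stage, 0) + 1
    return counts


# Stand-in for reading every stored event in order.
event_log = [
    ItemCreated("WI-1"),
    ItemCreated("WI-2"),
    StageChanged("WI-1", "review"),
    StageChanged("WI-2", "review"),
    StageChanged("WI-1", "done"),
]
report = project_items_per_stage(event_log)
```

The report did not exist when the events were recorded, yet it comes out complete because it is derived entirely from the history.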

But at the same time, using the event sourcing approach hasn’t really cost us anything either.  In the end, for this application, I would have been happy with just a Key-Value store of data or a document database for aggregate persistence.

Overall, I’m pleased with the application.  Like I said, there was a big change in business process, and in my opinion we’ve been able to make those changes very fast.  This is mainly due to the extreme decoupling between the way views are represented and the way a domain object is changed or created.  I will likely continue to create applications of this type using this approach.  With a few changes underneath, of course :)

  • http://craiggwilson.myopenid.com/ Craig Wilson

    2 things I question:  

    1) You mention that because you are storing the events as binary, you can’t change them.  I believe that is the point.  If you need something different, you’d have a compensating event to correct it.

    2) You mention that you have only needed to rebuild the views once.  That is once more than you would have been able to do any other way.

    I don’t think event sourcing is the end-all, be-all, but it can make a lot of things simpler.

    • John Teague

      1) It’s not that I want to change the values in an event, it’s that I would like to add or remove fields from the event. We’ve had several stories where we added fields they wanted to store which required a brand new event. It would have been nice to simply add the fields to the event.

      2) Yes, but if I did not have the replay ability we would have functioned just fine without it.

      I’m sure there are domains out there where you replay frequently. This just isn’t one of them. So we did not benefit from that ability.

  • http://twitter.com/dagda1 dagda1

    If you only have one aggregate root then it seems unlikely you need the complexity of CQRS.

    Even Udi Dahan admits that CQRS is overkill for the vast majority of applications.  I have similar thoughts about overkill with full blown DDD or taking DDD too far which I have seen for applications that do not warrant it.

    This seems something you see or read a lot of in .NET.

    • John Teague

      Like I said, this is something we did wrong. In retrospect I would either a) break up the single AR into multiple Aggregate Roots (which we will probably do) or b) find another way to use CQRS without DDD. 

      There were several requirements that would have been very difficult to do with a traditional n-tiered architecture using the domain model for persistence and queries. The event driven nature made some things very easy to do (more on that later). The additional layer of complexity CQRS added (or at least the event driven architecture) more than paid itself off with a reduction in complexity with our domain models and view and reporting models.

      I agree with you about DDD.  I think it is being overused (or at least the terminology) where other techniques would suffice.

  • http://jonathanoliver.com Jonathan Oliver

    Did you end up building your own event store? Or did you use a preexisting one?

    • John Teague

      My client has their own event store, so I had to use that :)

      I would have preferred to use, or at least experiment with, other Event Stores available (ahem..)  I think some of the issues I have are with the ES implementation.  Or at least the serialization part.  I could have extended it, but alas…

  • Shane Courtrille

    ES can also be a big helper when it comes to figuring out bugs.  We have it set up so we can pull from production into dev and run all events up to a specific command.  At that point we can start our debuggers up and execute the command we know failed and see why.  Not something you use every day, but when you need it.. you usually REALLY need it.

    • John Teague

      I really didn’t want this to be all about Event Sourcing, but oh well.
      Yeah, I know people who use that technique to find issues too. And if you’re fully distributed, I’m sure it’s really helpful. But we’re not fully distributed (and I don’t think any application should be), so we can capture exceptions the old fashioned way: with ELMAH and other log files.
      Plus, after a year of production use, we have 8 million records in our event store. I don’t want to replay unless I have to (I’m sure you’re replaying from a snapshot, but anyway)

      So the tl;dr; answer is it’s yet another “feature” that we have not needed. But as you said I’m sure it can come in handy.

  • Dave Hounslow

    Having used event sourced architecture for over two years, one of the really great benefits is when trying to debug. It’s fantastic to be able to take production snapshots and event journals, replay on a dev box and breakpoint with an identical state of your production service.