How we got rid of the database–part 4

This is the fourth episode of a series of posts about how we do CQRS and event sourcing. To my (positive) surprise the first three parts (part 1, part 2, part3) have caused quite some discussion amongst the readers. Since I constantly have discussions about the very same topics with my team members and external co-workers I decided to dedicate this fourth part to answer some of those topics.

Eventual consistency

Full consistency (also called immediate consistency) is overrated, period!

When I have some doubt about whether something “is right” or “can possibly work” then I try to find analogies in the real world. And it turns out that when looking for analogies in the business world where eventual consistency applies then we find those everywhere! Even in scenarios which are “business-critical” or “life-threatening”. But then if real world works in an eventual consistent way, why are we tying to achieve full consistency at all cost in our LOB applications?

Let me give some real world samples of eventual consistency:

  • you are ordering a book at an online book store. After you have checked out and your payment by credit card has been accepted the application tells you that your order has been placed and will be shipped asap. At the moment you receive the confirmation the online store has not yet reserved the book for you nor is it guaranteed that this very book is on stock and if not that it is still available from the publisher. It some times happens that the book is not on stock and thus the delivery is delayed. In this case the online shop will notify you by sending an email. If the book is not available any more then the shop might offer you alternatives and compensate you with a special bonus.
  • The unit price of some share traded at the stock exchange changes. This change in price is appearing on the stock tickers only with a certain (short?) delay; maybe a couple of seconds or even minutes.
  • You order a coffee at your local coffee shop. The cashier forwards your order to the barista immediately and then takes your credit card to check out. To check the credit card and approve the payment might take a while. In the mean time the barista has already started to brew your coffee. It turns out that your credit card is invalid and you have no alternative payment method. The coffee cannot be undone and as a “compensating action” the barista gets the order to throw the coffee away.
  • A terrible accident happens. You call 911 and explain that there is a person having life threatening injuries. The person answering the call promises that EMS and fire fighters will be there as quickly as they can. At the moment you end the call the fire fighters and EMS have not yet been sent on their way nor are any surgeons put on hold that will take care of the injured person. All this is organized asynchronously. When the injured person finally arrives at the hospital he gets treated as soon as a team  of surgeons/doctors is available. The patient might even have to wait for some time if no doctor is available.

All these samples are not made up by me but are very real and happen daily! Thus I can assure you that we can develop applications that are eventual consistent and at the same time happily fulfill all business requirements. We just need to address the problem a little bit different. A recommendation is to design your application in a task oriented way. Each task should be very focused and well defined. Avoid having “fluffy” or vague use cases.

After a successful transaction I need to refresh/reload the screen

With CQRS and ES this is not possible since the read model is only eventually consistent with the domain model. The delay can be very short (a couple of milliseconds) or quite long (seconds to minutes) depending on the type of transaction.

But this is not a problem! Why not?

We design our application in a way that commands sent to the domain model have a high probability to succeed. This can be achieved by e.g. pre-validating the commands prior to sending them to the corresponding aggregate of the domain model. Assuming now that the command succeeds we can update the screen to reflect the new state without reloading everything from the database/data source. We already have all the necessary data at hand, otherwise we would not have been able to create and send the command in the first place. As a sample: if we send the command “PublishTask” to the domain model and the domain model sends back an ACK then we can set the status of the corresponding task on the screen to “published” without querying the database.

But what about if it takes a long time (seconds to minutes) until the read model is in sync with the domain model after a transaction has been executed?

In this case it is often the case that the person that triggers the transaction is not the same person/user that needs to see the result of the transaction (other than it has been completed successfully). As a sample take again our online book store. The manager decides to reduce the unit price of all books in stock by 10% to better compete with the competitors. Now who is most interested to see those changes in price? It is not the manager since he knows what he has just done. He does not have to browse through the list of all books to have a confirmation of the change. But the clients of course benefit from this price reduction. But it doesn’t matter if the price change is only reflected in the online catalog minutes after the manager has executed the price reduction.

Why not use a database for the event store or the read model

Well, first of all, please read the title of my series of blog posts: “How we got rid of the database”. This series is not only about the theory behind CQRS and event sourcing but also about the concrete application of an architecture based on CQRS/ES as well as our goal to reduce the overall complexity of the system. If we can achieve our goals with simpler means why not do it. Why shouldn’t we strive for simplicity when solving problems and achieving solutions?

It turned out that we did not leverage nor need all the features an RDBMS a la Oracle or SQL Server could offer. It was as if I took a truck when a bicycle would be sufficient or search for a calculator to calculate 5 + 5, instead of using my brain.

What we need when doing CQRS is

  • in the domain model: an efficient way to access a stream of events identified by the unique ID or the aggregate instance which produced the events.
    Solution –> we serialize all events as they occur into an append only file whose full name is a codified version of the ID.
  • in the read model: an efficient way of accessing (denormalized) instances (=the current state) of view models by their respective unique ID in addition to (secondary) indices that are e.g. serialized hash tables (or dictionaries). I will talk more about indices in a future post.

All that and more can be easily achieved by avoiding any database (relational or document) and just using some simple code that can read and (atomically) write single files.

How can we enforce referential integrity?

We cannot, at least not at the data level!

Integrity of data is solely guaranteed by our application. It is the responsibility of the application to maintain integrity and consistency of the data.

One important aspect in this regard is that we eliminate the concept of “delete” from our vocabulary (I know, I know, there are always exceptions…). In the business world the concept of “delete” does not exists

  • we do not delete an employee but rather release him
  • we do not delete an order but rather cancel it
  • we do not delete a coffee in a coffee shop but rather discard it (the ingredients have already been used)
  • an accountant does not delete a wrong journal entry but rather compensates it with a journal correction

If ever we have to “delete” an item we do not physically delete it but mark it “as deleted”.

But if we never physically delete anything from our data store we do not run into problems like “orphaned children” or “missing references”.

As an added benefit we are better prepared for any audit, since no data is ever lost.

What about backup?

We only ever need to make sure that our event store is backed up. We really(!) don’t care about the read model since it can always be rebuilt from the event store in case of catastrophic failure. And if this is not sufficient, we can always create the read model redundant; we just send the events (which are used to build the projections) to at least two different servers.

The event store on the other hand is just a folder full of files and can be backed up by any decent backup program out in the market. No special and expensive software needed!

About Gabriel Schenker

Gabriel N. Schenker started his career as a physicist. Following his passion and interest in stars and the universe he chose to write his Ph.D. thesis in astrophysics. Soon after this he dedicated all his time to his second passion, writing and architecting software. Gabriel has since been working for over 25 years as a consultant, software architect, trainer, and mentor mainly on the .NET platform. He is currently working as senior software architect at Alien Vault in Austin, Texas. Gabriel is passionate about software development and tries to make the life of developers easier by providing guidelines and frameworks to reduce friction in the software development process. Gabriel is married and father of four children and during his spare time likes hiking in the mountains, cooking and reading.
This entry was posted in CQRS, Event sourcing, no-database. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • Chris Martin

    I can echo everything you’ve said in this post. 

    I’ts refreshing to see this model of application being used by more people. We’ve built our application on top of Lokad.Cqrs going on two years now. It’s very stable and with some creativity, very powerful.

  • Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1129()

  • Wayne M

    So basically you do small enough commands that you can assume it succeeds and act as though it does, but maybe send a notice later to the customer (let’s go with an e-com example as that’s what I’m familiar with) if there was a problem?  Let’s use a real-world example.  My company lets people book freight shipments online and choose what carrier they want to pick it up, and pay for it all online – the quote then goes to our backoffice system where an agent follows up with the carrier to make sure the shipment is handled and keeps in contact with the customer.  The idea here would be to handle each of those things (e.g. getting the quote, paying for the quote, etc.) as events, without worrying if the customer’s credit card couldn’t be charged (bad example though as we kind of have to wait for that) or if the carrier they chose is available?  So if they choose ABC Freight and later on something happens whereby ABC Freight can’t  service the request (this part is handled in the backoffice system), the agent would initiate that event and either that or more likely the agent would notify the customer to let them know?

    So is the idea to have one read model for the different applications, or would the frontend have a different read model than the backoffice system?  I guess that part comes down to specifics; it might be okay to replicate data or it might be okay to use the same depending on the actual data needed?Also, I’d like to know how you deal with synchronizing the data.  It seems the overall idea here is to only use a true database for the commands, and use a file or object DB or something that isn’t an RDBMS for the read-model, but that data has to be synced at constant intervals so the read model data is only a little bit stale.  I suppose that would be a good use case for SSIS or to a lesser extent some kind of batch script that runs to take data from the RDMBS and update the read model?

    • Anonymous

       Normally you do not have different read models that need to be synchronized. Each read model, or may I rather say each projection or view model is constantly/continuously updated by the events that are (asynchronously) published by the domain (aggregates). There is no (and never should be) inter-dependency between different read models/projections.
      And yes, in general the read model is only a tiny bit stale (in the range milliseconds to seconds).
      Validating the credit card usually is be part of the (synchronous) transaction but this is not a requirement as long as you foresee compensating actions in case of failure…

  • mybytrague

    I agree that the events shouldn’t be stored in a relational database like Oracle, if they are big. Seen from the database’s perspective, it doesn’t like LOBS much. It prefers fields, that’s what it’s memory policy is optimised for. Relational databases are great with updates – which the events shouldn’t have. Also, RDBMSes aren’t great for appends. And not for big volumes. They are great again for very flexible queries – which is why I’d say (and others said before) that putting a state (as opposed to the events leading to this state) into a RDBMS is a good idea.