How we got rid of the database – part 3

In my last two posts (part 1 and part 2) I described how a command sent from the client is handled by an aggregate of the domain model. I also discussed how the aggregate, when executing the command, raises an event which is then stored in the event store and subsequently published asynchronously by the infrastructure.

In this post I want to show how these published events can be used by observers (or projection generators) to create the read model.

Querying

When the user navigates to a screen the client sends a query to the read model. The query handler in the read model collects the requested data from the projections that make up the read model and returns this data to the client. The client never sends queries to the domain. The domain model is not designed to accept queries and return data. The domain model is optimized for write operations exclusively.

Assuming the user is navigating to the screen where existing tasks can be edited, the query that the client triggers could be:

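(A sketch; the class name GetTaskDetails and its exact shape are illustrative, not the original listing.)

    // a query is just a simple, serializable message object
    public class GetTaskDetails
    {
        public TaskId Id { get; private set; }

        public GetTaskDetails(TaskId id)
        {
            Id = id;
        }
    }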

and the data returned by the query handler would look like this:

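(Sketched as a plain DTO; the property names and members are illustrative.)

    // the response DTO assembled by the query handler
    public class TaskDetailsResponse
    {
        public TaskId Id { get; set; }
        public string Name { get; set; }
        public DateTime DueDate { get; set; }
        public List<CandidateInfo> Candidates { get; set; }
        public AnimalInfo Animal { get; set; }
    }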

where the candidates collection contains the full name and id of the candidates that are assigned to this task, and the animal property contains all the details of the animal that is the target of the task.
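Both element types could be small DTOs like these (again, names and members are illustrative):

    public class CandidateInfo
    {
        public Guid Id { get; set; }
        public string FullName { get; set; }
    }

    public class AnimalInfo
    {
        public Guid Id { get; set; }
        public string Name { get; set; }
        public string Species { get; set; }
        // further details of the animal would go here
    }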

Where does this data come from, I hear you ask… Well, that’s the topic I’ll discuss next.

Generating the read model

Whenever we design a new screen we need data to display on this screen. Thus we define one or more projections that are tailored to best suit our needs. Ideally we want a projection that allows us to get all the data for a screen with one single read operation against the data store. In reality that is not always possible, so let’s state the principle this way: design the projections in such a way that we can retrieve the data needed for a screen with a minimal number of read operations, ideally just one.

As a consequence, this principle requires that we store our data in a highly de-normalized way. We regard data duplication in the read model as a necessary consequence and do not try to avoid it. Storage space is extremely cheap nowadays.

Since we are not using an RDBMS to store our read model, the projections do not have to be “flat”. We can define projections that are made of object graphs.

A first approach would thus be to define a projection that looks similar to the query response object that we defined above. Let’s do so:

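(A sketch of the view; the name TaskDetailsView and its members are illustrative, the shape mirrors the response object above.)

    public class TaskDetailsView
    {
        public TaskId Id { get; set; }   // the typed id introduced in part 1
        public string Name { get; set; }
        public DateTime DueDate { get; set; }
        public bool IsPublished { get; set; }
        public List<CandidateInfo> Candidates { get; set; }
        public AnimalInfo Animal { get; set; }
    }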

Please note that I use the typed Id (TaskId) introduced in part 1 in my view.

Now we need to define a class that creates the task details projection for us. We want to make the implementation of this class as simple as possible. The class should be a POCO and only depend on a writer object:

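(A sketch of the skeleton; the class name is illustrative, and the writer interface is defined a little further below.)

    // a POCO whose only dependency is the writer used to persist the views
    public class TaskDetailsProjectionGenerator
    {
        private readonly IAtomicWriter _writer;

        public TaskDetailsProjectionGenerator(IAtomicWriter writer)
        {
            _writer = writer;
        }
    }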

The writer that we inject into our projection generator class is responsible for physically writing our views to the data store. In our case the data store will be the file system, but it could just as well be a table in an RDBMS, a document database, or a Lucene index. From the perspective of this generator class it doesn’t really matter what type of data store it is.

The definition of the IAtomicWriter interface is very simple:

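(A minimal sketch of what such an interface could look like; the actual signature may differ.)

    public interface IAtomicWriter
    {
        // adds the view if it does not exist yet, otherwise applies the update
        TView AddOrUpdate<TView>(object key, Func<TView> addFactory, Func<TView, TView> update);
    }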

For convenience we can then write some extension methods for this interface:

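(Sketched here as Add and Update helpers that express the caller’s intent; the names and failure behavior are illustrative.)

    public static class AtomicWriterExtensions
    {
        // Add: the view must not exist yet
        public static TView Add<TView>(this IAtomicWriter writer, object key, TView view)
        {
            return writer.AddOrUpdate<TView>(key,
                () => view,
                v => { throw new InvalidOperationException("View already exists."); });
        }

        // Update: the view must already exist
        public static TView Update<TView>(this IAtomicWriter writer, object key, Action<TView> change)
        {
            return writer.AddOrUpdate<TView>(key,
                () => { throw new InvalidOperationException("View does not exist."); },
                v => { change(v); return v; });
        }
    }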

Why not add these methods directly to the interface instead of writing extension methods, you might ask? The reason is that we should always try to keep our interfaces as simple as possible so that our code remains composable and loosely coupled.

Having defined the interface and added the above extension methods, we can now continue to implement our projection generator class.

Each projection is created from events; all the data that makes up a projection is provided by events. In our sample the first event in the life cycle of a task is the NewTaskScheduled event that we defined in part 1. Let’s add code to handle this event in our projection generator:

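(A sketch of the handler inside TaskDetailsProjectionGenerator; it assumes NewTaskScheduled carries the task id, name, and due date, which may not match part 1 exactly.)

    public void When(NewTaskScheduled e)
    {
        // the view does not exist yet, so we Add it
        _writer.Add(e.Id, new TaskDetailsView
        {
            Id = e.Id,
            Name = e.Name,
            DueDate = e.DueDate,
            Candidates = new List<CandidateInfo>()
        });
    }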

We use the same convention that we already introduced with the aggregate: we call all our handler methods When, and each of those methods has exactly one parameter, the event that it handles. This convention makes it easier for us to later write some tools around our read model. I will discuss this in detail in a later post.

Note that we have used the Add (extension) method of the writer, since at the time the NewTaskScheduled event happens the corresponding view does not yet exist.

Later on, other events of interest might be published by the domain, and our projection generator can listen to them. Let’s take the TaskPublished event as a sample. We add the following code to the projection generator:

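(Again a sketch; it assumes the event carries just the task id and that the view tracks a published flag.)

    public void When(TaskPublished e)
    {
        // the view already exists, so we Update it in place
        _writer.Update<TaskDetailsView>(e.Id, view => view.IsPublished = true);
    }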

Very simple, isn’t it?

The resulting file on the file system could look similar to this:

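(All values are made up for illustration.)

    {
      "Id": "8f3c2b1e-…",
      "Name": "Vaccinate Rex",
      "DueDate": "2011-06-30T00:00:00",
      "IsPublished": true,
      "Candidates": [
        { "Id": "2a9d…", "FullName": "John Doe" }
      ],
      "Animal": { "Id": "77b4…", "Name": "Rex", "Species": "Dog" }
    }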

(remember that we are using JSON serialization).

In the next post I’ll try to wire up all the pieces that we have defined so far, so that we can run a little end-to-end demo. Stay tuned…


About Gabriel Schenker

Gabriel N. Schenker started his career as a physicist. Following his passion and interest in stars and the universe, he chose to write his Ph.D. thesis in astrophysics. Soon after this he dedicated all his time to his second passion, writing and architecting software. Gabriel has since been working for over 12 years as an independent consultant, trainer, and mentor, mainly on the .NET platform. He is currently working as chief software architect in a mid-size US company based in Austin, TX, providing software and services to the pharmaceutical industry as well as to many well-known hospitals and universities throughout the US and in many other countries around the world. Gabriel is passionate about software development and tries to make the life of developers easier by providing guidelines and frameworks to reduce friction in the software development process. Gabriel is married and the father of four children; in his spare time he likes hiking in the mountains, cooking, and reading.
Comments
  • cbp

    I don’t yet understand by what mechanism the user is able to see updates to the read model as soon as they occur.


  • Oded

    @cbp – Eventual Consistency. There is no guarantee of seeing _immediate_ results.

    • John Teague

      This really depends on implementation and what you mean by immediate feedback. One way to give immediate feedback, even if the command and event process takes a while, is to update the UI as if the command will succeed. A core tenet of CQRS is to make commands as small as possible with a high probability of success. If the command passes validation, the chances of failure should be very small. So you can update the user’s view as if the command did succeed and, in the case of failure, take a “compensating action”, like alerting the user if they are still online or sending an email message if they are not.

      A common example of this is credit card validation. While the number is validated in real time, more verifications must take place after the order is placed, like credit limit checks. If there is a problem, you get an email letting you know your order was not placed and why.

  • Wayne M

    I’m just a little bit confused by this, although it seems interesting. How do you “sync” the data between the read model and the write model? In a real-world scenario users are constantly updating the data (or customers are placing orders, whatever) and this usually has to be reflected fairly quickly in a back-end system with the “read model”. If you’re using, let’s say, SQL Server for the write model and RavenDB for the read model, how is the data meant to be synced without impacting performance or making users wait a day to get the current day’s orders, etc.?

    I guess the main confusion point for this whole CQRS type of architecture is that I don’t quite get how it applies to real scenarios; in every job I’ve had, data has needed to be updated almost in real time; if Joe Customer places an order in our store, we need an agent to follow up with it relatively fast, if not immediately. Is this type of architecture just not suited to the majority of transactional use cases, or am I missing something crucial?

    • Anonymous

      That is a concern most newcomers have. I did too! But in reality it turns out not to be a problem; why? I’ll dedicate an upcoming post to that topic – eventual consistency. Stay tuned.

    • John Teague

      I hear this all the time too when I talk about CQRS. There are a couple of things I always talk about when I’m asked. The first thing I say is that updates in real time are a myth; at least, it’s so difficult to really do that it’s usually not worth it.

      The example I give is a user looking at an update screen, with the “current” data populated so they can change it. The user then takes a 20-minute phone call and then comes back and makes the change. Did you check before making that update that there were no changes while they were on the phone? It’s not guaranteed to be consistent. We’ve sold the users this myth that they have consistent data when it really isn’t, unless you go through great pains to ensure that’s the case.

      So, assuming you aren’t guaranteeing perfect consistency, with an eventual consistency approach you now must ask the business what level of consistency they need (explaining that 0ms consistency is not possible). You’ll be surprised at the answers you get: 5 minutes, 20 minutes, 1 hour. I’ve got applications that update data from different systems once a day, as requested by the business owners. It will be rare that they give an answer that is under 1 second. If it is, then your job is to describe the time and resources required to meet that SLA.

      Once you know what your consistency constraints are, you can give solutions to the problem based on an actual business requirement.

      In your comment you said “almost real time”. Is that real time from the user’s perspective or from the machine’s perspective? Is 5 seconds “almost”, or is it 5 milliseconds? You can send a message to the other side of the world in 5 seconds; 5 milliseconds is a different story. The point is, until you ask, you don’t know.

      • Anonymous

        Thank you John, you hit the nail on the head! I wouldn’t have been able to explain it any better :)

      • Jimmy Bogard

        One thing to keep in mind is that the user typically doesn’t think of screens as reports. Putting a date on the screen doesn’t disabuse them of the desire to treat data as data, if that’s what they really want.

        Eventual consistency inside a system is a recent invention. Before computers, eventual consistency existed across departmental boundaries, but not internally. In good task-based UIs, users reason about tasks and think about success/fail in terms of requests, not reports.

    • Steve Friend

      “every job I’ve had data has needed to be updated almost in real-time; if Joe Customer places an order in our store, we need an agent to follow up with it relatively fast, if not immediately.”

      Really? How do you cope with peak hours? Do you increase the number of agents you have so they can respond immediately? Do your customers place orders by phone?

      I work for a large-ish online retailer (think ~$500m sales/year) and we don’t need to respond to orders immediately.

  • Monsters X

    Why not use CouchDB or something similar? Same storage (JSON objects) but all sorts of easy fanciness (like clusters, atomic transactions, etc.).

  • darylmer

    Just want to put in a little note here about some of the questions that people have and mention that the Lokad CQRS sample project has been updated and is slightly different in a few areas. As for the views, the way I usually describe the eventual consistency scenario is to think about those views almost like a report. When you run a report, it is only current at the instant it completes. By the time you see it on screen or printed it should be considered old, and that is why most people put the date and time a report was run on it. People generally accept that a report is outdated basically the instant it completes.

    The views are updated pretty darn quick. The only scenario where they wouldn’t be is if there was a fairly significant number of commands that got dropped in just before your command. I have been working on a project to convert data from an old system to a new system by issuing the old data as new commands, and this is where things can get bogged down. I am looking for a better way to deal with views used as indexes, because an add to the index requires a full read/full write of the entire view as an atomic operation, which bogs down as more items are added to the view. I have thousands of commands issued in a very short time (this is not a normal situation and is only for conversion) and you really see the impact when it has to open the index view, update the data, and then write it all back for each add/update. This is an optimization situation where you have to get creative. Under normal circumstances this isn’t and won’t be an issue for me; however, if you plan on running under pretty heavy load you will have to find a better way to deal with this, which may mean keeping a lightweight traditional database only for the indexed views.