Wither the Repository

Looking at the different Repository pattern implementations, one thing really surprised me – how far off these implementations are from the original Fowler definition of the Repository.  Instead, we see a transformation to the examples in the Evans description of Repository.  Fowler’s definition for a Repository is:

Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.

But in our original Repository pattern implementations, we see more than querying: we see persistence as well (Create, Update and Delete).  This is consistent with some of the descriptions in Evans’ book, where the collection-like interface also allows for modifications – adding and removing – thus completing the lifecycle of a domain object.
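
To make the "collection-like interface" concrete, here is a minimal sketch of a Repository that covers both lookups and the add/remove lifecycle.  The Person class, the IPersonRepository interface and the in-memory implementation are all hypothetical names invented for this illustration; a real implementation would sit on top of a data mapping layer rather than a dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain object, used only for illustration.
public class Person
{
    public int Id { get; set; }
    public string LastName { get; set; }
}

// Collection-like interface: lookups plus the Add/Remove lifecycle operations.
public interface IPersonRepository
{
    void Add(Person person);
    void Remove(Person person);
    Person GetById(int id);                              // returns null when not found
    IReadOnlyList<Person> FindByLastName(string lastName); // a "named query method"
}

// In-memory stand-in for a real data mapping layer (an assumption of this sketch).
public class InMemoryPersonRepository : IPersonRepository
{
    private readonly Dictionary<int, Person> _people = new Dictionary<int, Person>();

    public void Add(Person person) { _people[person.Id] = person; }

    public void Remove(Person person) { _people.Remove(person.Id); }

    public Person GetById(int id)
    {
        Person found;
        return _people.TryGetValue(id, out found) ? found : null;
    }

    public IReadOnlyList<Person> FindByLastName(string lastName)
    {
        return _people.Values.Where(p => p.LastName == lastName).ToList();
    }
}
```

Callers work with the repository much as they would with an ordinary collection, and never see how the objects are stored.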

But is this Repository pattern needed?  What exactly is the Repository buying us?  In normal usage, I see the Repository pattern doing two main tasks:

  • Encapsulating the data mapping layer
  • Encapsulating difficult queries (in the Named Query Method approach)

In some systems, including the one I’m working on right now, each entity corresponds 1 to 1 with a repository implementation – whether or not the repository implementation is doing anything.  Looking back at the Fowler PoEAA book, there is another interesting pattern at play – the Unit of Work pattern.  The Unit of Work:

Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

A common Unit of Work implementation wraps the underlying data access layer, and provides means of creating transactional boundaries:

public interface IUnitOfWork : IDisposable
{
    void Initialize();
    void Commit();
    void Rollback();
}

public interface INHibernateUnitOfWork : IUnitOfWork
{
    ISession CurrentSession { get; }
}
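
As a rough illustration of the transactional boundary such an interface provides, the sketch below implements IUnitOfWork over a plain in-memory list.  InMemoryUnitOfWork and its RegisterNew method are hypothetical and are not part of NHibernate; a real implementation would delegate to the ORM's session and transaction instead of buffering strings.

```csharp
using System;
using System.Collections.Generic;

public interface IUnitOfWork : IDisposable
{
    void Initialize();
    void Commit();
    void Rollback();
}

// Hypothetical stand-in: buffers pending changes and applies them only on Commit.
public class InMemoryUnitOfWork : IUnitOfWork
{
    private readonly List<string> _store;
    private readonly List<string> _pending = new List<string>();
    private bool _active;

    public InMemoryUnitOfWork(List<string> store) { _store = store; }

    // Opens the transactional boundary.
    public void Initialize()
    {
        _pending.Clear();
        _active = true;
    }

    // Records a change that belongs to the current business transaction.
    public void RegisterNew(string item)
    {
        if (!_active) throw new InvalidOperationException("Call Initialize first.");
        _pending.Add(item);
    }

    // Writes out all pending changes and closes the boundary.
    public void Commit()
    {
        if (!_active) throw new InvalidOperationException("Call Initialize first.");
        _store.AddRange(_pending);
        _pending.Clear();
        _active = false;
    }

    // Discards pending changes and closes the boundary.
    public void Rollback()
    {
        _pending.Clear();
        _active = false;
    }

    // Disposing without a Commit discards whatever is still pending.
    public void Dispose()
    {
        if (_active) Rollback();
    }
}
```

Wrapping the work in a using block means an exception thrown before Commit leaves the store untouched.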

The interesting piece here is the ISession interface from NHibernate.  It puts several key patterns together – Unit of Work, Identity Map, as well as the Data Mapper pattern.  Now, I’ve never been on a project that has wanted or needed to swap out ORM solutions (provided they chose an active, mature solution), so what is this encapsulation of the Repository buying us exactly?

A bigger issue comes into play when trying to implement Command-Query Separation throughout your application – the Repository pattern just does not support it well enough.  Eventually, you’ll likely be left with a class that is Repository in name only, but not the original intent.  Let’s look at some of the problems of Repository to get an idea how CQS doesn’t gel well.

Loose aggregate boundaries

The problem Aggregate Roots solve is one of spaghetti-coded associations between entities, where I could never figure out the right place to draw the boundary.  I originally assumed that a save operation meant that you would save a single aggregate root, and cascading would take care of the rest.  Unfortunately, that simplicity and rigidity tends to fall flat in larger models.  In an aggregate root, child entities only have identity within the boundaries of the root, and no outside identity.

In the web world, that would mean that anything with an Edit link is inherently an aggregate root, as it needs its own global identity to save properly.  We toyed around with local identity, and always passing around the parent object to work on the child.  But all that extra work didn’t buy us anything.  Local identity only bought us anything when you edited an entire entity on one page, including children.

Cascading poses other issues.  What we found is that cascading depends not on the entity I’m working with, but on the specific operation/command I’m trying to perform.  We’ve seen aggregate boundaries depend completely on the command we perform.  Sometimes the root is object A, sometimes it’s B.  Instead of worrying too much about aggregate boundaries, we greatly reduced the relative boundaries of roots, until each individual use case determined where we drew our boundaries.

Because roots can hold references to other roots, the concept of a boundary and local identity eventually restricted itself to basically screens where you saved the parent and children in one operation.  Again, it was the C in CQS that drove boundaries.

Thus it started to become confusing how we should handle cascades and save logic.  Since boundaries varied screen-to-screen, it became obvious that any custom save logic in a Repository could be handled perfectly well in the ORM layer, with cascades.  Eventually, our model became extremely connected, with appropriate cascading and modification operations available only when specifically needed.

First-class queries

On virtually every screen of our application, we show the exact same entity.  That’s around 150 screens or so, where what you see begins with one Person, and filters down from there.  This made things quite interesting from the query optimization standpoint.  We had two choices: go through the root Person entity to all of its children, in one slice, or go from the children and get back to the parent.  But sometimes we show bits and pieces of various slices of our Person object, things that did not fit well with just one Repository call.  We enabled lazy loading – but only because we wanted a well-connected object model, with the knowledge that we would optimize the access later.

When we actually got to the optimizations, we could either create a bunch of GetByIdForAbcScreen on our PersonRepository, or, encapsulate all of the fetching in a single Query object.  Guess which route we went…

With first-class query objects in our system, the need for a true PersonRepository becomes much less.  It doesn’t do any custom queries, those are now in distinct query objects, designed for each situation.
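
A query object along these lines might look like the sketch below.  Everything here is an assumption made for illustration: the PersonSummaryQuery name, the PersonSummary shape, and the use of IQueryable<Person> as the data source (in our system the query would go through the ORM instead).  The point is only that the fetching logic for one screen lives in one small class rather than in yet another method on PersonRepository.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain objects, used only for illustration.
public class Address
{
    public string City { get; set; }
}

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Address> Addresses { get; set; }
}

// Screen-shaped result: only what one particular screen needs.
public class PersonSummary
{
    public string Name { get; set; }
    public int AddressCount { get; set; }
}

// One query object per screen or situation, instead of a
// GetByIdForAbcScreen method on the repository.
public class PersonSummaryQuery
{
    private readonly int _personId;

    public PersonSummaryQuery(int personId) { _personId = personId; }

    // Returns null when no person has the requested id.
    public PersonSummary Execute(IQueryable<Person> people)
    {
        return people
            .Where(p => p.Id == _personId)
            .Select(p => new PersonSummary
            {
                Name = p.Name,
                AddressCount = p.Addresses.Count
            })
            .SingleOrDefault();
    }
}
```

Because each query is its own class, its fetching strategy can be tuned for one screen without touching any other.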

Wither the Repository?

No, not yet.  Complexity in a system is never uniform, and a single solution rarely fits every circumstance.  The tough part is figuring out where to put custom persistence or fetching logic.  New query method on a Repository?  Override the Save method?  Is that custom save logic on every save for this entity, or just for certain commands?  Complexity resembles more the picture of cosmic background radiation from the Big Bang, almost random in nature, but a predictable and expected variation.  Most spots are in the middle of the complexity scale, some spots are so easy that the architecture chosen may seem too much, but other spots bend our architecture to its limits.  The Repository is one of those spots.

Personally, I see a few trends in Repositories:

  • Trend towards a Generic Method Repository, where there is no custom logic for CRUD
  • Queries externalized and encapsulated in Query objects when needed, otherwise exposed through LINQ
  • Custom persistence logic externalized and encapsulated in Commands, that utilize the one Repository implementation as needed
  • Repository becomes relegated, in NHibernate terms, to the ISessionFacade
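
The first and third trends above can be sketched together: one generic repository with no entity-specific logic, and a command that holds the custom persistence logic and uses that repository as needed.  The names IRepository<T>, InMemoryRepository<T> and DeactivatePersonCommand are hypothetical, and the in-memory list is only a stand-in for an ORM session.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Generic Method Repository: the same CRUD surface for every entity.
public interface IRepository<T> where T : class
{
    void Save(T entity);
    void Delete(T entity);
    IQueryable<T> Query();   // simple queries are exposed through LINQ
}

// In-memory stand-in for an ORM-backed implementation (assumption of this sketch).
public class InMemoryRepository<T> : IRepository<T> where T : class
{
    private readonly List<T> _items = new List<T>();

    public void Save(T entity)
    {
        if (!_items.Contains(entity)) _items.Add(entity);
    }

    public void Delete(T entity) { _items.Remove(entity); }

    public IQueryable<T> Query() { return _items.AsQueryable(); }
}

// Hypothetical entity, used only for illustration.
public class Person
{
    public string Name { get; set; }
    public bool IsActive { get; set; }
}

// Command: the custom persistence logic for one operation lives here,
// not in an overridden Save method on a PersonRepository.
public class DeactivatePersonCommand
{
    private readonly IRepository<Person> _people;

    public DeactivatePersonCommand(IRepository<Person> people) { _people = people; }

    public void Execute(string name)
    {
        var person = _people.Query().SingleOrDefault(p => p.Name == name);
        if (person == null)
            throw new InvalidOperationException("No such person: " + name);

        person.IsActive = false;
        _people.Save(person);
    }
}
```

With this split, the question "is that custom save logic on every save, or just for certain commands?" answers itself: the logic belongs to the command that needs it.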

I’m not fully ready to chuck our custom repositories, as I’m still not quite fully cognizant of what this change might mean.  But the parallel inheritance hierarchies are naive and a little difficult to work with, and don’t really reflect the reality we’re trying to model in our controller level of our application – modeling Queries and Commands as first-class citizens.

About Jimmy Bogard

I'm a technical architect with Headspring in Austin, TX. I focus on DDD, distributed systems, and any other acronym-centric design/architecture/methodology. I created AutoMapper and am a co-author of the ASP.NET MVC in Action books.
This entry was posted in DomainDrivenDesign.
  • Eyston

    I had an app where we had a ‘package’ which was 1) a barcode transaction saved to SQL Server, 2) a purchase order in ERP system on IBM iSeries, and 3) pdf certifications on a file server. I don’t know if this is right or bad design, but it just made so much sense to have PackageRepository.GetPackage(barcode) hook all of these different things up and return a single entity (the code made so much more sense). I think Evans briefly talks about this with regard to ‘should repositories be factories’, but that is how I was using it for sure.

    One thing I’m struggling with in CQS is that if you completely ignore UI and only focus on commands your domain models really should have no properties (getters/setters) and only expose behavior. So your domain model becomes an interface (either directly, or just as a side effect), which is kind of cool. But then you have to test that the behavior works, and I’m not sure the best way to do that because previously I’d use those getters/setters. I’m 99% dealing with set-in-stone-schemas, so this whole time I’ve been testing the domain implementation but it doesn’t matter because it never changed.

    At some point you do have to test the implementation (you need to make sure the behavior is reflected when persisting) and that is kind of funky if you don’t have public properties.

    Lastly, it makes the domain model suck for views, which, granted, is the point. But for a simple app, it’s annoying :) . I don’t know a way around this without either (a) recreating models just for views, (b) making public properties but refraining from using them (or having all commands use interfaces and queries use the concrete type) or (c) having an entity.GetDTO() (or something) method that returns the state for views.

    • David Graham

      A 6 year old post, but this is exactly where I find myself today, wondering what to do. These options you mention in the last paragraph are exactly what I’ve been wrestling over, trying to see which approach is best. And to make the decision process worse, there are experienced people all over the internet with different approaches.

  • In a Command/Query paradigm, you don’t want to query your domain objects. Instead, you want to query the data store directly using something like NHibernate’s Projection capabilities. In this way, you pull back screen-based views.

    Once the user has made changes, you batch up the changes and send them off to the write-only domain which makes decisions about the changes and saves them to the repository.

    In this way, you don’t need to support complex queries in your repository–you simply have a GetByID() method to pull back the aggregate root.

    • David Graham

      I think this is exactly how I’m thinking, and I think it applies the concept of CQRS. I have “queries” (Q in CQRS) that are objects that I use for screens, where I just query the database for some information. Then I have “repositories” (C in CQRS) that I tend to match up with entities that house all the database querying that I need for working with entities. Let’s call these repositories “entity aware queries or domain queries”. These would be read queries that hydrate entities (so I can change them and persist them) and write queries that let me persist them. The key for me is that I don’t believe I have to stuff all these domain-aware queries into one class/file and call it a Repository. I can instead separate them out into any number of classes/files I want. I would however tend to keep them all under one folder (i.e. Domain/Foo/Repository/ folder) so that I know these queries are targeted for a specific entity.

      Having said all of this, Jimmy’s post here does make me think of what the need is for me to abstract the domain-queries away from the domain entity methods. I’m thinking on the domain side (C in CQRS) there might NOT be a need to abstract away the database operations from the entity methods. Testing will be slower (since the option of an in-memory replacement of a repository goes away), but I think I’m okay with testing using a real database anyways (just a test database that has much less data in it, so it can be fired up/down quickly).

      If there gets to be too much code in the entity (too many methods with too much code), then I can simply pull this code out into domain services. I can keep these domain services in a folder with the entity, so they are close. The entity ultimately becomes just a class that holds basic behavior. All other advanced behaviors get their own domain service placed in the folder next to the entity class file. We do this already in DDD when we find a behavior doesn’t quite fit in with an entity class.

  • Eyston

    I hadn’t thought of using Projections. Thanks for that insight — it makes a lot of sense. Duh :) . I had been trying to think of ways to keep using NH for Queries without compromising my domain model but sharing the same store (I don’t do cool enough work to justify separate stores yet).

    I have been working the other end though (screen -> events(commands) -> domain).

    Greg Young talked about his method and the pattern he uses before: http://martinfowler.com/eaaDev/EventSourcing.html. This goes beyond simple command object though as he persists the transactions, not the state. Kinda cool stuff.

    One area I’m still troubled with is that I don’t do real DDD — my domain has the ERP system writing to it and doing much of the creation. I just interface to that store. So when I want to test my domain behavior under some entity state, my program actually doesn’t have the ability to put the entity into the right state because a different program is responsible for that role. This isn’t terrible when everything is public properties, but for some reason I’m trying to avoid having everything public — maybe I’ll just compromise on that ideal :) .

    Putting interfaces on the entities does help though — you can keep properties for testing but everything else works via the interface.

  • I still have repositories. I know Ayende has stated that he doesn’t use them anymore and just uses the ISession. I was on a project that used Linq2Sql and didn’t have repositories, and I thought it worked fine. But I still like being able to hide away the ORM-related stuff.

    To me, a Repository is a class that doesn’t have any business logic but encapsulates everything needed to talk to the database (including the ORM-specific code). It’s not any extra work to have it around, and there are times when it’s kinda nice to have it.

    Sometimes I have custom code in my repositories that looks like this (especially with NHibernate):

    public class ProductRepository {
        public Product Get(int id, bool includeOrders, bool includeSomethingElse) {
            // do NHibernate eager loading here
        }
    }

    In this case, I can keep the NHibernate stuff out of my domain model, so to me that is a win.

  • A repository isn’t just a named query container. I think the most important role it plays is as an inward facing facade to the domain from the persistence layer. It is written in the semantics of the domain, not the database, ORM tool, or other underlying persistence infrastructure.

    I’m using LINQ with NHibernate with repositories and workspaces and it’s working out great. Each part has a specific important role to play. Without it, the responsibility would be blurred amongst the others.

  • Ryan Magnusson

    Thanks Brian for bringing that point up. Although the Repository pattern isn’t the best solution for every project, one major benefit is that it is the API layer to the business, in the language of the business, on how to get data back. That’s why its “interface” is kept with the domain. This allows the API of any persistence layers to be written in a more specific language required to its domain of CRUD’ing. And if no extra code semantics is needed, what’s the actual cost: an extra file or two and a few more lines of code?

  • Ken

    >>Trend towards a Generic Method Repository, where there is no custom logic for CRUD
    Of course, the whole idea of a repository is that it just mirrors your object state. You shouldn’t even update the “DateModified” field in there. By putting code in there, you take a shortcut and break your architecture… you now need a two-tier repository (ridiculous idea, right?)
