Wither the Repository

11 September, 2009. It was a Friday.

Looking at the different Repository pattern implementations, one thing really surprised me: how far these implementations are from Fowler’s original definition of the Repository. Instead, we see them drifting toward the examples in Evans’s description of the Repository. Fowler’s definition of a Repository is:

Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.

But in our original Repository pattern implementations, we see far more than querying; we see persistence as well (Create, Update and Delete). This is consistent with some of the descriptions in Evans’s book, where the collection-like interface also allows for modifications (adding and removing), thus completing the lifecycle of a domain object.

But is this Repository pattern needed? What exactly is the Repository buying us? In normal usage, I see the Repository pattern doing two main tasks:

Encapsulating the data mapping layer
Encapsulating difficult queries (in the Named Query Method approach)
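A Repository in the Named Query Method style might look like the following minimal sketch. The Person entity, the IPersonRepository interface, and FindActiveByLastName are hypothetical, and an in-memory list stands in for the data mapping layer. Callers see only intention-revealing methods, never how the data is mapped or fetched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for illustration.
public class Person
{
    public Guid Id { get; set; }
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public bool IsActive { get; set; }
}

// Named Query Method style: each non-trivial query gets its own
// intention-revealing method on the repository.
public interface IPersonRepository
{
    Person GetById(Guid id);
    IList<Person> FindActiveByLastName(string lastName);
}

// In-memory stand-in for the data mapping layer (an ORM in practice).
public class InMemoryPersonRepository : IPersonRepository
{
    private readonly List<Person> _people;

    public InMemoryPersonRepository(IEnumerable<Person> people)
    {
        _people = people.ToList();
    }

    public Person GetById(Guid id) => _people.Single(p => p.Id == id);

    // The "difficult" query is encapsulated here, not at the call site.
    public IList<Person> FindActiveByLastName(string lastName) =>
        _people
            .Where(p => p.IsActive && p.LastName == lastName)
            .OrderBy(p => p.FirstName)
            .ToList();
}
```

Every new screen-specific query in this style means another method on the repository interface, which is where the pressure on the pattern starts.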

In some systems, including the one I’m working on right now, each entity corresponds one-to-one with a repository implementation, whether or not that repository implementation does anything. Looking back at Fowler’s PoEAA book, there is another interesting pattern at play: the Unit of Work pattern. The Unit of Work:

Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

A common Unit of Work implementation wraps the underlying data access layer and provides a means of creating transactional boundaries:

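using System;
using NHibernate; // provides ISession

// Defines a transactional boundary around a business operation.
public interface IUnitOfWork : IDisposable
{
    void Initialize(); // begin the unit of work (e.g. open a transaction)
    void Commit();     // write out the changes made during this unit of work
    void Rollback();   // discard those changes
}

// NHibernate-specific variant that exposes the underlying session.
public interface INHibernateUnitOfWork : IUnitOfWork
{
    ISession CurrentSession { get; }
}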

The interesting piece here is the ISession interface from NHibernate. It combines several key patterns: Unit of Work, Identity Map, and Data Mapper. Now, I’ve never been on a project that wanted or needed to swap out ORM solutions (provided they chose an active, mature solution), so what exactly is this encapsulation in the Repository buying us?
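To make that combination concrete, here is a deliberately tiny, hypothetical MiniSession. It is not NHibernate’s API, only a sketch assuming a single Person type and a dictionary standing in for a database table. It maps between objects and rows (Data Mapper), returns the same instance for repeated loads of an id (Identity Map), and writes changes back only on Commit (Unit of Work).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity for illustration.
public class Person
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical session playing three PoEAA roles at once.
public class MiniSession
{
    // Stand-in for a database table: id -> name column.
    private readonly Dictionary<Guid, string> _table;

    // Identity Map: at most one Person instance per id in this session.
    private readonly Dictionary<Guid, Person> _identityMap = new Dictionary<Guid, Person>();

    public MiniSession(Dictionary<Guid, string> table) => _table = table;

    public Person Get(Guid id)
    {
        // Identity Map: return the already-loaded instance if there is one.
        if (_identityMap.TryGetValue(id, out var cached)) return cached;
        if (!_table.TryGetValue(id, out var name)) return null;

        // Data Mapper: build a fresh object from the stored row.
        var person = new Person { Id = id, Name = name };
        _identityMap[id] = person;
        return person;
    }

    // Unit of Work: register a new object; nothing is written yet.
    public void Save(Person person) => _identityMap[person.Id] = person;

    // Write every object this session knows about back to the table.
    // (A real ORM would write only new and dirty objects.)
    public void Commit()
    {
        foreach (var person in _identityMap.Values)
            _table[person.Id] = person.Name;
    }

    // Discard uncommitted work by forgetting everything tracked so far.
    public void Rollback() => _identityMap.Clear();
}
```

A real ORM session tracks dirty state, handles many entity types, and runs inside a database transaction; the point here is only that one object already covers what a thin Repository wrapper would otherwise re-expose.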

A bigger issue arises when you try to implement Command-Query Separation (CQS) throughout your application: the Repository pattern just does not support it well. Eventually, you’ll likely be left with a class that is a Repository in name only, not in its original intent. Let’s look at some of the Repository’s problems to get an idea of why it doesn’t gel well with CQS.

Loose aggregate boundaries

The problem Aggregate Roots solve is one of spaghetti-coded associations between entities, where I could never figure out the right place to draw a boundary. I originally assumed that a save operation meant saving a single aggregate root, with cascading taking care of the rest. Unfortunately, that simplicity and rigidity tend to fall flat in larger models. Within an aggregate, child entities have identity only inside the boundary of the root, and no identity outside it.

In the web world, that means anything with an Edit link is inherently an aggregate root, as it needs its own global identity to be saved properly. We toyed around with local identity, always passing around the parent object in order to work on the child, but all that extra work didn’t buy us anything. Local identity only paid off when we edited an entire entity, including its children, on one page.

Cascading poses other issues. We found that cascading depends not on the entity we’re working with, but on the specific operation or command we’re trying to perform. Likewise, aggregate boundaries depended completely on the command being performed: sometimes the root is object A, sometimes it’s B. Instead of worrying too much about aggregate boundaries, we greatly reduced the scope of our roots, until each individual use case determined where we drew the boundaries.
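The idea that the command, not the entity, decides how far a save reaches can be sketched as follows. Everything here is hypothetical: Person, Address, RecordingSession (a stand-in for an ORM session that only records what it is asked to save), and the two command classes. An ORM would normally perform these cascades through its mappings; the explicit loops simply make the scope of each save visible.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model: a Person with child Addresses.
public class Address
{
    public Guid Id { get; set; }
    public string Street { get; set; } = "";
}

public class Person
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";
    public List<Address> Addresses { get; } = new List<Address>();
}

// Stand-in for an ORM session that records what it was asked to save.
public class RecordingSession
{
    public List<object> Saved { get; } = new List<object>();
    public void Save(object entity) => Saved.Add(entity);
}

// Editing a person and all of their addresses on one screen:
// Person acts as the root, and the save cascades to every Address.
public class EditPersonWithAddresses
{
    private readonly Person _person;
    private readonly IEnumerable<string> _streets;

    public EditPersonWithAddresses(Person person, IEnumerable<string> streets)
    {
        _person = person;
        _streets = streets;
    }

    public void Execute(RecordingSession session)
    {
        _person.Addresses.Clear();
        foreach (var street in _streets)
            _person.Addresses.Add(new Address { Id = Guid.NewGuid(), Street = street });

        session.Save(_person);
        foreach (var address in _person.Addresses)
            session.Save(address);
    }
}

// Editing a single address from its own Edit link: here the Address
// is the root, and nothing cascades to or from the Person.
public class EditAddress
{
    private readonly Address _address;
    private readonly string _street;

    public EditAddress(Address address, string street)
    {
        _address = address;
        _street = street;
    }

    public void Execute(RecordingSession session)
    {
        _address.Street = _street;
        session.Save(_address);
    }
}
```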

Because roots can hold references to other roots, the concepts of a boundary and of local identity eventually applied only to screens where we saved the parent and its children in one operation. Again, it was the C in CQS that drove the boundaries.

It then became confusing how we should handle cascades and save logic. Since boundaries varied from screen to screen, it became clear that any custom save logic in a Repository could be handled perfectly well in the ORM layer, through cascades. Eventually, our model became extremely connected, with the appropriate cascading and modification operations available only where specifically needed.


First-class queries

On virtually every screen in our application, we show the exact same entity. That’s around 150 screens, where what you see begins with one Person and filters down from there. This made things quite interesting from a query optimization standpoint. We had two choices: start at the root Person entity and go to all of its children in one slice, or start from the children and get back to the parent. But sometimes we show bits and pieces of various slices of our Person object, which did not fit well into a single Repository call. We enabled lazy loading, but only because we wanted a well-connected object model, knowing that we would optimize data access later.

When we actually got to the optimizations, we could either create a bunch of GetByIdForAbcScreen methods on our PersonRepository, or encapsulate all of a screen’s fetching in a single Query object. Guess which route we took…

With first-class query objects in our system, the need for a true PersonRepository diminishes considerably. It doesn’t perform any custom queries; those now live in distinct query objects, each designed for a specific situation.
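A minimal sketch of the query-object route, assuming a hypothetical Person with a Phones collection and a hypothetical contact screen. Instead of a GetByIdForContactScreen method on PersonRepository, that screen’s fetching lives in its own ContactScreenQuery class (IPersonQuery and ContactScreenModel are also illustrative names), which projects just the slice the screen needs. It runs against any IQueryable&lt;Person&gt;; in an ORM-backed application that source would come from the ORM’s LINQ provider, and the query object would be the natural home for the screen’s eager-fetching instructions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model used only for illustration.
public class Phone
{
    public string Number { get; set; } = "";
}

public class Person
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";
    public List<Phone> Phones { get; } = new List<Phone>();
}

// Hypothetical view model holding only what one screen displays.
public class ContactScreenModel
{
    public string Name { get; set; } = "";
    public IList<string> PhoneNumbers { get; set; } = new List<string>();
}

// A first-class query: one class per screen-specific fetch.
public interface IPersonQuery<TResult>
{
    TResult Execute(IQueryable<Person> people);
}

public class ContactScreenQuery : IPersonQuery<ContactScreenModel>
{
    private readonly Guid _personId;

    public ContactScreenQuery(Guid personId) => _personId = personId;

    // Fetch one Person and project only the slice this screen needs.
    public ContactScreenModel Execute(IQueryable<Person> people) =>
        people
            .Where(p => p.Id == _personId)
            .Select(p => new ContactScreenModel
            {
                Name = p.Name,
                PhoneNumbers = p.Phones.Select(ph => ph.Number).ToList()
            })
            .Single();
}
```

Adding a screen means adding a query class rather than growing the repository’s interface, so PersonRepository itself is left with little more than basic loading and saving.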

Wither the Repository?

No, not yet. Complexity in a system is never uniform, and a single solution rarely fits every circumstance. The tough part is figuring out where to put custom persistence or fetching logic. A new query method on a Repository? Override the Save method? Is that custom save logic needed on every save of this entity, or just for certain commands? Complexity more closely resembles the picture of the cosmic background radiation left over from the Big Bang: almost random in nature, but with predictable and expected variation. Most spots sit in the middle of the complexity scale; some spots are so easy that the chosen architecture may seem like too much, while other spots bend our architecture to its limits. The Repository is one of those spots.

Personally, I see a few trends in Repositories:

A trend towards a Generic Method Repository, with no custom logic for CRUD
Queries externalized and encapsulated in Query objects when needed, otherwise exposed through LINQ
Custom persistence logic externalized and encapsulated in Commands, which use the one Repository implementation as needed
The Repository relegated, in NHibernate terms, to an ISession facade
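The first three trends can be sketched together under some assumptions: a hypothetical Entity base class with a Guid Id, a single generic IRepository with no per-entity subclasses, and a hypothetical PlaceOrderCommand holding the custom persistence logic. InMemoryRepository stands in for what, in an NHibernate application, would be a thin facade delegating to the current ISession.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical base type so one repository can handle every entity.
public abstract class Entity
{
    public Guid Id { get; set; } = Guid.NewGuid();
}

public class Person : Entity { public string Name { get; set; } = ""; }

public class Order : Entity
{
    public Guid PersonId { get; set; }
    public decimal Total { get; set; }
}

// One generic repository: no per-entity subclasses, no custom CRUD logic.
// Simple queries go through LINQ via Query<T>().
public interface IRepository
{
    T GetById<T>(Guid id) where T : Entity;
    IQueryable<T> Query<T>() where T : Entity;
    void Save<T>(T entity) where T : Entity;
    void Delete<T>(T entity) where T : Entity;
}

// In-memory stand-in for a facade over the ORM session.
public class InMemoryRepository : IRepository
{
    private readonly Dictionary<Guid, Entity> _entities = new Dictionary<Guid, Entity>();

    public T GetById<T>(Guid id) where T : Entity => (T)_entities[id];

    public IQueryable<T> Query<T>() where T : Entity =>
        _entities.Values.OfType<T>().AsQueryable();

    public void Save<T>(T entity) where T : Entity => _entities[entity.Id] = entity;

    public void Delete<T>(T entity) where T : Entity => _entities.Remove(entity.Id);
}

// Custom persistence logic lives in a command that uses the one repository.
public class PlaceOrderCommand
{
    private readonly IRepository _repository;

    public PlaceOrderCommand(IRepository repository) => _repository = repository;

    public Order Execute(Guid personId, decimal total)
    {
        // Throws KeyNotFoundException if the person does not exist.
        var person = _repository.GetById<Person>(personId);
        var order = new Order { PersonId = person.Id, Total = total };
        _repository.Save(order);
        return order;
    }
}
```

The design choice is that the repository no longer varies by entity; what varies is the command or query wrapped around it.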

I’m not ready to chuck our custom repositories yet, as I’m still not fully aware of what this change might mean. But the parallel inheritance hierarchies (one repository per entity) are naive and a little difficult to work with, and they don’t really reflect what we’re trying to model at the controller level of our application: Queries and Commands as first-class citizens.
