Separation of Concerns by example: Part 3

We made quite a bit of progress separating out the concerns in Part 2, but there are still some issues with our current design.

To review, here’s our CustomerFinder so far, after refactoring away from the static class and creating the specialized ICustomCache interface:

public class CustomerFinder
{
    private readonly ICustomCache _customCache;

    public CustomerFinder(ICustomCache customCache)
    {
        _customCache = customCache;
    }

    public List<Customer> FindAllCustomers(int startRowIndex, int maximumRows)
    {
        List<Customer> customers = null;
        string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;

        if (_customCache.Contains(key))
        {
            customers = _customCache.GetItem<List<Customer>>(key);
        }
        else
        {
            customers =
                (
                    from
                        c in DataGateway.Context.Customers
                    orderby
                        c.CustomerID descending
                    select
                        c
                ).Skip(startRowIndex).Take(maximumRows).ToList();

            if ((customers != null) && (customers.Count > 0))
                _customCache.AddItemExpringTomorrow(key, customers);
        }

        return customers;
    }

}

Although we refactored away from the HttpContext.Current.Cache dependency, we still have one nagging issue: the DataGateway.Context dependency.  The DataGateway class is just a thin wrapper around the LINQ to SQL data context:

public class DataGateway
{
    public static NorthwindDataContext Context
    {
        get
        {
            return new NorthwindDataContext("");
        }
    }
}

If we wrote a unit test against CustomerFinder as-is, it would run, but only with access to the database.  Since the test calls out of process, it's no longer a unit test.  It's just a regular automated test that happens to run in a unit test framework.

Our goal is not only to increase testability, but also readability and usability.  Encapsulation is great for information hiding, but not dependency hiding.  We'd like to be as explicit as possible, telling users of the CustomerFinder exactly what is needed to get it to work properly.  We want to develop not only for users of the application, but for future maintainers of the system as well.

We have quite a few options to extract the DataGateway dependency, each of which has its benefits and issues.

Designing the dependency

This dependency looks like a plain static dependency at first, but on closer inspection, it looks like our other HttpContext.Current.Cache dependency, where we’re poking deep inside a static property to get the “real” dependency, the Customers property.

Given the same arguments as the previous part in this series, it looks like we’ll choose “create a specialized interface”.  But what should the interface look like?  What should its responsibilities be?

One really interesting thing to look at is paging.  Should our extracted dependency be responsible for paging, or should it just return everything?  What should the return type be: still List<T>?  We could make our Application layer responsible for paging, and keep the Data Access layer paging-ignorant.  After all, paging depends on who is looking at the data; some parts of the application may not care about it at all.  Paging is really a concern of the user interface.

However, we don't want to sacrifice performance for the sake of our lofty goals.  What if there are something like 100K customers, or a million?  That can be a typical number in established systems.  We don't want to pull back a million records if only ten are going to be displayed.

To balance these concerns, we'll need to take a closer look at the existing behavior to determine the right way to go.

A peek behind the curtains

First, I’d like to examine what happens in the existing system.  To do so, I created a simple test:

[Test]
public void Check_out_the_profiling()
{
    var finder = new CustomerFinder(new FakeCache());
    List<Customer> customers = finder.FindAllCustomers(1, 10);
    customers.Count.ShouldBeGreaterThan(0);
}

private class FakeCache : ICustomCache
{
    public void AddItemExpringTomorrow<T>(string key, T item)
    {
        
    }

    public bool Contains(string key)
    {
        return false;
    }

    public T GetItem<T>(string key)
    {
        return default(T);
    }
}

I just used a fake cache that's always empty, to force the CustomerFinder to go against the DataContext.  I purposely pass a nonzero start index (the parameters are zero-based) to see if the Skip and Take are smart enough in LINQ to SQL.  Here's what came back from the profiler:

exec sp_executesql N'SELECT [t1].[CustomerID], [t1].[CompanyName], [t1].[ContactName], [t1].[ContactTitle], [t1].[Address], [t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[Phone], [t1].[Fax]
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY [t0].[CustomerID] DESC) AS [ROW_NUMBER], [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
    FROM [dbo].[Customers] AS [t0]
    ) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t1].[ROW_NUMBER]',N'@p0 int,@p1 int',@p0=1,@p1=10

Now this is VERY interesting.  LINQ to SQL is smart enough to turn the Skip and Take into actual server-side paging.  Looking at the query, I wouldn't want to write it by hand, but it works very well.  Which reminds me:

Stop writing custom data access code.  This is a problem that has been solved.

I've had to write custom dynamic paging logic, and it was not fun.  Paging gets very, very complicated once you add subqueries, arbitrary sorting, filtering, and grouping.

Anyway, it looks like paging works well on the database side, so we'd like to preserve that behavior.  But we'd also like the paging logic to live where it belongs, in the Application layer.  One feature of LINQ will help us out here: deferred execution.  Remember, the SQL isn't actually executed until you iterate over the results, which happens with the ToList() call.  It will also happen in a foreach loop or with a ToArray() call.
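Deferred execution is easy to see even without a database.  Here's a minimal sketch using LINQ to Objects (an in-memory list standing in for the Northwind context, so the names here are purely illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static void Main()
    {
        var source = new List<int> { 1, 2, 3 };

        // Building the query does not enumerate the source.
        IEnumerable<int> query = source.Where(n => n > 1);

        // A change made after the query is built is still visible...
        source.Add(4);

        // ...because the query only runs when we iterate, e.g. via ToList().
        List<int> results = query.ToList();

        Console.WriteLine(string.Join(",", results)); // 2,3,4
    }
}
```

The same principle is what lets our Application layer tack Skip and Take onto a query the Data Access layer built, before any SQL ever runs.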

So it looks like we’ll be able to have the best of both worlds.

Extracting our dependency

Although this was a long discussion, it's a conversation our teams have pretty regularly.  After extracting lots of dependencies, or doing TDD for some time, you start to get a good feel for how behavior should be laid out.

First, we'll apply Extract Method to the LINQ query, leaving out the paging and the ToList() call:

public List<Customer> FindAllCustomers(int startRowIndex, int maximumRows)
{
    List<Customer> customers = null;
    string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;

    if (_customCache.Contains(key))
    {
        customers = _customCache.GetItem<List<Customer>>(key);
    }
    else
    {
        customers =
            FindAllCustomers()
                .Skip(startRowIndex)
                .Take(maximumRows)
                .ToList();

        if ((customers != null) && (customers.Count > 0))
            _customCache.AddItemExpringTomorrow(key, customers);
    }

    return customers;
}

private IEnumerable<Customer> FindAllCustomers()
{
    return from
               c in DataGateway.Context.Customers
           orderby
               c.CustomerID descending
           select
               c;
}

Our extracted method only returns IEnumerable<T>, and not List<T>.  Returning List<T> before paging means we’d lose the server-side paging goodness that LINQ to SQL provides.

Another benefit to returning IEnumerable<T> is that it better communicates the intent of our code: we’re giving you a read-only collection of customers, don’t even try to change it.

Next, we’ll perform Extract Class on the FindAllCustomers method we just extracted to a new class, a CustomerRepository class.  Why the Repository name?  It comes from Martin Fowler’s excellent Patterns of Enterprise Application Architecture book.  Here’s the new CustomerRepository class:

public class CustomerRepository
{
    public IEnumerable<Customer> FindAllCustomers()
    {
        return from
                   c in DataGateway.Context.Customers
               orderby
                   c.CustomerID descending
               select
                   c;
    }
}

For those who read part 2, our next steps should seem familiar.  We'll now use Extract Interface, so that our CustomerFinder class only deals with the interface and not the concrete class.  This will allow us to swap out implementations for testing or debugging purposes.  Our new interface will be ICustomerRepository:

public interface ICustomerRepository
{
    IEnumerable<Customer> FindAllCustomers();
}

Of course, our CustomerFinder class no longer compiles, so we’ll change the class to get the ICustomerRepository from the constructor through Constructor Injection.  The FindAllCustomers method that uses paging will use this instance instead:

public class CustomerFinder
{
    private readonly ICustomerRepository _customerRepository;
    private readonly ICustomCache _customCache;

    public CustomerFinder(ICustomerRepository customerRepository, ICustomCache customCache)
    {
        _customerRepository = customerRepository;
        _customCache = customCache;
    }

    public List<Customer> FindAllCustomers(int startRowIndex, int maximumRows)
    {
        List<Customer> customers = null;
        string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;

        if (_customCache.Contains(key))
        {
            customers = _customCache.GetItem<List<Customer>>(key);
        }
        else
        {
            customers = _customerRepository
                        .FindAllCustomers()
                        .Skip(startRowIndex)
                        .Take(maximumRows)
                        .ToList();

            if ((customers != null) && (customers.Count > 0))
                _customCache.AddItemExpringTomorrow(key, customers);
        }

        return customers;
    }

}

One annoying thing left is that the return type of FindAllCustomers works against its intent: a read-only view of a single page of customers.  It makes zero sense for a user of this class to add new customers to a single page of customers.  We want to be as explicit as possible in our interface, so I'm going to change the return type from a List<T> to an array:

public class CustomerFinder
{
    private readonly ICustomerRepository _customerRepository;
    private readonly ICustomCache _customCache;

    public CustomerFinder(ICustomerRepository customerRepository, ICustomCache customCache)
    {
        _customerRepository = customerRepository;
        _customCache = customCache;
    }

    public Customer[] FindAllCustomers(int startRowIndex, int maximumRows)
    {
        Customer[] customers = null;
        string key = "Customers_Customers_" + startRowIndex + "_" + maximumRows;

        if (_customCache.Contains(key))
        {
            customers = _customCache.GetItem<Customer[]>(key);
        }
        else
        {
            customers = _customerRepository
                        .FindAllCustomers()
                        .Skip(startRowIndex)
                        .Take(maximumRows)
                        .ToArray();

            if ((customers != null) && (customers.Length > 0))
                _customCache.AddItemExpringTomorrow(key, customers);
        }

        return customers;
    }

}

This is a much clearer interface for those using the CustomerFinder: they know exactly what the class depends on, and the Customer[] return type signals a fixed snapshot of Customers, not a collection meant to be added to.

Of course, our original CustomerManager class no longer compiles, so we'll need to change its return type and fix the constructor call:

public class CustomerManager
{
    [DataObjectMethod(DataObjectMethodType.Select, false)]
    public static Customer[] GetCustomers(int startRowIndex, int maximumRows)
    {
        var finder = new CustomerFinder(new CustomerRepository(), new CustomCache());
        return finder.FindAllCustomers(startRowIndex, maximumRows);
    }
}

Our ASP.NET page, with its data source control, does not need to change.  It works just as well with arrays as it does with List<T>.  Executing our original profile test with the new CustomerRepository and checking the profiler confirms that…oh wait, server-side paging is broken.  Here’s what the profiler says:

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
ORDER BY [t0].[CustomerID] DESC

So what happened?  I'm still doing deferred query execution, but now it's forgotten how to do server-side paging?  The culprit lies in the return type of ICustomerRepository.  When Skip and Take are called on an IEnumerable<T>, they bind to the in-memory LINQ to Objects implementations, so LINQ to SQL never sees them.  The LINQ query actually returns IOrderedQueryable<T>, which is the return type ReSharper originally inferred for me before I changed it.  Bad me.
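Why does the declared type matter?  Skip and Take are extension methods, and overload resolution happens at compile time against the static type of the receiver: IQueryable<T> picks up Queryable.Skip, which composes an expression tree the provider can translate, while IEnumerable<T> picks up Enumerable.Skip, which pages in memory.  A small illustration, with AsQueryable() over an array standing in for the real LINQ to SQL provider:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StaticBindingDemo
{
    public static void Main()
    {
        int[] numbers = { 5, 4, 3, 2, 1 };

        // Declared as IQueryable<int>: Skip/Take bind to Queryable.Skip/Take,
        // building up an expression tree a provider could translate to SQL.
        IQueryable<int> queryable = numbers.AsQueryable().OrderByDescending(n => n);
        var pagedQuery = queryable.Skip(1).Take(2);
        Console.WriteLine(pagedQuery.Expression); // the full pipeline, Skip/Take included

        // Declared as IEnumerable<int>: Skip/Take bind to Enumerable.Skip/Take,
        // so a real provider would fetch all rows and page them in memory.
        IEnumerable<int> enumerable = numbers.AsQueryable().OrderByDescending(n => n);
        var pagedInMemory = enumerable.Skip(1).Take(2);

        Console.WriteLine(string.Join(",", pagedQuery));    // 4,3
        Console.WriteLine(string.Join(",", pagedInMemory)); // 4,3
    }
}
```

In memory the results are identical, which is exactly why the bug is easy to miss: only the profiler shows that the second form dragged the whole table back first.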

So, I’ll change ICustomerRepository.FindAllCustomers to return the IOrderedQueryable interface instead of IEnumerable:

public interface ICustomerRepository
{
    IOrderedQueryable<Customer> FindAllCustomers();
}

public class CustomerRepository : ICustomerRepository
{
    public IOrderedQueryable<Customer> FindAllCustomers()
    {
        return from
                   c in DataGateway.Context.Customers
               orderby
                   c.CustomerID descending
               select
                   c;
    }
}

The LINQ query doesn't change, but profiling confirms that the change worked.  So was this a good change?  Yes.  In fact, I'd argue that IOrderedQueryable better expresses the intent of what we're returning.  In effect, we're telling users of ICustomerRepository that we're returning a smart collection, one you can page, sort, and filter efficiently.

And that’s what we’re really trying to do here, communicate the intent of our classes and interfaces to the users of these, without forcing them to look at the code in the class.  If another developer understands intent based solely on the names and signatures of our methods and types, we’ve done our job.

Quick review

To get rid of our dependency on the static DataContext class, we performed similar refactorings as we did in part 2:

  • Extract Method
  • Extract Class
  • Extract Interface

We picked another specialized interface, ICustomerRepository, to house our data access code, borrowing the name from the pattern in Fowler's book.  We also saw that LINQ to SQL needs a queryable return type (IOrderedQueryable here, rather than IEnumerable) to take advantage of server-side paging.  In doing so, we created a very explicit interface, describing exactly what we intend to support (and nothing more).

Our CustomerFinder no longer contains any opaque dependencies.  Which is good, because in the next part, we’ll see how dependency inversion saves the day when a bug report comes in and we need to add some unit tests to ensure our bug doesn’t get un-fixed.
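As a preview of what that buys us, here's a sketch of a true in-memory unit test, pairing the FakeCache from earlier with a hypothetical FakeRepository built on LINQ to Objects (the ShouldEqual extension is assumed to come from the same assertion library as the ShouldBeGreaterThan used above):

```csharp
[Test]
public void Finder_pages_customers_without_touching_the_database()
{
    var finder = new CustomerFinder(new FakeRepository(), new FakeCache());

    // Skip the first customer, take the next two.
    Customer[] customers = finder.FindAllCustomers(1, 2);

    customers.Length.ShouldEqual(2);
}

private class FakeRepository : ICustomerRepository
{
    public IOrderedQueryable<Customer> FindAllCustomers()
    {
        var customers = new[]
        {
            new Customer { CustomerID = "ALFKI" },
            new Customer { CustomerID = "ANATR" },
            new Customer { CustomerID = "ANTON" },
        };

        // LINQ to Objects hands back an IOrderedQueryable as well,
        // so CustomerFinder's Skip/Take run against plain memory.
        return customers.AsQueryable().OrderByDescending(c => c.CustomerID);
    }
}
```

No database, no out-of-process calls: just the behavior of CustomerFinder under test.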


About Jimmy Bogard

I'm a technical architect with Headspring in Austin, TX. I focus on DDD, distributed systems, and any other acronym-centric design/architecture/methodology. I created AutoMapper and am a co-author of the ASP.NET MVC in Action books.
This entry was posted in LINQ to SQL, Patterns, Refactoring.
  • http://Bryan.ReynoldsLive.com Bryan Reynolds

    Thanks!

  • Josh

    Just wondering… how are you automatically testing your Repository methods (and profiler doesn’t count :) )?

  • http://jimmybogard.lostechies.com Jimmy Bogard

    @Josh

    What do you mean? This is legacy code, so no unit tests are currently in place.

  • randy

    I found the set of articles EXTREMELY useful in getting comfortable with writing code with separation of concerns fully addressed. It almost appears to me that it may be more productive to write the code badly at first and then refactor to code that has concerns separated. Am I wrong?

  • http://jimmybogard.lostechies.com Jimmy Bogard

    @randy

    I wouldn’t say bad necessarily, but I do operate under the assumption that I modeled it wrong the first time. And the second time probably too. It’s more of the idea of not waiting to have all the information, as the exercise of creating the model brings out information.

    Trying the model out in practice for a week or month helps show where the weak points are, what invalid assumptions you had and what you missed. It’s more of an evolution.

    But sometimes you have to wait for enough responsibilities to be floating around before you realize there is a concept there wanting to get out. I have found I’m much better at refactoring than designing (because my designs are usually wrong).

    I used to feel a little bad about having a bad model after I realized it was bad. “Why couldn’t I see that earlier?” and such. But all I can do is model what I know, not try to model what I don’t know, and create an ecosystem that’s easy to change, so when I can see the right model, I have no fear of change.

  • An Phu

    So, there is no place for ReadOnlyCollection?