Isolating database data in integration tests

Databases in tests are an annoying, yet necessary, component if you truly want to create accurate integration tests. I’m not too much of a fan of employing alternate databases in tests (SQLite in place of SQL Server), simply because a passing test against one engine doesn’t give me confidence that the code will behave the same against the production engine. However, I have seen folks use SQLite locally and SQL Server on the server. I’m just not a fan personally.

Assuming we do use the same database technology in our tests as in production, we typically hit the problem that the database is shared between tests, leading to test failures caused by data left behind by previous tests.

So what are some ways to isolate databases in integration tests? Let’s first set some guiding principles:

  1. Developers should be using isolated databases for development and testing
  2. Tests should be isolated in the data they create and query from other tests

If I want reliable tests, I need to make sure that we’re not employing any sort of shared database amongst our developers and that one test’s data is completely isolated from other tests.

Once we have some ground rules, we can look at a few options for dealing with databases in our tests. Namely, how do we isolate data between tests?

Rollback transactions

One popular method is to simply create a transaction at the beginning of a test and roll it back at the end of the test:

private IDbTransaction _tx;

[SetUp]
public void SetUp()
{
    _tx = connection.BeginTransaction();
}

[TearDown]
public void TearDown()
{
    _tx.Rollback();
}

In the above pseudo-code, we just make sure we roll back our transaction at the end of the test. We’ll also need to make sure that all of the code executed in our test actually uses this transaction, but how that transaction gets disseminated to your fixtures and the code under test really depends on your environment.

In our systems, we rely on dependency injection and child containers to inject our transaction (or unit of work, whatever we’re using for transaction management).
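
To make that concrete, here’s a rough sketch of what that wiring can look like, assuming NUnit, NHibernate, and StructureMap; the base class, member names, and the idea of handing the nested container to the test are illustrative, not lifted from our actual systems.

using NHibernate;
using NUnit.Framework;
using StructureMap;

public abstract class TransactionalTestBase
{
    // Supplied by your test infrastructure elsewhere (assumptions).
    protected abstract ISessionFactory SessionFactory { get; }
    protected abstract IContainer RootContainer { get; }

    protected ISession Session;
    protected IContainer Nested;
    private ITransaction _tx;

    [SetUp]
    public void TransactionSetUp()
    {
        Session = SessionFactory.OpenSession();
        _tx = Session.BeginTransaction();

        // Child (nested) container: anything the test resolves from it
        // sees the same session/transaction that TearDown rolls back.
        Nested = RootContainer.GetNestedContainer();
        Nested.Inject(Session);
    }

    [TearDown]
    public void TransactionTearDown()
    {
        _tx.Rollback();
        Nested.Dispose();
        Session.Dispose();
    }
}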

This works well for a lot of cases, and it is quite effective at isolating changes between tests. However, if you’re relying on a committed transaction to tell you changes succeeded, it’s not a good fit. For example, a uniqueness constraint violation wouldn’t get caught until the transaction commits. But if your setup is simple, this is a good route.

Drop and recreate database

Another option is to have the database dropped and re-created between each test. This one’s a bit trickier, but not too bad to manage. If you’re already doing database migrations (you ARE doing database migrations, RIGHT?!?!?!), then it’s simple enough to just blast through the scripts to recreate the DB each time around:

[SetUp]
public void SetUp()
{
    DatabaseMigrations.Reset();
}

[TearDown]
public void TearDown()
{
    // Nothing to clean up here – the next test's SetUp resets the schema.
}

The upside is that you’re wiping the slate clean each time, so you have a known starting state for every test.

The downside is that it’s dog slow. You could also look at using script creation from your favorite ORM, if available. NHibernate, for example, can spit out creation scripts that you can just re-run at the beginning of each test. But even with this, it’s pretty slow.
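
If you go the NHibernate route, the reset boils down to something like the sketch below. This is illustrative only: the fixture and the _configuration field are placeholders, and in practice you’d build the mapped Configuration once and reuse it across the test run.

using NHibernate.Cfg;
using NHibernate.Tool.hbm2ddl;
using NUnit.Framework;

public class RecreateSchemaFixture
{
    // Assumed to be built once from your mappings during test bootstrap.
    private static Configuration _configuration;

    [SetUp]
    public void SetUp()
    {
        var export = new SchemaExport(_configuration);

        // Drop everything NHibernate knows about, then re-create it
        // against the configured connection (no script echoed to stdout).
        export.Drop(false, true);
        export.Create(false, true);
    }
}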

Delete all data

The option I tend to like best is to simply delete all the data. Instead of dropping and re-creating the database (slow), you just delete the data from the tables. I’ve seen people only delete from the tables they’re interested in, but that can be tricky to manage. I like to just delete from all tables instead.

Now there are a couple of ways we could go about this. What people typically run into here are foreign key constraints. I can’t just delete data from tables in any old order; I need to be smart about it.

We could do something like:

  • Disable all constraints
  • Delete all tables (in any order)
  • Re-enable all constraints
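
As a sketch, that whole dance fits in a single batch using SQL Server’s undocumented (but long-standing) sp_MSforeachtable helper procedure; the class name and the assumption of an already-open connection here are mine, not from the original post.

using System.Data;

public static class BruteForceDeleter
{
    public static void DeleteAllData(IDbConnection connection)
    {
        // Disable every FK constraint, delete from every table in any
        // order, then re-enable (and re-check) the constraints.
        const string sql = @"
            exec sp_MSforeachtable 'alter table ? nocheck constraint all';
            exec sp_MSforeachtable 'delete from ?';
            exec sp_MSforeachtable 'alter table ? with check check constraint all';";

        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = sql;
            command.ExecuteNonQuery();
        }
    }
}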

This will work, but it’s roughly three times slower than if I just happened to know the right order in which to delete the tables. So what if we could figure out that order? Maintaining a list by hand wouldn’t be much fun either, but luckily it’s not too difficult to figure the order out by examining SQL metadata:

public class DatabaseDeleter
{
    private readonly ISessionFactory _configuration;
    private static readonly string[] _ignoredTables = new[] { "sysdiagrams", "usd_AppliedDatabaseScript" };
    private static string[] _tablesToDelete;
    private static string _deleteSql;
    private static object _lockObj = new object();
    private static bool _initialized;

    public DatabaseDeleter(ISessionFactory sessionSource)
    {
        _configuration = sessionSource;

        BuildDeleteTables();
    }

    private class Relationship
    {
        public string PrimaryKeyTable { get; private set; }
        public string ForeignKeyTable { get; private set; }
    }

    public virtual void DeleteAllData()
    {
        ISession session = _configuration.OpenSession();

        using (IDbCommand command = session.Connection.CreateCommand())
        {
            command.CommandText = _deleteSql;
            command.ExecuteNonQuery();
        }
    }

    public static string[] GetTables()
    {
        return _tablesToDelete;
    }

    private void BuildDeleteTables()
    {
        if (!_initialized)
        {
            lock (_lockObj)
            {
                if (!_initialized)
                {
                    ISession session = _configuration.OpenSession();

                    var allTables = GetAllTables(session);

                    var allRelationships = GetRelationships(session);

                    _tablesToDelete = BuildTableList(allTables, allRelationships);

                    _deleteSql = BuildTableSql(_tablesToDelete);

                    _initialized = true;
                }
            }
        }
    }

    private static string BuildTableSql(IEnumerable<string> tablesToDelete)
    {
        string completeQuery = "";
        foreach (var tableName in tablesToDelete)
        {
            completeQuery += String.Format("delete from [{0}];", tableName);
        }
        return completeQuery;
    }

    private static string[] BuildTableList(ICollection<string> allTables, ICollection<Relationship> allRelationships)
    {
        var tablesToDelete = new List<string>();

        while (allTables.Any())
        {
            var leafTables = allTables.Except(allRelationships.Select(rel => rel.PrimaryKeyTable)).ToArray();

            tablesToDelete.AddRange(leafTables);

            foreach (var leafTable in leafTables)
            {
                allTables.Remove(leafTable);
                var relToRemove = allRelationships.Where(rel => rel.ForeignKeyTable == leafTable).ToArray();
                foreach (var rel in relToRemove)
                {
                    allRelationships.Remove(rel);
                }
            }
        }

        return tablesToDelete.ToArray();
    }

    private static IList<Relationship> GetRelationships(ISession session)
    {
        var otherquery = session.CreateSQLQuery(
        @"select
			so_pk.name as PrimaryKeyTable
		,   so_fk.name as ForeignKeyTable
		from
			sysforeignkeys sfk
				inner join sysobjects so_pk on sfk.rkeyid = so_pk.id
				inner join sysobjects so_fk on sfk.fkeyid = so_fk.id
		order by
			so_pk.name
		,   so_fk.name");

        return otherquery.SetResultTransformer(Transformers.AliasToBean<Relationship>()).List<Relationship>();
    }

    private static IList<string> GetAllTables(ISession session)
    {
        var query = session.CreateSQLQuery("select name from sys.tables");

        return query.List<string>().Except(_ignoredTables).ToList();
    }
}

Assuming I’m using NHibernate (only to get a SQL connection and execute scripts), we query for the list of tables and the list of foreign key relationships. Based on this graph, we order our deletion by grabbing the leaf tables first (tables no other table references), removing them and their foreign key relationships from the lists, and repeating ad nauseam until we’ve eliminated all the tables.
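
In a test fixture, using the deleter looks something like this (illustrative; the _sessionFactory field is whatever your test bootstrap already exposes):

private static ISessionFactory _sessionFactory; // built once per test run

[SetUp]
public void SetUp()
{
    // Wipe everything, then let each test seed exactly the data it needs.
    new DatabaseDeleter(_sessionFactory).DeleteAllData();
}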

I can’t remember if this works for self-referencing tables or not; those might need a special script to do the disable/enable constraint business.

The advantage of this approach is that it works regardless of your ORM – it goes straight against SQL metadata, not ORM metadata. If you have tables outside your ORM, this will pick them up, too. And it’s fast – just one DELETE statement per table, with the script generation done once and cached for the remainder of the test run.

These are the 3 different approaches I’ve taken for effectively isolating database test data. What other strategies have you used?

About Jimmy Bogard

I'm a technical architect with Headspring in Austin, TX. I focus on DDD, distributed systems, and any other acronym-centric design/architecture/methodology. I created AutoMapper and am a co-author of the ASP.NET MVC in Action books.
This entry was posted in Testing.
  • kamranayub

    For one of my projects, I use two databases, one test and one with real staging data. Reason is because I don’t want to delete all the data I’m working with normally in staging.

    I seed the test database with test data I test against, then drop/recreate it. Slower but not slow enough for me to be frustrated.

    And yes, Migrations is excellent.

    • Guest

      I’m using the rebuild-database-from-scratch approach before the tests start, and then the delete-all approach. But it’s more interesting to understand what the strategy is to set up the fixture. I’m currently using the system’s own API to put the database in the right state and then perform the action. I’ve found this the best way for maintenance purposes (it doesn’t depend on the database schema and is easily refactorable), but I’ve found it sometimes slow. What’s your preferred way?

      • jbogard

        What’s slow about it?

  • Charlie Knoll

    I’ve used the same approach in a manually maintained script only using non-logged transactions (truncate, bcp). Thanks for the article.

  • http://colinbowern.com Colin Bowern

    I use a combination of FluentMigrator to roll the database with a target of SQL Server LocalDB and then NDbUnit to manage the data sets. Works like a charm. https://github.com/colinbowern/Presentations/tree/master/BuildingBetterIntegrationTests

  • http://twitter.com/octoberclub Michael Steele

    I also think it’s worth separating your tests into in-memory unit tests that run very frequently, slightly slower lightweight integration tests that use the rollback pattern on a db that is created on first run, and finally acceptance tests that may create the whole db populated by data. Acceptance tests can be run less frequently and may have drawbacks in that they can be brittle due to the nature of their dependencies.

  • http://twitter.com/octoberclub Michael Steele

    Another strategy? Instead of SetUp/Teardown methods use an ActionAttribute http://nunit.org/index.php?p=actionAttributes&r=2.6.1

  • Joseph Poirier

    Hi Jimmy, thx for the Excel upload idea after the Toledo meeting. I misplaced your card and I have a question. How do you do an import to SQL Server from Excel directly from a SqlDataSource referencing columns by number, not name?

    • jbogard

      Hmmm good question – can you send me a note at jimmy at headspring.com? I’ll send you details there.

  • derekgreer

    Most of my specs these days fall into the category of either component/unit or full acceptance tests, but acceptance and integration tests have the same set up needs.

    For the specs I’ve been writing, I typically delete all the data in the database within the context setup for the spec using a XYZDatabase.Clean() method. I haven’t thought of separating my development environment into a test database and an “F5” database, but I’m not sure I see much value in that if your tests take responsibility for their own context setup. I often find it necessary to run the app using data created by the specs to help diagnose why a spec isn’t passing yet or to do further UI refinement on things I’m not driving out with tests (animation, details, etc.)

    • jbogard

      I don’t do this all the time – but it’s helpful to have a separate test vs. dev database if setting up “dummy data” for running locally is a pain. I sometimes have systems that require a bit of base data for it to make sense running the app with any sort of regularity.

  • Daniel Marbach

    Hi Jimmy,
    I heavily used SQL Compact instead of SQL Server. We created the schema with NHibernate and connected to that database. To make it faster I did the schema deployment once, then copied the template database for each test and dynamically changed the connection string. This is pretty fast. The downside is that you are limited in production to queries which the SQL Compact version supports. This can work OK but can also be a huge downside. Everything is a tradeoff. But thanks for your post!

    Daniel

  • http://twitter.com/optiks Michael Chandler

    Database snapshots also work well.

  • http://twitter.com/NathanGloyn Nathan Gloyn

    I use an approach similar to your delete data method.

    The db starts in a known state and then tests are responsible for altering that state if necessary to run and then cleaning up after themselves after running.

    I use sql scripts to encapsulate the changes and then the test can call for a specific script to be executed. You can also have global scripts which reset the state.

  • http://twitter.com/DaveSussman DaveSussman

    I’ve done something similar in one project, where I have both unit tests and integration tests for the data repository. I use Simple.Data for the tests, either using the in-memory adapter (for unit tests) or the real adapter (for integration tests); same set of base tests, with the integration ones wrapped with transactional attributes (xUnit). All the setup/teardown of data is done within the test suite itself, not externally via scripts.

    Essentially the tests are duplicated and in hindsight I’d abstract and have the decision as to which adapter to use set by attributes too, or via some external switch.

  • joshuaflanagan

    One could make that script even more portable by using the ANSI standard INFORMATION_SCHEMA instead of SQL Server sys* tables.

    • jbogard

      Yes, one could. One could.

  • Mickael lecoq

    Dbunit + unitils

  • Jaime Metcher

    I like to rely on a teardown method that actually works. If I can’t write a reliable teardown, I probably don’t understand what I’m doing, or my domain model isn’t doing its job. Either way I want to know about it sooner rather than later. For the same reason I like to use a crusty old production database (copied, natch) with years of cruddy data in it. Again, if I expect my code to handle all the weird stuff that happens in production, why not make test just as weird.

    Yes, I also use the transaction and drop/recreate methods, but usually not after every test.

  • Andrei Rinea

    Nice article! At the moment I am going with the “delete from all tables in the correct order” strategy. The downside is, as you said, that I need to know the tables and their relationships. But I have to know them anyway since I am building the system, so no biggie there..

  • Andrei Rinea

    One more thing: I, too, am against using one (R)DBMS in testing (SQLite) and another type in production (SQL Server – let’s say Standard or Enterprise), only in your example I would expect things the other way around: the test would fail but the tested code would succeed in production, since SQLite is more restrictive than SQL Server.

  • TheCloudlessSky

    If you’re using NHibernate, you can wrap your tests with TransactionScope and not call Complete(). This way, you can still call Commit() on your inner transactions. I use this method with xunit.net and create a base DbTest in which all my tests inherit. Therefore, I don’t need to worry about the TransactionScope in the actual test code.

  • Liang

    I recreate database every time when doing the integration test. We can simply add task in MSBuild to achieve that (I use cruise control plus MsBuild on the server). On my dev machine, I use Msbuild script to build/compile, other than Visual Studio. It runs fast.

  • Bill Turner

    So why not HSQLDB (or equivalent), DbUnit, Unitils and, if appropriate, spring-test-dbunit? Your CI server should have its own DB, the same version as PROD. Using an in-memory db solves a lot of pain.

  • http://twitter.com/rippo Richard Wilde

    Instead of deleting tables, why not restore the whole database to a known state? Would this not be quicker?

    • jbogard

      Not if the schema was changing. And it also depends on the # of tables, we’ve found. We had to play with several different ways of doing this to find the right fit for the schema we were working with.

      • http://twitter.com/rippo Richard Wilde

        Good point, it makes sense – in dev land schemas are constantly changing

  • gorlok

    I prefer an in-memory db. It works most of the time. Stored procedures are an exception: SPs are tested in a real (shared) test database.

  • Igor Korolenko

    I use the Effort library to load data from CSV files and the schema from Entity Framework, and create the database in memory for the duration of the integration test class. This way all integration tests can run on a shared build server without installing a db server on it.