ORM techniques for legacy databases

One of the reasons folks typically go with a hand-rolled data access layer (i.e., using ADO.NET by hand) is the assumption that ORMs don’t work well with legacy databases, or with databases designed in isolation from any object model that might interact with them.

From my experience, I have yet to see a legacy database that can’t use an off-the-shelf ORM as its mapping layer. And I’ve seen some craaaaaazy databases. Fixed width columns. Character data types representing monetary values. A junction table whose foreign keys to the other two tables were joined together in ONE COLUMN. And in every case, I’ve been able to successfully use an off-the-shelf ORM to encapsulate our access to whatever was underneath the covers.

But (and this is a big BUT), I had to use different techniques than I normally would against a greenfield database, where the object model more or less matches the underlying database schema.

Situations and examples can vary, so I’ll walk through some of the most heinous insanity I’ve found.

Sprocs, sprocs everywhere

Some organizations mandate stored procedures for all database access. The only really valid case for that mandate is security, and only then because they’ve decided to enforce security at the database layer, an approach that falls down fairly quickly in chained-trust scenarios. Still, it’s likely you can’t change the mandate any time soon. Or the application was developed before ORMs were fully featured and stable (before 2004 or so).

No worries! You still don’t need to do crazy IDataReader/SqlCommand jockeying. Both NHibernate and micro-ORMs easily support stored procedures for executing SELECTs and INSERT/UPDATE/DELETEs. Micro ORMs provide a thinner layer to work with, so I typically use those. Here’s an example of using PetaPoco:

var databaseContext = new PetaPoco.Database("SomeConnectionString");
return databaseContext.Query<Product>("exec spGetProducts");

Compare that with raw ADO.NET, where you’d have to create a SqlCommand, execute a data reader, loop through the results…well, you get the idea.

For the mutating DML statements, it really depends on what you want. If you want the full features of an ORM, like change tracking, identity maps, and so on, you can configure the ORM to use stored procedures instead of generated SQL (there is no inherent performance benefit to one over the other).
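For illustration, here’s roughly what that looks like in an NHibernate XML mapping, swapping its generated DML for stored procedure calls via the custom SQL elements. The procedure names, columns, and parameter counts here are hypothetical:

```xml
<class name="Product" table="Products">
  <id name="Id" column="ProductId">
    <generator class="identity" />
  </id>
  <property name="Name" />
  <property name="Price" />

  <!-- Hypothetical procs; the ?s bind positionally to the mapped
       properties (with the id last for update/delete). -->
  <sql-insert>exec spInsertProduct ?, ?</sql-insert>
  <sql-update>exec spUpdateProduct ?, ?, ?</sql-update>
  <sql-delete>exec spDeleteProduct ?</sql-delete>
</class>
```

NHibernate’s change tracking and identity map work exactly the same as with generated SQL; only the statements it issues change.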

You don’t have to pick just one approach, either. For entity-based interactions, you can use the entity-based features of NHibernate. For reporting and ad hoc queries, you can drop down to raw SQL mapped to DTOs.

Mainframe conversion wizard fiascos

The craziest database I’ve seen was one where it looked like someone used some sort of conversion wizard to convert an old AS400 mainframe database into SQL Server – as-is. Pretty terrible. The things we saw here included:

  • Multi-thousand-line stored procedures
  • Fixed-width columns
  • Very loose types (CHARs for everything, dates, numbers, everything)
  • Compound keys concatenated together into single columns

For stored procedures, that was pretty easy. Most ORMs support sprocs, so it was really just a matter of using those. Failing that, most ORMs at least offer much nicer wrappers around ADO.NET than the quite difficult to use SqlCommand and the like.

When we had problems with the underlying data types not being what we wanted, or just shaped horribly, we used two different techniques. The first is to use the ORM’s hooks for coercing data types back and forth. For example, it’s fairly trivial in NHibernate to configure a column to be stored as a CHAR(10) but read as a decimal. This lets your object model keep a sane shape while the ORM layer takes care of the complexities of the underlying DB schema.
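The exact hook varies by ORM (in NHibernate it’s a custom IUserType), but the coercion itself boils down to something like this sketch, where the CHAR(10) layout with two implied decimal places is a made-up example:

```csharp
// Sketch of the conversion an ORM type-coercion hook would perform for
// a monetary value stored in a CHAR(10) column with two implied decimal
// places, e.g. "      1999" meaning 19.99. The format is hypothetical.
public static class FixedWidthMoney
{
    // Trim the fixed-width padding, then shift the implied decimal point.
    public static decimal FromChar(string raw) =>
        decimal.Parse(raw.Trim()) / 100m;

    // Shift back to whole "cents" and re-pad to the column width.
    public static string ToChar(decimal value) =>
        ((long)(value * 100m)).ToString().PadLeft(10);
}
```

With the hook in place, entities just expose `decimal Price { get; set; }` and never see the padded strings.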

If that doesn’t work, we instead resort to SQL views on the underlying database tables to coerce the data accordingly. Even in cases where foreign keys were jammed together into one column with the primary key, we used an indexed view to efficiently present a sane shape over the underlying store.
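As a sketch of the view trick (all names here are made up): suppose a junction table’s two foreign keys were jammed into one CHAR(10) column, the first five characters keying the order and the last five the item. A view can split them back into joinable, properly typed columns:

```sql
CREATE VIEW dbo.OrderItems
WITH SCHEMABINDING
AS
SELECT
    CAST(SUBSTRING(ORDITMKEY, 1, 5) AS INT) AS OrderId,
    CAST(SUBSTRING(ORDITMKEY, 6, 5) AS INT) AS ItemId,
    QTY AS Quantity
FROM dbo.ORDITM;
```

On SQL Server, adding a unique clustered index on (OrderId, ItemId) turns this into an indexed view, so the split is materialized rather than recomputed on every read.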

Screaming naming conventions

Esoteric naming conventions are another sore spot with legacy databases. In many older databases (and older versions of Oracle), identifier names were limited to a certain length, forced to uppercase, or both.

Luckily, many ORMs are designed for exactly this scenario: we configure the entity mappings to point at whatever table and column names the database demands.
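With PetaPoco, for example, that’s just attributes on the entity; the screaming legacy names below are hypothetical:

```csharp
using PetaPoco;

// Sane class and property names on the outside, legacy names underneath.
[TableName("TBLPRODMST")]
[PrimaryKey("PRODID")]
public class Product
{
    [Column("PRODID")]
    public int Id { get; set; }

    [Column("PRODNM")]
    public string Name { get; set; }

    [Column("PRODPRC")]
    public decimal Price { get; set; }
}
```

The rest of the codebase only ever sees `Product.Name`; TBLPRODMST and friends stay quarantined in the mapping.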

Alternatively, you can use the view trick to sanitize the names of things, and even allow updates back to the source tables. If you’re just reading data, you can always craft your SQL to alias columns to saner names, then use the SQL-to-DTO features of the ORM.
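A sketch of that alias-and-project approach, again with PetaPoco (the legacy names, DTO, and connection string are made up):

```csharp
public class ProductDto
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public static class ProductQueries
{
    // The aliases map the screaming legacy columns onto the DTO's
    // sanely named properties; no entity mapping needed.
    public static System.Collections.Generic.List<ProductDto> GetProducts()
    {
        var db = new PetaPoco.Database("SomeConnectionString");
        return db.Fetch<ProductDto>(
            "SELECT PRODNM AS Name, PRODPRC AS Price FROM TBLPRODMST");
    }
}
```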

SQL-specific Features

Some things exist only in the database and don’t really get surfaced to your ORM layer. In those cases, we simply don’t care that they exist; they don’t affect our ORM. They might affect our strategy for building a query, but building queries is mostly orthogonal to building entity-based interactions. Things that don’t really matter inside your ORM include:

  • Index creation
  • Triggers
  • Security
  • SQL query optimization

ORMs have a bit of a stigma in that many folks assume the purpose of an ORM is to hide SQL. It’s not; it’s only meant to encapsulate the mapping of SQL to objects. All the SQL-specific features of databases are still important to know, learn, and apply, but they’re outside the scope of ORMs. ORMs are about mapping, and you’re still responsible for the SQL being used, auto-generated or not.

Unsupported databases

This one can be tough. I’ve never needed to access a database that doesn’t have out-of-the-box support from some ORM on the market, and most ORMs have hooks for supporting multiple databases. Entity Framework, for example, has a provider model with support for databases like SQLite, PostgreSQL, and MySQL. All you have to do is look.

If there really is no option, well, I think you probably have more problems than what ORM to use. But I’d love to hear about those cases!

Making the informed decision

Although I haven’t had to use an ORM against an unsupported database, I have used them against some of the wackiest databases I’ve heard of people having to use. In all cases, I was able to successfully use an ORM to map.

But was it easier than rolling my own? In all cases – yes. ADO.NET code is hard to get right, and I see over and over again people getting it wrong. Don’t chance it – let the tool do what it does best and focus on your business problem at hand. If you’re having trouble with one avenue, try going another. ORMs support a variety of different approaches, and not every approach is right for every scenario.

But a good understanding of what options you have will help you figure out what direction to go. ORMs can be complex, as it’s a tough problem. Dealing with a legacy database is rarely an easy problem, but ORMs can provide the support needed to let you focus on what’s important.

About Jimmy Bogard

I'm a technical architect with Headspring in Austin, TX. I focus on DDD, distributed systems, and any other acronym-centric design/architecture/methodology. I created AutoMapper and am a co-author of the ASP.NET MVC in Action books.
This entry was posted in Architecture, NHibernate, SQL.
  • Pedro Reys

    What about database where the schema changes at runtime? ;)

  • Florim Maxhuni

    Can you please explain what you mean by fixed-width columns (is it wrong to have these)?

    • Anonymous

      No, it’s not wrong, when it’s the right way to model the data.

      Things like names of people modeled as CHAR(30). It would have a ton of spaces at the end.

  • Rookian

    What about sql tables (meta database) that define the schema for other sql tables (concrete database)? :)

    • Anonymous

      Totally forgot about this one. I need to write a followup post, “When to write your own ORM”

      • Rookian

        I just wanted to say that there are really crazy things.

    • James Banner

      Right that is totally the common use case /s

      • Rookian

        Ofc it is not common, but I know such a case and I really ask myself why someone had this stupid idea :o

  • What do you think about a scenario when developer needs to pull data from some finance DB with crappy structure and horrible table/column names? Do I really need an ORM here, cause I think I’m comfortable with plain old SqlAdapter?


  • Dave Van den Eynde

    One case that I remember is where I had to access an AS400 database using a very expensive ODBC driver directly. This was no migration. I don’t even think it was “real” SQL. Perhaps it wasn’t even relational. It didn’t even involve .NET but I can imagine the type of code that I would have to write to get it done today.

    I agree that lightweight ORMs can help in almost any case where one would be writing a lot of code just to materialize stuff from a data reader, but I still would not be inclined to state that *all* situations demand the use of such an ORM. Also, I think that lightweight ORMs don’t really do much more than simply wrap said code, whereas in my book an ORM hides the data access layer completely to encourage portability. By your definition, an ORM is still an abstraction layer on top of a data reader, and in my experience, all abstractions leak.

    Whether or not to use an ORM and which one is a decision that must be balanced by the effort to put into putting that ORM to use and the effort to hand roll your own. Obviously the more diverse ORMs become the more weight is stacked against rolling your own, but it’s still a balance that must be weighed, no matter what the project brings or will bring.

  • Tudor

    One case where many O/RMs probably don’t offer support: a database with a stored procedure that returns multiple result sets, each with its own schema (multiple sets of tables), in a single call.

    Another case: databases with stored procedures that each returns one or more huge XMLs for each row, instead of separate columns – usually an O/RM is designed for a tabular data structure. Also when an update is performed, the object has to be serialized using a special XML schema, passed to the stored procedure, that will validate it, apply some XSLT transformation and then store it in several tables.
    Both of these are not some legacy systems designed 20 years ago, but new projects started 2-3 years ago. Not some esoteric RDBMS but SQL Server or Oracle.

  • Sparty

    What about a database required to be accessed by a web service of some sort that returns dataset for reads but custom web service commands for create, update and delete?

  • The usual reason you can’t do such a thing is because the other team members don’t know or understand ORMs and therefore don’t want to risk it when they can just use a sproc.  Or you can’t get management buy-in because “it works”.  I was just fired from a job because among other best practices I was trying to introduce the team to some kind of ORM so we didn’t have DataSets and Sprocs all over the place.

    • Tudor

      Any decent O/RM can use stored procedures, if they are not too exotic..

      • Right, but that doesn’t mean ignorant co-workers will want to use an ORM ;-)  That was the problem I ran into; it was more of a “What’s this fancy ORM thing?  SqlConnection/Command works just fine” sort of thing.  Although most of our ORMs had return codes, which seemed to be kinda tricky for an ORM to handle.

        • I of course mean “Most of our sprocs had return codes”

        • Tudor

          Well, if you have co-workers like that, that’s another (serious) problem.. :) O/RMs were commonplace in .NET 7 years ago (2005), so by now any developer should be able to choose whether to use them or not, depending on the project’s specifics..

  • What about sprocs that return multiple recordsets?  blech!!  What about sprocs that return xml columns?   

  • Soulfiremage

    I’ve one for you.

    A product that uses either access 2007 or SQL server 2005. It must be in one dll and eventually does a form of remote access with a transact SQL batch procedure. The product is in 100 plus sites as access and any one of them may be upgraded to SQL server at any point. Worse there is a content database that must remain local and as an access database. So it always uses two databases whether as a multiuser setup or not.

    The content is heavily procedurally accessed at random, so it is thought it would be too slow as another instance on SQL.

    So any ORM cope with that or even just Access?

    I know a dozen reasons not to use access but for at least 2 years there is no option.

  • Peter

    We’ve hand-rolled an ORM at my workplace because we use a legacy database unsupported by every ORM I’ve come across — Interbase. We also have loose types (strings for dates, bools), crazy primary keys (compound keys made up of dates, strings, doubles(!)), screaming, fixed-length names and no foreign keys, across about 300 tables. Naming conventions of the past have included classic blunders like tables named ‘FOO’ and ‘FOOS’, and tables named ‘BAR’ and ‘BAR_’. It’s a painful experience to say the least.

  • Tudor

    As far as I know, Hibernate (Java) supports Interbase (from Borland/Embarcadero).

  • Bryan Gerrells

    Hey Jimmy, I know this is an old post, but what about a sql database without any relationships/foreign keys defined? I have around 70 copies of the same sql database structure currently using a clarion gui. I’m rewriting it to c# and thought about using Entity Framework, but I’m struggling without FK’s.

    • jbogard

      I would basically do the same idea, except manually come up with the list of tables. That’s what I did before the FK trick.