In the ORM Battle, Everyone Loses.

As Ted Neward aptly pointed out in his post, ORM is the Vietnam of Computer Science (credit to Justin Etheredge for reminding me of this). You need to do it, but there’s no really good end solution here. RDBMSs do what they do very well (that is, persist things to disk and load them back up quickly and reliably).

Yet another reason (and there are many) why I feel strongly that things like DDD (Domain-Driven Design) and PI (persistence ignorance) are extremely important, and why I feel that Model-Driven Architecture and the sort of behavior that v1 of the Entity Framework will be targeting/enabling is not a good long-term strategy, is this: the problems that RDBMSs solve may not be problems we have for much longer.

Consider this: What are the major problems that RDBMSs solve? I thought about this for a while and came up with this list (which I’m not declaring exhaustive by any means):

  1. Transactional interaction
  2. Fast, reliable storage of data to a disk medium
  3. Fast, reliable retrieval of data from a disk medium
  4. Fast querying of data

Now, consider this: With RAM getting cheaper and cheaper (database servers with 12-16GB RAM are commonplace, 32-64GB are available, 128/256GB are conceivable), and solid state drives already on the market and getting bigger and faster, won’t items #2 and #3 become irrelevant?

Most large DB server setups I know of already have 16GB of mirrored memory (32 / 2), load the entire DB into RAM, and cache all the query execution plans, etc. The database is essentially an IN-MEMORY database. It’s conceivable that in as few as 5 years solid state drives will be the norm, or at least readily available, such that storage to and retrieval from a disk medium is no longer required or will change significantly.

Given these facts (and I hope you agree these are facts), do we really need to keep architecting our systems with a heavy bias on a pre-OO (1969) relational model designed for efficient structure of data for storage and retrieval using an equally archaic query language (SQL) to access it?

Don’t get me wrong, I’m not advocating we throw out our databases today and stop writing SQL. Quite the contrary; ADO.NET will still have a long and fruitful life. No, what I propose is that we design our systems such that we do not limit our design to mere relational model persistence concerns. We design our software to take best advantage of what the application architecture/framework affords us (i.e. fundamental OO concepts) and then figure out a way, in the meantime, to map our OO design to our relational model, which is also carefully crafted and properly managed.

To keep with Ted’s motif: If ORM is Vietnam, then we must keep an eye on the real goal: The fall of the Soviet Union (RDBMS for managing slow disks). ORM is a short-lived, necessary, but bloody battle. Soon, when slow-spinning disk storage is no longer a concern, issues #2 and #3 will be removed. DBAs will be free to worry about higher-level concerns, such as the best way to arrange queries and how to ensure that transactions are used correctly, rather than having to worry about where clustered indexes go or how long that NVARCHAR field should be. We’ll still need those DBAs, but we’ll need their higher-order functions, and we shouldn’t waste their time on menial tasks like log file rollover schedules.

Please do not allow your application architecture to be dominated by RDBMS concerns. That isn’t to say that you shouldn’t be concerned with your RDBMS. I’m saying that your application code that determines whether a given user gets a 5% or 10% discount should have nothing to do with the database at all (and shouldn’t need things like a .Load() call to the DB or to execute a SQL stored procedure, or at least these things should be abstracted away from the core biz logic to the maximum extent possible).
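
To make the discount example concrete, here is a minimal sketch in Python (the same shape works in C#). Every name in it is hypothetical, and the rule that "preferred" customers get 10% while everyone else gets 5% is an assumption invented purely for illustration. The point is only that the business rule is a plain function with no knowledge of persistence, while loading sits behind a small interface:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Customer:
    # Hypothetical domain object; the fields are assumptions for this sketch.
    customer_id: int
    is_preferred: bool


def discount_rate(customer: Customer) -> Decimal:
    """Pure business rule: no SQL, no .Load() call, no connection."""
    # Assumed rule for illustration: preferred customers get 10%, others 5%.
    return Decimal("0.10") if customer.is_preferred else Decimal("0.05")


class CustomerRepository(Protocol):
    """Persistence boundary; an ORM, raw SQL or an in-memory store can sit behind it."""

    def get(self, customer_id: int) -> Customer: ...


class InMemoryCustomerRepository:
    """Stand-in store used here in place of a real database."""

    def __init__(self, customers: Iterable[Customer]) -> None:
        self._by_id = {c.customer_id: c for c in customers}

    def get(self, customer_id: int) -> Customer:
        return self._by_id[customer_id]


def discounted_price(repo: CustomerRepository, customer_id: int, price: Decimal) -> Decimal:
    # The application service does the loading; the rule itself stays storage-free.
    customer = repo.get(customer_id)
    return price * (Decimal("1") - discount_rate(customer))


repo = InMemoryCustomerRepository([Customer(1, True), Customer(2, False)])
print(discounted_price(repo, 1, Decimal("200.00")))
print(discounted_price(repo, 2, Decimal("200.00")))
```

Swapping the in-memory repository for one backed by an ORM or a stored procedure would leave `discount_rate` untouched, which is the separation being argued for.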

My concern with the mindset to which tools like EF v1 are appealing is that you will not (easily) be able to accomplish this level of separation of concerns. Database/persistence concerns will naturally bleed into every part of your application. With the promised features of EF v2, you will be better able to accomplish the appropriate level of separation of concerns. Thus, you will be able to quickly adapt to any new breakthroughs that are (hopefully) just around the corner in the database world when solid state drives or in-memory databases become the norm.

About Chad Myers

Chad Myers is the Director of Development for Dovetail Software, in Austin, TX, where he leads a premier software team building complex enterprise software products. Chad is a .NET software developer specializing in enterprise software designs and architectures. He has over 12 years of software development experience and a proven track record of Agile, test-driven project leadership using both Microsoft and open source tools. He is a community leader who speaks at the Austin .NET User's Group, the ADNUG Code Camp, and participates in various development communities and open source projects.
This entry was posted in Database, ORM.
  • http://www.e-Crescendo.com jdn

    For the sake of discussion, I’ll accept that the ‘facts’ you list come to fruition.

    What do you feel about the adoption rate? As you know, it can be a hard enough battle to get an ORM implemented in many shops.

    RDBMSs at this point are like a%%%%les, and opinions, everyone has got one. The mindshift away from them would, I think, be at least an order of magnitude greater than the mindset in some shops of allowing ORM.

    If your ‘facts’ turn out to be true, then this potential problem is obviously one that transcends .NET.

  • http://www.ayende.com/Blog/ Ayende Rahien

    I think that you are significantly understating the importance of 4. Fast and _complex_ queries.
    Relational isn’t about just getting the data to HD, it is about being able to query on it.

  • http://chadmyers.lostechies.com Chad Myers

    @jdn:
    I think most of the DBMSs are probably already in prime shape to move this route. I think most DBMSs are optimized to grab all the memory, work primarily out of memory, and flush stuff to disk as necessary. Going diskless or changing how disks are treated doesn’t seem like that great of a logical leap for them.

    I’m thinking that the changes will happen in an evolutionary manner. Things like doing ORM from the CLR in the DBMS will start to make more sense. Eventually the R part will get pushed down lower into the bowels of the DB and it will be akin to maybe how we view things like ‘the stack’ and ‘the large object heap’, etc.

    It’s quite possible that the DBMS itself can automatically create, on the fly and in memory, the relational model based on the object model and adapt to deployments of updated domain .NET assemblies/code so that we never actually have to see things like SQL or DDL anymore. I’m sure they’ll still be there for the REALLY edge case things (just as we can use unsafe code blocks in C# or even drop down to IL generation if we have to; when’s the last time any of us had to do that? Not including you, Ayende :) ).

    Eventually, the Relational model will not be in our psyche any more and then people will rediscover things like DDD and suddenly they will make sense in a whole new light because people will finally have unwrapped their total design concepts from relational design concepts and will be able to separate them and see DDD as really good for the object side of things and RDM for the relational side of things.

  • http://chadmyers.lostechies.com Chad Myers

    @Ayende Relational is one view of data. It has become the primary view of data because it happens to be the most efficient form for storing and retrieving data. It falls over in several other respects but up until recently, disk storage concerns trumped everything else because disks are the weakest link in terms of performance.

    Now that the DBMSs have essentially removed that part of the equation by managing everything in memory, representing data in various views or forms to allow different types of querying becomes possible.

    When you have 16GB of available RAM and a 2GB database, you can do lots of interesting things.

  • http://schambers.lostechies.com Sean Chambers

    I think this is one area where DDD really motivates us to apply that separation between storage and domain logic and forces us to defer the schema concerns until later in the development lifecycle. DDD newcomer developers look at how DDD approaches database concerns and are taken aback that the database doesn’t influence application architecture. They are always very interested to see how it is accomplished.

    Once manufacturers begin creating solid state drives that are targeted towards the server/SAN market and they become more available, I think we will see these kinds of topics coming up more often.

  • http://www.e-Crescendo.com jdn

    @Chad

    What is your take on Greg Young’s reminder that DDD is really hard, and not applicable to all applications (I think his reminder comes straight from Evans)?

    How does that play into your view? Do you think he is wrong?

  • http://chadmyers.lostechies.com Chad Myers

    @jdn: No. Certainly in your case where, if I recall correctly, you do a lot of ETL type stuff.

    I think mostly about line-of-business applications and stuff that involves a lot of users creating/retrieving/updating things and lists of things with validation, etc. Looking back at the ones before my DDD days, I can’t think of a single one that wouldn’t have benefited from DDD.

  • http://chadmyers.lostechies.com Chad Myers

    Clarification: Certainly NOT. My bad.

    But now that I think about it, it depends on the volume of data and type of processing. If we’re talking thousands of records with heavy processing going from one model to another, some DDD concepts would actually benefit you.

    If we’re talking millions of rows and little processing, then no DDD.

    Millions of rows and heavy processing? Maybe.

  • http://jimmybogard.lostechies.com Jimmy Bogard

    I’m still not sure that it’s the “in-memory” part that’s going to be the straw that broke the DB camel’s back. Also, what’s the alternative? SQL has been around a long time, and the database usually outlives the applications that were built on it.

    RDBMSs do what they do very well. I don’t really see them going away for a long, long time.

  • http://neilmosafi.blogspot.com Neil Mosafi

    I do wholeheartedly agree that we should be starting with our object model before we even think about persistence. In fact, this is how I have always operated, probably because I started life doing game development and writing applications where there was no database, I would always just be building object models!
    Now that I am regularly writing line-of-business applications, I know that Relational Databases will never die. The fact that our support team can, using a simple tool, query the data into Excel, pivot and analyse it, and compare it to the data in other source systems etc. is so important.
    Couple that with the power of Analysis Services, with ETL procedures, Business Intelligence and reporting tools to do data mining.
    Building this on top of objects is pointless and not a worthy use of time IMHO, and the experts who use these tools on a daily basis will probably agree.

  • http://www.scottcreynolds.com Scott

    @Chad

    I agree with you in principle, but in practice, the RDBMS is going to be around for a long time. Companies have made significant investments in hardware, software, and human resources for an RDBMS strategy, and they will be loath to toss that aside.

    I’d be first in line to go another route for my dead object storage for sure, but as an industry we aren’t there yet. Given that, maybe ORM should start gearing up to be the Iraq of computer science. There’s no end in sight and no easy way out.

  • http://www.chrisholmesonline.com Chris Holmes

    Where do ODBMSs fit into this discussion? What do you think of an ODBMS, Chad?

  • http://chadmyers.lostechies.com Chad Myers

    @Chris:

    Part of my point was that when everything is in memory, how the data is arranged internally is not my concern. When everything is in memory, you could even have several different views/presentations of the data (objects, JSON, Relational, XML etc).

    I guess I’m wishing that we could separate the means of querying/accessing the data (SQL + Relational) from the Storage (Relational).

    Right now this isn’t very easy because the RDBMS mindset is still heavily disk/file/storage-based thinking when it doesn’t really need to be any more.

  • jlockwood

    @Chris
    I’m currently working on a project where I have to map the same system to both RDBMSs (Oracle, Derby) and an ODBMS (InterSystems Caché). Hibernate is especially useful to me in this case since I don’t have to concern myself with how the data is managed.

  • http://kevinhegg.blogspot.com Kevin Hegg

    Chad,

    I believe the DBMS market is going to change significantly in the coming years, but I believe some of your facts are incorrect and this is leading you to faulty conclusions.

    First, a large percentage of databases are growing at a rate of tens of gigabytes to terabytes per day. Within a very short period of time it is no longer possible to cache everything in RAM. Placing large percentages of databases in RAM has been successfully done for years, but over the last few years the percentage that many organizations can put into RAM has dropped to a low percentage. So, while you conclude that it is more possible, I conclude that it is less possible.

    Second, the relational model has been proven superior to network or hierarchical models for a couple of decades now. There have been numerous challenges over the years, but no sustainable superior alternatives have been developed. It is interesting that the relational alternatives attack the small databases, but that is exactly the wrong place. The small database problem is no longer interesting to most vendors, database researchers, and organizations. Solving the large database problem is where the focus is now and anything under 100 GB is considered small. The biggest problem facing the relational model is the 10+ TB databases. Relational models do very well on the low end, but completely fall apart on the high end. It will be the inability of RDBMSs to scale on the high end that will have more impact on the database market than non-relational or in-memory alternatives on the low end.

    Third, solid-state drives aren’t close to being ready for the mass market. Heavily used databases can wear out a solid-state drive in a couple of months. The organizations that need them the most can’t afford them due to their high cost and short life span. I’m sure this will get better over time, but solid-state is somewhat overhyped right now.

    What I struggle with and don’t have an answer for yet is what to do about the high end. Let’s assume we eliminate relational or come up with an O-O solution that everyone is happy with on the low end. It is not clear to me that O-O is going to do any better on the high end. I don’t know of anyone working on a high end database (250+ TB) that is considering an object model. Many of them have rejected a pure relational model, but still rely on some relational technology. So, if they aren’t going to use an object model then there will have to be some mapping.

  • http://chadmyers.lostechies.com Chad Myers

    @Kevin: Great comment, thanks! I’m sure that there are a good number of large databases, but I still think the vast majority are under 30GB, and in fact most are probably under 10GB (at least the OLTP portions; reporting/denormalized/warehouse/OLAP databases are a separate issue).

    You’re right about solid-state drives, but I’m pretty confident we’ll be seeing these problems solved in short order and solid state drives becoming very viable. But you’re right, as it currently stands, they’re not sufficient.

    I also agree that the relational model has proven itself time and time again for the best way to store and retrieve data that involves a slower permanent storage mechanism behind the database engine (i.e. disks, SAN, etc). My point is this: If everything is already in memory, is the relational model the best model for representing data in a high-speed storage mechanism (i.e. RAM)? Especially when you consider that our applications are object-oriented and we’re spending a lot of effort and pain dealing with Object/Relational mapping concerns?

    I’m not saying get rid of relational, I’m saying, don’t use relational in an in-memory situation JUST BECAUSE that’s what we’ve used in the past and that’s what we know. If, in fact, it turns out that the relational structure still proves itself in an in-memory-only situation, then great, we’ll keep using it. But we should all be aware of the reasons WHY we’re using the relational model in those circumstances.

    I think object databases are getting there, but you’re right, I still don’t think they can compete in the large DB realm. But for databases that large, you’re probably not talking about a lot of OLTP stuff, you’re probably housing a lot of historical data for reporting or “you might also like these products”-type stuff, right? I don’t think that even eBay’s active auction table data pushes past a few dozen GB. It’s all the other historical data (user history, user ratings, etc.).

    In an O-O world, it would probably make sense to only have the active domain objects in the ODB and push the historical data into a separate relational model for querying and large data storage using disks/SAN, etc.