SQL is the assembly language of the modern world

This isn’t a new idea, it’s been mentioned before, so I’m not taking credit for it, but it’s still worth pointing out… again… for the umpteenth time. So I’m gonna pick up the baton-of-flames and try the argument again:

I really would like to challenge the conventional wisdom that stored procedures are the best and/or that hand-crafted SQL is the only way to achieve good performance.

I once heard someone say that “SQL is the assembly language of the business application world” and I totally agree with this (I linked to Jett Attwood’s post above, but I heard it somewhere else first and my Google-fu is failing me tonight).

If you read the discussions on USENET back in the day when C++ and later Java were coming into popularity, there were a bunch of die-hards who held the conventional wisdom that C++ and Java compilers couldn’t possibly ever generate machine code that would achieve the super high performance of lovingly hand-crafted assembly language.  Add into that the bytecode interpreting of Java and the Garbage Collector and their heads nearly exploded.

Now, the same kinds are arguments are being made about not using stored procedures and instead using generators or even the inconceivable: Object/Relational Mappers.

While the arguments of the assembly-language advocates were mostly true: Compilers can’t create the mostly highly optimized code, it turns out that modern runtime environments, processors, and operating systems can account for this and actually do a better job of managing the complexities of runtime performance optimizations better than a human assembly language writer could ever hope to achieve.  The system now tunes itself on the fly based on the current reality of the operating environment as it stands right now. 

SQL Server is now getting to the point where it’s able to better tune itself based on the kinds of queries coming in and the volume and type of data it’s returning. It’s getting to the point where hand-crafting SQL may actually be self-defeating. We’re not quite there yet, and there are still scenarios where hand-crafting SQL makes sense. Coincidentally, there are still situations where, in C#, it’s appropriate to use the unsafe() keyword, or even use unmanaged code or even assembly language!

I don’t think we’ll ever be able to get away from hand-crafted SQL, but I think it should be our goal to standardized on a set and sytle of SQL generation and let the tools like Linq2SQL, NHibernate, etc work with tools like SQL Server and let them work together to optimize your intentions.

Our goal should be to get out of the SQL crafting business and get back into the data access business.

Related Articles:

    Post Footer automatically generated by Add Post Footer Plugin for wordpress.

    About Chad Myers

    Chad Myers is the Director of Development for Dovetail Software, in Austin, TX, where he leads a premiere software team building complex enterprise software products. Chad is a .NET software developer specializing in enterprise software designs and architectures. He has over 12 years of software development experience and a proven track record of Agile, test-driven project leadership using both Microsoft and open source tools. He is a community leader who speaks at the Austin .NET User's Group, the ADNUG Code Camp, and participates in various development communities and open source projects.
    This entry was posted in Misc, SQL is evil. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
    • http://haacked.com/ Haacked

      I totally agree, but I sometimes wonder if my view of this is inconsistent with the idea that HTML is perhaps the assembly language of the web. Why not just use these big server controls that spit out HTML underneath? Why do we need more control over our markup.

      In order not to explode in a paradox, I think it has to do with the fact that HTML is about presentation, and people are real particular about how things look. But I’m less particular about shaving 10 ms off of data access (if that were even true) if it costs me 100 more hours of work.

    • http://schambers.lostechies.com Sean Chambers

      Interesting points.

      I am still holding my breath for a widely accepted object oriented database. There are some around, probably the most popular being Intersystems Cache, but they are rarely used and even more rarely on the rise.

      There are circumstances of course that will always require relational data structures but I feel we are starting to move away from this in DDD type contexts. The technology is still far off but I think the OODB will make a come back soon.

      Or perhaps I should just continue to hold my breath =)

    • http://www.opgenorth.net Tom Opgenorth

      I too agree, and eagerly await our new OR/M overlords.

      To quote Oren Eini: “I learned SQL for self defence”.

      :)

    • Eric Hexter

      LoL. oh this is so true.

    • http://jimmybogard.lostechies.com Jimmy Bogard

      But it sure was fun creating a stack in a temporary table to do recursive joins in SQL 2000.

      Oh wait, no it wasn’t.

    • Cpt

      You forget one additional advantage of using ORM: they completely move you away from the complexities of saving complex (yes… I know twice the same word makes the sentence more complex) configurations of objects in a good way. Think of how you would handcraft SQL to save (delete or retrieve) an object which contains a collection of other objects which contain some other objects all of which may or may not be related to a table (or saved in the same table). Let alone that ORM tools like Hibernate (in which ever shape or form used) hide the complexity or even the make of the RDBMS at the back, thus giving you an architectural advantage in the way of flexibility and scalability. So in that way SQL could indeed be looked at as assembly and Hibernate (or any other ORM) as the compiler.

    • http://chadmyers.lostechies.com Chad Myers

      @Phil

      Yeah, HTML is different, I agree. But it sure FEELs like assembly language :)

      @Sean:

      I don’t think that the answer is OODB (if it were, we probably would have something ‘pretty darn close’ by now).

      It turns out that database engines are REALLY good at loading and storing data fast and reliably. But somehow in order to load and save data, we have to have all this archaic schema and back-asswards query language stuff.

      I wish that the DB vendors would separate the schema aspects and let that be more flexible and stick to the data storage and retrieval part.

      I have another post on that in the works, but it’s much more involved so it’ll take awhile to finish :(

    • vlado

      Well, it could feel like assembly language – you describe ‘low level’ details about your data and how to access and manipulate it, but, unfortunately sql is not flexible enough as assembly.

      It explodes when you add support for your data models evolution, semi-structured or strangely structured data. In the end you fit your world to sql not sql to the world. Hence all the monstrosities like OR mapping etc…

    • tom

      I think the comparison is a bit lame. In assembler there was a lot of bookkeeping for the simple stuff like a loop. You had to know what register did what, and worse: a limited set of registers.

      SQL ist an ugly (and imperfect) but rather thin wrapper around relational calculus. There is a lot of SQL-fu which gets ugly when expressed in e.g. the excellent sqlalchemy.

      I did not really understand if this post is pro or contra stored procedures? They are not only useful for speed but also for ACID and to keep business logic in a central place. (Business logic in clients is an idea as good as storing logic in js on webpages IMHO).

    • http://www.raum-fuer-notizen.de Lars Pohlmann

      @tom: full ACK to your thoughts about stored-procedures.
      That goes especially for environments, where you have different clients and architectures accessing data from and writing data to the database. You’ll need the logic at a central point to avoid code-duplication, and stored-procedures are the best way to do that, even if they are not that beautiful.

    • Gryph0n

      “it turns out that modern runtime environments, processors, and operating systems can account for this and actually do a better job of managing the complexities of runtime performance optimizations better than a human assembly language writer could ever hope to achieve”

      I disagree. I’m getting pedantic, but carefully handcrafted assembly code can beat all of the above.

      However the returns on investment are usually low. The fact is that computer time on modern systems have become very cheap, while programmer time hasnt become any cheaper. So for most applications, compilers and runtime systems are good enough and assembly is overkill.

    • she

      Who writes html still?
      I write in a pseudo-DSL that combines CSS and HTML (but I must admit, for javascript i dont really have code generators… but I would also like to be able to choose another language than Javascript, one so called “scripting language”)

      I never understood people who try to squeeze out the last bit of performance. Its as if their time is not very precious, or maybe they need to sell something that is super fast, but for me I just want to be able to do things that I like

    • http://grabbagoft.blogspot.com Jimmy Bogard

      @tom, @Lars

      > They are not only useful for speed but also for ACID and to keep business logic in a central place.

      Stored Procs are NOT NOT NOT NOT the place for ANY business logic. Period.

      Domain is where the heart of the business logic begins. It’s the most maintainable, most expressive way to do so. SProcs are _a_ way to centralize your domain behavior, but besides spaghetti-code, big-ball-of-mud, it’s about the worst way I can think of.

    • Miklos Hollender

      Not yet, not yet. I’m working with MS Dynamics-NAV which generates SQL. And I often wish it didn’t.

      See, SQL performance isn’t the same as desktop performance. On the desktop, if it’s slow… well they’ll just a wait a bit But on the SQL if a complicated write transaction that puts data into 8-10 tables (ship & invoice an order f.e.) takes too long, it will keep those tables locked, so another user cannot post another order, basically if you have 10 people posting orders like crazy they will keep locking each other, sometimes in a deadlock etc. It’s a huge pain. It’s not just performance, it’s the difference between something they can work with or not.

      So, it’s too early to phase out SQL.

    • test

      I want to know how do you handle aggregates in your ORMs?

      I agree that writting in this age sql for updating single records is crap, but show me how do you pull reports in ORM world.

    • http://jimmybogard.lostechies.com Jimmy Bogard

      @test

      DTOs and projections. I create simple objects to hold my reports, and a simple projection or mapping to pull the information out. I would call it “aggregations” and not “aggregates”, as aggregate has a different meaning in the DDD world.

    • tom

      @bogardj: “business logic” is a vast area, so it is possible we are not talking about the same thing. I mean the specific area of checks and invariants, i.e. rather low level stuff.

      We usually have a second layer of “business logic” on top of the db to keep an uniform interface, handle schema migrations gracefully etc.

      I learned (the hard way) that stored procs are a maintenance problem (nightmare?), so I use them sparingly, but I still believe that they the right thing for core-logic.

    • http://blodgettm.blogspot.com Matt Blodgett

      Chad,

      You might find this post interesting: “Is SQL Manual Memory Management?”

      http://gilesbowkett.blogspot.com/2007/03/is-sql-manual-memory-management.html

    • http://jimmybogard.lostechies.com Jimmy Bogard

      @tom

      Maybe I did misunderstand. Do you have an example of checks and invariants that you would put in your DB?

      Right now the only logic I feel is appropriate in the database are:
      - nulls/not nulls
      - primary keys
      - foreign key constraints

      Because all of these are concerns of the data only. I wouldn’t even put logic to have correctly formed data, for example phone numbers must look like this “(123) 555-5555″. This is a domain concern, not a data concern.

    • http://chadmyers.lostechies.com Chad Myers

      @tom

      Worse yet, there’s no way to test stored procedures! So that’s about the LAST place I’d want biz logic. People put it there because they use the DB as an integration point for many apps because they’re used to the mainframe model of things. But mainframes ran code on top of a DB (though there was some fuzziness about where the DB ended and the code began).

      I’m all for a veneer of domain logic in code as an integration point squatting on a DB.

    • Thijs Blaauw

      Totally wrong comparison. SQL is not a low level language at all. SQL is a very high level Domain Specific Language, where the domain is data manipulation.

      Using a layer on op of SQL for CRUD operations is perfectly reasonable, OTOH in my applications I don’t tend to use CRUD operations very often. Most times I use larger transactions like CreateAccount, DeactiveADSLLine, RegisterMove. Using stored procedures for this kind of transactions works very well for me.

      BTW. I use Oracle. Oracle has packages and types (classes) to group related functions and procedures together. This makes it possible to refactor large procedures into small procedures, without cluttering up the global namespace. I understand that SQL Server has no such grouping mechanism.

    • http://www.raum-fuer-notizen.de Lars Pohlmann

      @Thijs: Yes, I guess Oracle stands out in this regard. We have a .net developer here, who only worked with sql-server before (I don’t have any experience with sql-server), and he *never* wants to go back!

    • tom

      @bogardj: A quick survey of my code shows some examples:
      * A customer rents and uses stage-equipment. Each item has a status like “in use”, “rented”, “in repair”, “defect or stolen”. Items could only be deleted when the status was “defect or stolen”.
      * We had a doctor with incoming pieces of skin-probes, where each probe got a label with a number on it. It is very important that there are no left-out numbers (1, 2, 3, 5 is bad) because she can not know if the probe was lost, forgotten or just a program-glitch. So we had a INSERT-trigger-proc check for that.

      There are more examples, but they are bad. That shows that you are right … I would not do it like that again..

    • Nick

      @bogartj

      Why even bother with the nulls/not nulls? That seems to me to be validation at the same level as, say, string length and I check that at state change. The only non-nulls I set these days are PKs and FKs.

    • http://chadmyers.lostechies.com Chad Myers

      @Everyone:

      My point with the ‘assembly language’ remark wasn’t as much to say it’s a ‘low level language’, as to say that it’s an ‘arcane language with tons of idiosyncrasies and lots of legacy cruft that makes approaching SQL as a newbie/student very difficult and riddled with complications’

    • http://jimmybogard.lostechies.com Jimmy Bogard

      @Thijs

      So two questions:

      - How do you unit test your transactions/packages/types
      - How do you refactor your transactions/packages/types

      If either of these are manual or not done, you have a legacy code system.

      I had a lot of these arguments with some Oracle DBAs at my last company. Eventually it came down to how easily the logic was to test. Since our domain behavior could be tested without ANY database up, our argument won out. 2000 unit tests against code executes in the same amount of time as 20 database tests. That means we could move much faster than the DBAs could.

      Eventually, we moved a DBA to our team (instead of in their own group, which created an us vs. them mentality). They were then very helpful in seeing our modeling approach and making appropriate optimizations, partitioning, etc.

    • http://www.pervasivecode.com/ Jamie Flournoy

      The problem I see with SQL currently is that dynamic languages no longer allow the assumption that all instances of a class will contain private state that has the same structure. For an app written in C, SQL is very high level and mapping structs to tables makes sense. For an app written in Python, Ruby, JS, etc. where one out of 100 instances might have some extra methods and properties, where do you put those if there’s no column in the DB? Hence the recent interest in document databases, which are more loosey-goosey about the column list / list of valid keys in each stored item.

      On the other hand, the argument that stored procs are bad, and the argument that SQL is bad because it’s hard to represent complex collections, are at odds with one another. Putting business logic in the DB is bad, yes. Going overboard with data rules can be problematic also. But let’s say you want to store and query a directed graph data structure. This is non trivial in standard SQL, but extending SQL via stored procs can make the querying code trivial.

      The argument boils down to “I refuse to write reusable code so my code looks like spaghetti, therefore the language sucks.” Keep the domain code out of your database, sure, but don’t be afraid to add generic functions / procs to the database.

    • Thijs Blaauw

      @bogardj

      Actually, the tool I use for programming PL/SQL, PL/SQL Developer by Allround Automations, has some assistance for simple automatic refactorings.
      No, we don’t do enough unit testing, so in your definition we have a legacy system.

      Maybe I work on simpler systems than you, but most of programs I make either store data in the database, or retrieve data from the database. So if I create unit tests that don’t touch the database, I test only 20% of the program.

      @Lars Pohlmann

      We have a developer here who comes from a Sybase background. Sybase uses the same T-SQL language as SQL Server. We had a bit of a hard time stopping him writing 5000 line procedures that did lots of different things based on flag parameters. Programming that way was of course a way to avoid the namespace problem. This might also explain that developers who develop against SQL Server or Sybase, are more violently opposed to using stored procedures than developers who use Oracle.

    • http://grabbagoft.blogspot.com Jimmy Bogard

      @Thijs

      To be fair, it’s not my definition of legacy code, it’s MIchael Feathers’.

      For some size perspective: I’m on a pretty small project right now, ~3 weeks long. It has 2000 LoC, roughly half in tests and half in the application.

      Although it has several types that are persisted in the database, there are 0 (zero) lines of database access code. In fact, no project in this application references System.Data, the .NET data access assembly.

      So even though my application relies on the database heavily for persistence, it doesn’t have a single line of SQL! All of our “data access code” is in NHibernate mapping files. The database is just a means of persistence for us, the heart of the application lies in the domain.

      Our client assumes that the data is persisted in the database, but beyond that, they could care less. They care about the behavior of the system, not about the tables data is stored in.

    • http://johnzabroski@yahoo.com John "Z-Bo" Zabroski

      @tom
      @I learned (the hard way) that stored procs are a maintenance problem (nightmare?), so I use them sparingly, but I still believe that they the right thing for core-logic.

      What do you consider “core logic”? In other words, what do you use stored procedures sparingly for?

      I’m a little lost. Did bogardj just get you to convert your faith and regret your “stored procedures sparingly, for core logic only” position?

    • http://johnzabroski@yahoo.com John "Z-Bo" Zabroski

      @bogardj
      @ – How do you unit test your transactions/packages/types – How do you refactor your transactions/packages/types If either of these are manual or not done, you have a legacy code system.

      That is just one person’s definition (Michael Feathers). That is a lot like people who read Eric Evans’s Domain-Driven Design book and think his vocabulary for DDD is the only vocabulary that works. In a sense, his vocabulary is just a meta-model framework.

      In a nutshell, your basic idea is CYA: “Tell me what you do. Show me where it says that. Prove that that is what happened.”

    • http://chadmyers.lostechies.com Chad Myers

      @John: “Prove that that is what happened”

      And is that not an entirely appropriate stance for an engineer to take w/r/t to his own designs and creations? Is this not the basis for science?

      IMHO, if you can’t prove your code works repeatedly without variance, then it doesn’t work or is ‘legacy code’ (i.e. an unknown variable).

      Since testing database code is darned near impossible, or at least extremely annoying, difficult, time consuming, and expensive, it is extremely risky to put anything other than data and schema in the DB.

    • http://johnzabroski@yahoo.com John "Z-Bo" Zabroski

      @Chad
      It is appropriate.

      I just thought it nice to use a clear phrase that every engineer, not just those in the agile/unit test crowd, can understand. Most people, even most programmers, don’t know that definition of legacy code. Everyone understands: “Tell me what you do. Show me where it says that. Prove that that is what happened.”

      Also, unit tests are just one way to accomplish this.

    • http://rashkovskii.com/ Yurii Rashkovskii

      I believe that in future there is a high chance that database itself will finally become a programming language as well. It isn’t something new actually. GemStone is a database and programming language.

      Though I think that there is still room for such experiments and some projects could eventually become widely used (hopefully mine ;) — StrokeDB and Xi [which is on hold now])