Separation of Concerns and databases

I’m looking at a database table this morning, looking to optimize a few queries by adding some indexes. The trouble is, this table already has many indexes, all on different columns. So now I’m thinking, is this a smell that too many processes are concerned with this one table, all for different reasons?

Starting to think so, since those indexes are only used for one query or another. This looks like a situation to de-normalize and build report/view/query-specific tables to me.

Any other smells out there for violating SOLID principles at the database layer?

About Jimmy Bogard

I'm a technical architect with Headspring in Austin, TX. I focus on DDD, distributed systems, and any other acronym-centric design/architecture/methodology. I created AutoMapper and am a co-author of the ASP.NET MVC in Action books.
This entry was posted in Design, DistributedSystems. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • the biggest smell i usually see, other than this one, is too many nullable columns. sure you don’t always need to supply every field in a table. Address Line 2, for example, in an address table is perfectly fine as nullable. But when you see more than a handful and it’s not something obvious like an Address, start questioning. when you see dozens and you realize that certain fields are only populated if other fields have certain values… yeah, that’s all kinds of stank.

    • Anonymous

      I’m with you, but I’d take it a step farther. This is going to be controversial, but other than foreign key fields with an optional relationship you really should consider whether null is really appropriate. And if you think it is, you should reconsider again. The alternative being to have non-nullable columns with a DEFAULT constraint so that you don’t have to include every column in your insert. I’d say that except in a few specific instances, a database field that allows NULLs is a code smell.

      If you want to argue that you should use a NULL to indicate a value is truly unknown, that’s fine. But in your example of AddressLine2, that value isn’t really unknown. It’s known – there’s isn’t a second address line – it’s empty and that’s okay. And you can bet there’s someone writing a query who’s going to concat it with AddressLine1 without an ISNULL() and taint their result.

      • Joshua Sigar

        So, what are you saying about AddressLine2? Insert empty string (instead of NULL) or create a separate table?

        • Anonymous

          I’m saying in most cases your should disallow NULLs in fields like AddressLine2, and then set a default constraint to the field to assign empty string. That way, AddressLine2 doesn’t have to be specified in INSERT statements so you still get that advantage, but you no longer have to account for nulls all over the place. While NULLs are absolutely necessary in certain situations (like optional foreign keys), you have to weigh the decision against other needs such as reporting, ad-hoc queries, and the like. If your only access point to that database is an ORM then do whatever, but experience has shown me that a null in a field is almost always going to trip someone up.

          • Joshua Sigar

            Makes sense. Thanks.

  • I don’t want to trigger a debate between centralizing your queries (ala a QueryService or Repo) and spreading them out, but having all of your queries against an entity in one place certainly makes it easier to keep an eye on what queries your indexes are serving.

  • This approaches the fine line between SOLID/OOP and relational data modeling/physical performance. Each camp will identify certain “smells” and in many cases they will conflict with each other.

    From the data model, it can be argued that many of the normal forms can be derived from following SRP and DRY. However, some of the other SOLID principles are difficult to apply in this context.

    The trick is to find out the degree that normalization should be applied so that the physical access also performs well. A highly normalized model may actually suffer some performance risks (depending on the situation) since there can be a huge processing overhead on joins.

    With respect to indexes, consider that they are essentially copies of your table columns ordered for a particular access. If a query can find all of the data it needs from your indexes, it doesn’t even have to touch the table. (I’m thinking this is like the database fulfilling the LSP on your behalf).

  • Anonymous

    Off the top of my head, here are a few:

    Composite natural primary keys are a violation of DRY as they bloat other tables that need a foreign key reference. A surrogate primary key is preferable with a unique constraint to enforce the composite natural key rule.

    Tables and Views should not be named with a naming convention that distinguishes them from one another. That somewhat defeats the purpose. Think of programming to an interface… at some point you might want to swap out a table for a view or vice-versa and in order to not break some existing code you could wind up naming a view tbl_Xyz or a table Vw_Abc because of that useless naming convention. This is a loose DB version of the LSP.

    Sometimes it’s worth it to consider splitting a table into multiple parts with a one-to-one relationship. Just because the data is at the same grain, doesn’t mean it always has to be tied in the same table. I see this as an application of the SRP.