Avoid many-to-many mappings in ORMs

Going through and reviewing the Contoso University codebase, really to get caught up on EF 6 features, I found a relationship between two tables that resulted in a many-to-many mapping. We have these tables:

image

A Course can have many Instructors, and a Person (Instructor) can have many Courses. The EF code-first mapping for this relationship looks like:

modelBuilder.Entity<Course>()
    .HasMany(c => c.Instructors).WithMany(i => i.Courses)
    .Map(t => t.MapLeftKey("CourseID")
        .MapRightKey("InstructorID")
        .ToTable("CourseInstructor"));

The NHibernate mapping would look similar, with a .ManyToMany() mapping on one or both sides of the relationship. From the Course and Person entities, I treat the one-to-many direction as a normal collection:

public class Course
{
    /* Blah blah blah */
    public virtual ICollection<Instructor> Instructors { get; set; }
}

public class Instructor : Person
{
    /* Blah blah blah */
    public virtual ICollection<Course> Courses { get; set; }
}

From each direction, we don’t ever interact with the junction table, we just follow a relationship from each direction as if it were a one-to-many. There are a few reasons why I don’t like this sort of modeling. Many-to-many relationships are normal in databases, but in my entity model I don’t like treating these relationships as if the junction table doesn’t exist. Some of the reasons include:

  • No place to put behavior/data concerning the relationship. There is no CourseInstructor class to add properties to
  • In order to navigate a direction, I have to query through the originating entity, instead of starting with the junction table
  • It’s not obvious as a developer that the many-to-many relationship exists – I have to look and compare both sides to understand the relationship
  • The queries that result in this model often don’t line up to the SQL I would have written myself

For these reasons, I instead always start with my junction tables modeled explicitly:

public class CourseInstructor
{
    public virtual Course Course { get; set; }
    public virtual Instructor Instructor { get; set; }
}

From each side of the relationship, I can decide (or not) to model each direction of this relationship:

public class Course
{
    /* Blah blah blah */
    public virtual ICollection<CourseInstructor> Instructors { get; set; }
}
 
public class Instructor : Person
{
    /* Blah blah blah */
    public virtual ICollection<CourseInstructor> Courses { get; set; }
}

Many times, I’ll even avoid creating the collection properties on my entities, to force myself to decide whether or not I’m constraining my selection or if I really need to grab the entities on the other side. I can now build queries like this:

courses = db.CourseInstructors
    .Where(ci => ci.InstructorID == id)
    .Select(ci => ci.Course)

I can skip going through other side of the many-to-many relationship altogether, and start straight from the junction table and go from there. It’s obvious to the developer, and often times the ORM itself has an easier time constructing sensible queries.

I do lose a bit of convenience around pretending the junction table doesn’t exist from the Course and Instructor entities, but I’m happy with the tradeoff of a little convenience for greater flexibility and explicitness.

About Jimmy Bogard

I'm a technical architect with Headspring in Austin, TX. I focus on DDD, distributed systems, and any other acronym-centric design/architecture/methodology. I created AutoMapper and am a co-author of the ASP.NET MVC in Action books.
This entry was posted in EntityFramework, NHibernate. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • khalidabuhakmeh

    No place to put behavior/data concerning the relationship. There is no CourseInstructor class to add properties to
    That’s a really good point. Since I’ve been working with RavenDB so much I realize that sometimes the relationship isn’t really that, but a concept within my domain that needs to be expressed like you have in your code above.

  • “No place to put behavior/data concerning the relationship. There is no CourseInstructor class to add properties to”. Once that happens, you could think of it as not being a strict many-to-many anymore, and more of a one-to-many-to-many-to-one and model it that way too.

  • Rémi BOURGAREL

    +1000. A few weeks ago I spent 2-3 days for figuring out how entity framework managed those and how to deal with them when you have detached entities, It’s hell on earth, I’ll never do it again. You missed one problem : the data model is not the same as the class model (maybe your point n°3).

  • Mike Cole

    “A junction entity is a normal entity waiting to happen.” – Stolen and paraphrased from somewhere, and found to be true more often than not in my experience.

  • detroitpro

    This concept feels very “Graph databse-ish”. It’s something I use and one of “features” that has allowed me to develop parts of a system that would otherwise be best suited for a GDB while I’m stuck in the constraints of a RDB.

  • I generally consider many to many relationships as an indicator that I might be crossing bounded contexts. It’s very rare that you’ll use both sides of the relationship at the same time

  • Yep, I agree with you. It is very common to have a need to add extra properties to the junction table and it’s hard to change existing code.

    • Matt Sugden

      Doesn’t adding extra columns to a junction table make it a regular transaction table?

      • Yep

        • Matt Sugden

          Which alters the behavior of the class that maps to it and makes it more that what it’s supposed to be. Just my opinion.

  • Definitely run into this before. As mentioned, changing code once you realize there’s a real thing in that junction table is …. fun.

  • Farhad

    Hi

    Hope all is well.

    Is it possible for you to post the mapping on this?

    • jbogard

      Hmmm, I don’t think I have it any more. It should be relatively straightforward though – HasMany for each of the two main entities towards the join entity, then a HasRequired on the join entity’s mapping.

  • Michael Powell

    It’s the classical “impedance mismatch”, or perceived one, at any rate. As a first class domain object, one can then consider topics like, adding date ranges; i.e. instructor Allen teaches course Intro for the fall semester, whereas instructor Bob teaches course Intro for the spring semester.

  • jw

    Why wouldn’t you explicitly define the CourseId and InstructorId in CourseInstructor? I know EF will generate it automagically in the migrations, but wouldn’t that make querying a pain? Without it the query might look like:

    courses =
    db.CourseInstructors
    .Where(ci => ci.Instructor.ID == id)
    .Select(ci => ci.Course)

    • jbogard

      I would now, you’re right it’s much easier that way.

  • Umair Ahmed

    Hello Jimmy,

    ” … I have to query through the originating entity, instead of starting with the junction table”

    Does this mean now the junction table will be the aggregate root? For instance creating a new entry, the junction table will enforce the validation rules like a person is eligible to teach a course?

  • Michael Plautz

    Fundamental disagreement. You have brought up the idea of storing specific behavior/data concerning a relationship between two tables (entities). However, especially if you are talking about using an ORM library like *Hibernate, the problem it is solving is object persistence – therefore objects are the first-class citizens, not database tables. If the objects are modeled properly, they likely do not have specific data about their many-to-many relationships. It is as simple as a Teacher has many courses, and a course has many teachers. If specific data regarding the relationship between a teacher and a course were required, the Object Model should encompass that (perhaps with a Relation object), and the ORM and database should follow suit.

    • jbogard

      In my experience those relationships gather data and behavior over time, so better to be explicit up front.

    • Macchendra

      Speaking of it strictly in terms of objects, what has a collection of what? By not modelling the specific relationship, you aren’t sparing the objects from having to bow to the whims of the database, you are denying the reality of the data from being materialized within them. Where is the concept that relates them? Oh, it is hidden in the database. Also, the intermediate table is friendlier to business logic abstractions, IMO.