Some improved LINQ operators


I ran across a couple of scenarios the other day that were pretty difficult to handle with the current LINQ query operators.  First, I needed to see if an item existed in a collection.  That’s easy with the Contains method when you want to find an item that matches as a whole.

Suppose I want only one attribute to match?  For example, I have a Person class:

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

What if I want to see if a collection of Persons contains someone with the last name “Smith”?  Contains only gives me two options:

public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value);
public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value, IEqualityComparer<TSource> comparer);

That doesn’t help: I’d have to implement an interface just to match against the LastName.  Typically, this is solved with one of two options:

// Inefficient Contains replacement
values
    .Where(person => person.LastName == "Smith")
    .Count()
    .ShouldBeGreaterThan(0);

// Efficient, but ugly and hard to use
values
    .Where(person => person.LastName == "Smith")
    .FirstOrDefault()
    .ShouldNotBeNull();

The first example is inefficient because Count() iterates through every match, when all I care about is whether at least one exists.  The second example stops at the first match, but it hides the intent of what I’m trying to find out.
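For comparison, LINQ’s built-in Any operator has an overload that takes a predicate and stops at the first match, so it covers the simple existence check.  Here is a minimal sketch; the AnyExample class and HasLastName helper are names made up for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class AnyExample
{
    // Hypothetical helper: true if at least one person has the given last name.
    public static bool HasLastName(IEnumerable<Person> people, string lastName)
    {
        // Any(predicate) returns as soon as one element satisfies the predicate.
        return people.Any(person => person.LastName == lastName);
    }

    public static void Main()
    {
        var values = new[]
        {
            new Person { FirstName = "Bob", LastName = "Smith" },
            new Person { FirstName = "Don", LastName = "Allen" }
        };

        Console.WriteLine(HasLastName(values, "Smith")); // True
        Console.WriteLine(HasLastName(values, "Nixon")); // False
    }
}
```

It still takes a full predicate rather than a value plus a selector, so it reads a little differently from the Contains shape I’m after.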

I had the same problem with Distinct, where I’d like to find distinct elements by looking at only one property.  Again I had to implement IEqualityComparer (very annoying).
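To make the annoyance concrete, here is roughly what the hand-written comparer route looks like.  This is a sketch under my own assumptions: LastNameComparer and ComparerExample are hypothetical names, and the null handling is one reasonable choice, not the only one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hypothetical comparer: two people are "equal" when their last names match.
public class LastNameComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(x.LastName, y.LastName);
    }

    public int GetHashCode(Person obj)
    {
        // Must agree with Equals: equal last names give equal hash codes.
        return obj?.LastName == null ? 0 : obj.LastName.GetHashCode();
    }
}

public static class ComparerExample
{
    public static int CountDistinctLastNames(IEnumerable<Person> people)
    {
        return people.Distinct(new LastNameComparer()).Count();
    }

    public static bool ContainsLastName(IEnumerable<Person> people, string lastName)
    {
        // The built-in overload wants a Person, so a throwaway instance is needed.
        var probe = new Person { LastName = lastName };
        return people.Contains(probe, new LastNameComparer());
    }
}
```

That is a whole class per property, plus a dummy Person just to ask about a last name.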

Better LINQ extensions

Instead of implementing some crazy interface, I’d like to just give the Contains and Distinct query operators an expression of what to look for.  I’d like this test to pass:

[Test]
public void Better_enumerable_extensions()
{
    var values = new[]
    {
        new Person {FirstName = "Bob", LastName = "Smith"},
        new Person {FirstName = "Don", LastName = "Allen"},
        new Person {FirstName = "Bob", LastName = "Sacamano"},
        new Person {FirstName = "Chris", LastName = "Smith"},
        new Person {FirstName = "George", LastName = "Allen"}
    };

    values
        .Distinct(person => person.LastName)
        .Count()
        .ShouldEqual(3);

    values
        .Distinct(person => person.FirstName)
        .Count()
        .ShouldEqual(4);

    values
        .Contains("Smith", person => person.LastName)
        .ShouldBeTrue();

    values
        .Contains("Nixon", person => person.LastName)
        .ShouldBeFalse();
}

In the Distinct example, I pass in a lambda expression of the distinct attribute I’m looking for.  In the Contains example, I pass in the lambda expression, as well as the value I’m looking for.

To do this, I’ll need to create my extensions class with the new extension methods:

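using System;
using System.Collections.Generic;
using System.Linq;

public static class BetterEnumerableExtensions
{
    // Distinct elements, compared only by the value the selector returns.
    public static IEnumerable<TSource> Distinct<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        return source.Distinct(new DynamicComparer<TSource, TResult>(selector));
    }

    // True as soon as one element's selected value equals the given value.
    public static bool Contains<TSource, TResult>(
        this IEnumerable<TSource> source, TResult value, Func<TSource, TResult> selector)
    {
        // The default comparer copes with null on either side of the comparison.
        var comparer = EqualityComparer<TResult>.Default;
        foreach (TSource sourceItem in source)
        {
            TResult sourceValue = selector(sourceItem);
            if (comparer.Equals(sourceValue, value))
                return true;
        }
        return false;
    }
}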

Yeah yeah, all those angle-brackets really start to get ugly.  The new Contains method takes in the selector method now, in the form of a Func delegate.  In the body, I just loop through the source items, evaluating the selector for each item.  If the source value matches the value I’m searching for, I return “true” immediately and stop looping.  Otherwise, I return false.

The new Distinct method uses the existing Distinct, but now it’s using a new DynamicComparer class:

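public class DynamicComparer<T, TResult> : IEqualityComparer<T>
{
    private readonly Func<T, TResult> _selector;

    public DynamicComparer(Func<T, TResult> selector)
    {
        _selector = selector;
    }

    // Two items are equal when their selected values are equal.
    public bool Equals(T x, T y)
    {
        TResult result1 = _selector(x);
        TResult result2 = _selector(y);
        // The default comparer avoids a NullReferenceException on null values.
        return EqualityComparer<TResult>.Default.Equals(result1, result2);
    }

    // Hash the selected value so the hash code stays consistent with Equals.
    public int GetHashCode(T obj)
    {
        TResult result = _selector(obj);
        return result == null ? 0 : result.GetHashCode();
    }
}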

It does much the same thing as the Contains method: it runs the items passed in through the selector delegate supplied to the constructor.  The existing Distinct method then works the way I want it to, without me needing to re-implement its internal logic as I did with Contains.
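The comparer needs both Equals and GetHashCode because hash-based collections generally bucket items by hash code first and only then confirm with Equals.  As a rough illustration, a selector-based comparer should work with anything else that accepts an IEqualityComparer, such as HashSet.  KeyComparer and HashSetExample below are hypothetical names, with the comparer repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hypothetical stand-in for a selector-based comparer.
public class KeyComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _selector;

    public KeyComparer(Func<T, TKey> selector)
    {
        _selector = selector;
    }

    public bool Equals(T x, T y)
    {
        return EqualityComparer<TKey>.Default.Equals(_selector(x), _selector(y));
    }

    public int GetHashCode(T obj)
    {
        // Items with equal keys must land in the same hash bucket.
        TKey key = _selector(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

public static class HashSetExample
{
    // Counts people with unique last names by letting HashSet do the work.
    public static int CountUniqueLastNames(IEnumerable<Person> people)
    {
        var byLastName = new KeyComparer<Person, string>(p => p.LastName);
        return new HashSet<Person>(people, byLastName).Count;
    }
}
```

If GetHashCode hashed the whole Person instead of the selected value, two Smiths would usually land in different buckets and never be compared at all.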

I tried using the DynamicComparer with the Contains method, but it just worked out better re-implementing the logic.
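One likely reason is the signature: the built-in Contains overload that accepts a comparer wants a TSource value, so searching for “Smith” would mean building a throwaway Person first.  Another option, not the one used in this post, is to project with Select and call the ordinary Contains on the results.  ContainsBy and SelectContainsExample are hypothetical names for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class SelectContainsExample
{
    // Hypothetical alternative: project each item, then use the built-in Contains.
    public static bool ContainsBy<TSource, TResult>(
        this IEnumerable<TSource> source, TResult value, Func<TSource, TResult> selector)
    {
        // Select is lazy, and Contains compares with the default equality comparer.
        return source.Select(selector).Contains(value);
    }

    public static void Main()
    {
        var values = new[]
        {
            new Person { FirstName = "Bob", LastName = "Smith" },
            new Person { FirstName = "Don", LastName = "Allen" }
        };

        Console.WriteLine(values.ContainsBy("Smith", person => person.LastName)); // True
        Console.WriteLine(values.ContainsBy("Nixon", person => person.LastName)); // False
    }
}
```

Either way the call site stays the same shape: a value and a selector, with no comparer class in sight.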

Intention-revealing interfaces == good

The Where/Count and even the Where/FirstOrDefault ways of getting a Contains are just plain ugly.  By passing in a selector method, I can describe exactly what I’m looking for.  In the case of Distinct, having to create a custom IEqualityComparer just for that is unnecessary most of the time.  When I first saw that requirement, it looked like more trouble than it was worth.  But with the new and improved extensions, I get a much cleaner implementation.
