Hacking LINQ Expressions: Select With Index

15 September, 2009. It was a Tuesday.

First, a point of clarification: I use LINQ Expressions to mean (Language-INtegrated) Query Expressions (the language feature) rather than Expression Trees (the .NET 3.5 library in System.Linq.Expressions).

So what do I mean by “Hacking LINQ Expressions”? Quite simply, I’m not content with the rather limited set of operations that query expressions allow me to represent. By understanding how queries are translated, we can use various techniques to broaden our expressive reach. I have already documented one such hack for managing IDisposable objects with LINQ, so I guess we can call this the second in an unbounded series.

The Problem

In thinking over use cases for functional construction of web control trees, I paused to think through how I would express alternate row styling. My mind immediately jumped to the overload of Select() that exposes the current element’s index:

Controls.Add(
    new Table().WithControls(
        data.Select((x, i) =>
            new TableRow() {
                CssClass = i % 2 == 0 ? "" : "alt"
            }.WithControls(
                new TableCell().WithControls(x)
            )
        )
    )
);

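This works fine for simple cases, but gets awkward for more complex queries, where the whole query has to be wrapped in parentheses and projected into an anonymous type just so Select() can be chained onto the end: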

Controls.Add(
    new Table().WithControls((
        from x in Xs
        join y in Ys on x.Key equals y.Key
        select new { x, y }
        ).Select((z, i) =>
            new TableRow() {
                CssClass = i % 2 == 0 ? "" : "alt"
            }.WithControls(
                new TableCell().WithControls(z.x.ValueX, z.y.ValueY)
            )
        )
    )
);

The Goal

Instead, I propose a simple extension method to retrieve an index at arbitrary points in a query:

var res = from x in data
          from i in x.GetIndex()
          select new { x, i };

Or our control examples:

Controls.Add(
    new Table().WithControls(
        from x in data
        from i in x.GetIndex()
        select new TableRow() {
            CssClass = i % 2 == 0 ? "" : "alt"
        }.WithControls(
            new TableCell().WithControls(x)
        )
    )
);

Controls.Add(
    new Table().WithControls(
        from x in Xs
        join y in Ys on x.Key equals y.Key
        from i in y.GetIndex()
        select new TableRow() {
            CssClass = i % 2 == 0 ? "" : "alt"
        }.WithControls(
            new TableCell().WithControls(x.ValueX, y.ValueY)
        )
    )
);

Much like in the IDisposable solution, we use a from clause to act as an intermediate assignment. But in this case our hack is a bit trickier than a simple iterator.

The Hack

For this solution we’re going to take advantage of how multiple from clauses are translated. A second from clause becomes a call to SelectMany(), so the first query above compiles to:

var res = data.SelectMany(x => x.GetIndex(), (x, i) => new { x, i });

Looking at the parameter list, we see that our collectionSelector should return the result of x.GetIndex() and our resultSelector’s second argument needs to be an int:

public static IEnumerable<TResult> SelectMany<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, SelectIndexProvider> collectionSelector,
    Func<TSource, int, TResult> resultSelector)

The astute observer will notice that the signature of this resultSelector exactly matches the selector used by Select’s with-index overload, trivializing the method implementation:

{
    return source.Select(resultSelector);
}

Note that we’re not even using collectionSelector! We’re just using its return type as a flag to force the compiler to use this version of SelectMany(). The rest of the pieces are incredibly simple now that we know the actual SelectIndexProvider value is never used:

public sealed class SelectIndexProvider
{
    private SelectIndexProvider() { }
}

public static SelectIndexProvider GetIndex<T>(this T element)
{
    return null;
}
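For anyone who wants to try this out, here is a minimal sketch of the pieces above assembled into a single compilable file. The class names IndexExtensions and Program, the Run() helper, and the sample data are my own placeholders for illustration; only SelectIndexProvider, GetIndex() and the SelectMany() overload come from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Marker type: never instantiated, only its static type matters,
// because it steers overload resolution to our SelectMany().
public sealed class SelectIndexProvider
{
    private SelectIndexProvider() { }
}

// "IndexExtensions" is a placeholder name chosen for this sketch.
public static class IndexExtensions
{
    public static SelectIndexProvider GetIndex<T>(this T element)
    {
        return null; // the value is never read
    }

    public static IEnumerable<TResult> SelectMany<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, SelectIndexProvider> collectionSelector,
        Func<TSource, int, TResult> resultSelector)
    {
        // collectionSelector is deliberately ignored; the indexed
        // Select() overload does all of the work.
        return source.Select(resultSelector);
    }
}

public static class Program
{
    // Hypothetical helper: pairs each sample element with its index.
    public static List<string> Run()
    {
        var data = new[] { "a", "b", "c" };

        var res = from x in data
                  from i in x.GetIndex()
                  select x + i;

        return res.ToList();
    }

    public static void Main()
    {
        foreach (var r in Run())
            Console.WriteLine(r); // prints a0, b1, c2 on separate lines
    }
}
```

Ordinary queries with two from clauses keep working alongside this, since a collection selector that returns a sequence cannot match the SelectIndexProvider parameter and so falls through to the standard SelectMany().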

And for good measure, an equivalent version to extend IQueryable<>:

public static IQueryable<TResult> SelectMany<TSource, TResult>(
    this IQueryable<TSource> source,
    Expression<Func<TSource, SelectIndexProvider>> collectionSelector,
    Expression<Func<TSource, int, TResult>> resultSelector)
{
    return source.Select(resultSelector);
}

Because we’re just calling Select(), the resulting query’s expression tree contains no trace of the call to GetIndex():

System.Linq.Enumerable+d__b1.Select((x, i) => (x * i))
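One way to check this for yourself is sketched below, assuming an in-memory array exposed through AsQueryable() as the source. The names QueryableIndexExtensions, Program and BuildQuery() are placeholders of mine, and the exact text printed for the expression depends on the runtime, so the sketch only tests whether "GetIndex" appears in it.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public sealed class SelectIndexProvider
{
    private SelectIndexProvider() { }
}

// "QueryableIndexExtensions" is a placeholder name chosen for this sketch.
public static class QueryableIndexExtensions
{
    public static SelectIndexProvider GetIndex<T>(this T element)
    {
        return null;
    }

    public static IQueryable<TResult> SelectMany<TSource, TResult>(
        this IQueryable<TSource> source,
        Expression<Func<TSource, SelectIndexProvider>> collectionSelector,
        Expression<Func<TSource, int, TResult>> resultSelector)
    {
        // Only resultSelector reaches the provider; the tree built for
        // collectionSelector is simply discarded.
        return source.Select(resultSelector);
    }
}

public static class Program
{
    // Hypothetical helper: multiplies each sample value by its index.
    public static IQueryable<int> BuildQuery()
    {
        return from x in new[] { 10, 20, 30 }.AsQueryable()
               from i in x.GetIndex()
               select x * i;
    }

    public static void Main()
    {
        var res = BuildQuery();

        // The exact text varies by runtime, so just look for GetIndex.
        Console.WriteLine(res.Expression);
        Console.WriteLine(res.Expression.ToString().Contains("GetIndex")); // False

        foreach (var r in res)
            Console.WriteLine(r); // prints 0, 20, 60 on separate lines
    }
}
```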

We’re essentially providing our own syntactic sugar over the sugar already provided by query expressions. Pretty sweet, eh?

As a final exercise for the reader, what would this print?

var res = from x in Enumerable.Range(1, 5)
          from i in x.GetIndex()
          from y in Enumerable.Repeat(i, x)
          where y % 2 == 1
          from j in 0.GetIndex()
          select i+j;

foreach (var r in res)
    Console.WriteLine(r);
