Hacking LINQ Expressions: Select With Index

First, a point of clarification: I use LINQ Expressions to mean
(Language-INtegrated) Query Expressions (the language feature) rather
than Expression Trees (the .NET 3.5 library in System.Linq.Expressions).

So what do I mean by “Hacking LINQ Expressions”? Quite simply, I’m
not content with the rather limited set of operations that query
expressions allow me to represent. By understanding how queries are
translated, we can use various techniques to broaden our expressive
reach. I have already documented one such hack for managing IDisposable objects with LINQ, so I guess we can call this the second in an unbounded series.

The Problem

While thinking through use cases for functional construction of web control trees, I paused to consider how I would express alternate row styling. My mind immediately jumped to the overload of Select() that exposes the current element’s index:

Controls.Add(
    new Table().WithControls(
        data.Select((x, i) =>
            new TableRow() {
                CssClass = i % 2 == 0 ? "" : "alt"
            }.WithControls(
                new TableCell().WithControls(x)
            )
        )
    )
);
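Setting aside the web controls and the WithControls() helper, the index-aware Select() overload can be tried on plain strings. This is a minimal sketch: the AltRowSketch class and ClassifyRows method are hypothetical names, and a tuple stands in for the TableRow and its CssClass.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AltRowSketch
{
    // Hypothetical helper: pairs each item with the CSS class its row would get.
    // Select((x, i) => ...) is the overload that also passes the element's
    // zero-based index, so even indexes get "" and odd indexes get "alt".
    public static List<(string Item, string CssClass)> ClassifyRows(IEnumerable<string> data)
    {
        return data
            .Select((x, i) => (Item: x, CssClass: i % 2 == 0 ? "" : "alt"))
            .ToList();
    }
}
```

For example, ClassifyRows(new[] { "a", "b", "c" }) yields the classes "", "alt" and "" in that order.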

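Calling the index-aware Select() directly works fine for simple cases, but it gets awkward for more complex queries: the query has to be wrapped in parentheses and its results packed into an anonymous type before Select() can be chained on, and each row then reaches its data through z.x and z.y: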

Controls.Add(
    new Table().WithControls((
            from x in Xs
            join y in Ys on x.Key equals y.Key
            select new { x, y }
        ).Select((z, i) =>
            new TableRow() {
                CssClass = i % 2 == 0 ? "" : "alt"
            }.WithControls(
                new TableCell().WithControls(z.x.ValueX, z.y.ValueY)
            )
        )
    )
);
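The same awkwardness shows up without any web controls. In this sketch, XItem and YItem are hypothetical stand-ins for the elements of Xs and Ys, and each row is rendered as a "class|ValueX|ValueY" string. The join still has to be projected to new { x, y } before the index-aware Select() can be chained on.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element types standing in for the post's Xs and Ys.
public sealed record XItem(int Key, string ValueX);
public sealed record YItem(int Key, string ValueY);

public static class JoinThenIndexSketch
{
    public static List<string> Rows(IEnumerable<XItem> xs, IEnumerable<YItem> ys)
    {
        return (
                from x in xs
                join y in ys on x.Key equals y.Key
                select new { x, y }      // pack both range variables into one object
            )
            .Select((z, i) =>            // index-aware overload chained onto the query
            {
                string cssClass = i % 2 == 0 ? "" : "alt";
                // The original x and y are only reachable through z.
                return cssClass + "|" + z.x.ValueX + "|" + z.y.ValueY;
            })
            .ToList();
    }
}
```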

The Goal

Instead, I propose a simple extension method to retrieve an index at arbitrary points in a query:

var res = from x in data
          from i in x.GetIndex()
          select new { x, i };

Or our control examples:

Controls.Add(
    new Table().WithControls(
        from x in data
        from i in x.GetIndex()
        select new TableRow() {
            CssClass = i % 2 == 0 ? "" : "alt"
        }.WithControls(
            new TableCell().WithControls(x)
        )
    )
);

Controls.Add(
    new Table().WithControls(
        from x in Xs
        join y in Ys on x.Key equals y.Key
        from i in y.GetIndex()
        select new TableRow() {
            CssClass = i % 2 == 0 ? "" : "alt"
        }.WithControls(
            new TableCell().WithControls(x.ValueX, y.ValueY)
        )
    )
);

Much like in the IDisposable solution, we use a from clause to act as an intermediate assignment. But in this case our hack is a bit trickier than a simple iterator.

The Hack

For this solution we’re going to take advantage of how multiple from clauses are translated:

var res = data.SelectMany(x => x.GetIndex(), (x, i) => new { x, i });
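To see that translation in isolation, here is an ordinary query with two from clauses next to the SelectMany() call the compiler produces for it: a collection selector (x => ys) plus a result selector ((x, y) => ...). The class and method names are illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SelectManyTranslationSketch
{
    // Query form: a second from clause followed by select.
    public static List<string> QueryForm(int[] xs, string[] ys) =>
        (from x in xs
         from y in ys
         select $"{x}{y}").ToList();

    // Method form: the two-selector SelectMany overload the query above becomes.
    public static List<string> MethodForm(int[] xs, string[] ys) =>
        xs.SelectMany(x => ys, (x, y) => $"{x}{y}").ToList();
}
```

Both methods return "1a", "1b", "2a", "2b" for xs = { 1, 2 } and ys = { "a", "b" }.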

Working backward from the translated GetIndex() query, our collectionSelector must return whatever x.GetIndex() returns, and our resultSelector’s second argument needs to be an int:

public static IEnumerable<TResult> SelectMany<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, SelectIndexProvider> collectionSelector,
    Func<TSource, int, TResult> resultSelector)

The astute observer will notice that the signature of this resultSelector exactly matches the selector used by Select’s with-index overload, trivializing the method implementation:

{
    return source.Select(resultSelector);
}

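Note that we’re not even using collectionSelector! We’re just using its return type as a flag to force the compiler to use this version of SelectMany(): the built-in SelectMany() overloads require the collection selector to return an IEnumerable<TCollection>, which SelectIndexProvider is not. The rest of the pieces are incredibly simple now that we know the actual SelectIndexProvider value is never used: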

public sealed class SelectIndexProvider
{
    private SelectIndexProvider() { }
}

public static SelectIndexProvider GetIndex<T>(this T element)
{
    return null;
}
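Assembled into one file, the pieces look something like the sketch below. The method bodies are the ones shown above; the container names SelectIndexExtensions and GetIndexDemo are assumptions, since the post doesn't say which static class holds the extension methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Marker type: its only job is to be the return type of GetIndex().
public sealed class SelectIndexProvider
{
    private SelectIndexProvider() { }
}

// Hypothetical container name for the extension methods.
public static class SelectIndexExtensions
{
    // Never produces a real value; the query only needs the return type.
    public static SelectIndexProvider GetIndex<T>(this T element)
    {
        return null;
    }

    // Chosen for "from i in x.GetIndex()" because the Enumerable.SelectMany
    // overloads need a collection selector that returns IEnumerable<T>.
    public static IEnumerable<TResult> SelectMany<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, SelectIndexProvider> collectionSelector,
        Func<TSource, int, TResult> resultSelector)
    {
        // collectionSelector is intentionally ignored; the index comes from Select.
        return source.Select(resultSelector);
    }
}

// Hypothetical usage wrapper.
public static class GetIndexDemo
{
    public static List<string> Indexed(IEnumerable<string> data) =>
        (from x in data
         from i in x.GetIndex()
         select $"{i}:{x}").ToList();
}
```

Calling GetIndexDemo.Indexed(new[] { "a", "b", "c" }) returns "0:a", "1:b", "2:c".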

And for good measure, an equivalent version to extend IQueryable<>:

public static IQueryable<TResult> SelectMany<TSource, TResult>(
    this IQueryable<TSource> source,
    Expression<Func<TSource, SelectIndexProvider>> collectionSelector,
    Expression<Func<TSource, int, TResult>> resultSelector)
{
    return source.Select(resultSelector);
}

Because we’re just calling Select(), the resulting expression tree never records the call to GetIndex(). For a query over an Enumerable.Range sequence that selects x * i, it looks like this:

System.Linq.Enumerable+<RangeIterator>d__b1.Select((x, i) => (x * i))
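One way to check this is to inspect the expression tree directly. This sketch repeats the marker type so it compiles on its own, puts the IQueryable<> overload in a hypothetical QueryableSelectIndexExtensions class, and builds the query over Enumerable.Range(1, 5).AsQueryable(). The exact ToString() text may differ between .NET versions, so the check only looks for the absence of GetIndex.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public sealed class SelectIndexProvider
{
    private SelectIndexProvider() { }
}

// Hypothetical container name for the IQueryable<> version.
public static class QueryableSelectIndexExtensions
{
    public static SelectIndexProvider GetIndex<T>(this T element) => null;

    public static IQueryable<TResult> SelectMany<TSource, TResult>(
        this IQueryable<TSource> source,
        Expression<Func<TSource, SelectIndexProvider>> collectionSelector,
        Expression<Func<TSource, int, TResult>> resultSelector)
    {
        // Forwards to Queryable.Select's index-aware overload; the
        // collectionSelector expression (and its GetIndex call) is discarded.
        return source.Select(resultSelector);
    }
}

public static class QueryableDemo
{
    public static IQueryable<int> Query() =>
        from x in Enumerable.Range(1, 5).AsQueryable()
        from i in x.GetIndex()
        select x * i;
}
```

Enumerating QueryableDemo.Query() yields 0, 2, 6, 12, 20: each x multiplied by its zero-based index.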

We’re essentially providing our own syntactic sugar over the sugar already provided by query expressions. Pretty sweet, eh?

As a final exercise for the reader, what would this print?

var res = from x in Enumerable.Range(1, 5)
          from i in x.GetIndex()
          from y in Enumerable.Repeat(i, x)
          where y % 2 == 1
          from j in 0.GetIndex()
          select i + j;

foreach (var r in res)
    Console.WriteLine(r);

About Keith Dahlby

I'm a .NET developer, Git enthusiast and language geek from Cedar Rapids, IA. I work as a software guru at J&P Cycles and studied Human-Computer Interaction at Iowa State University.
This entry was posted in Hacking LINQ, LINQ.
  • http://meandmycode.com Stephen

    As you say, it’s a hack. Ultimately I’m not sure the LINQ syntax is always that great, and I often just write the extension methods directly instead. I would think you could just have done:

    from x in seq.Select((y, i) => new { y, i })

    or write an extension method:

    from x in seq.WithIndexes()

  • http://www.lostechies.com/members/dahlbyk/default.aspx Keith Dahlby

    You certainly could inline the Select call within a query; however, you’re then stuck using an anonymous type and an “extra” variable. You either end up with variables like x with properties like y that mean nothing, or code that looks like this:

    from ProductWithIndex in Products.Select((p,i) => new { Product = p, Index = i })
    where ProductWithIndex.Product.IsSpecial()
    select new { ProductWithIndex.Index, ProductWithIndex.Product.ID };

    By leaning on the query expression, that intermediate identifier (x or ProductWithIndex) becomes transparent. So in the select clause of the join example we have access to x, y and i without having to mess with any secondary types and the unintuitive x.y syntax.

    At that point the query vs. method translation comes down to the same question as always: is the query inflexibility made up for by the value of clean expression of intent?

  • http://www.lostechies.com/members/chrismissal/default.aspx Chris Missal

    This made a lot more sense after I saw you code it. :)