Readable Regular Expressions Revisited


Many, many years ago (internet time), I proposed a fluent interface for composing regular expressions. People either loved the idea or hated it (or thought it was just OK). The intention was to tackle the opaqueness of regular expressions embedded in your otherwise familiar C# source code.

I’ll confess I never used that approach in production code. It started as a thought experiment, and that’s as far as it went (for me, at least). However, I did pick up a great technique from the comments. William “OmegaMan” Wegerson suggested using RegexOptions.IgnorePatternWhitespace along with liberal usage of in-line comments. Here is a recent example from the fubumvc source:

const string propertyFindingPattern = @"
\{              # start variable
(?<varname>\w+) # capture 1 or more word characters as the variable name
(:              # optional section beginning with a colon
(?<default>\w+) # capture 1 or more word characters as the default value
)?              # end optional section
\}              # end variable";

Notice that the comments violate one of the main rules of good commenting: do not restate what the code says. Usually, someone reading your code is literate enough in the programming language to figure out “what” the code does; it just isn’t always clear “why”. But when it comes to regular expressions, I would guess that a majority of C# programmers need to look at a regex reference every time they try to decipher a pattern. Do them a favor and document what each part of the pattern does while you are writing it, since you’re probably looking at the reference already anyway. This should make it much easier for someone to follow (and modify) the code going forward. No fancy fluent interface required.
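
For completeness, here is a minimal sketch of how a commented pattern like this might be put to work. The class name VariablePatternDemo and the FindFirst helper are made up for illustration and are not taken from the fubumvc source; the only real requirement is passing RegexOptions.IgnorePatternWhitespace, which tells the engine to skip unescaped whitespace and treat everything from # to the end of the line as a comment.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper, for illustration only (not part of fubumvc).
public static class VariablePatternDemo
{
    // The pattern from the example, with the escapes written out.
    const string PropertyFindingPattern = @"
\{              # start variable
(?<varname>\w+) # capture 1 or more word characters as the variable name
(:              # optional section beginning with a colon
(?<default>\w+) # capture 1 or more word characters as the default value
)?              # end optional section
\}              # end variable";

    // Without IgnorePatternWhitespace, the line breaks and comments
    // would be treated as literal text to match.
    static readonly Regex PropertyFinder = new Regex(
        PropertyFindingPattern,
        RegexOptions.IgnorePatternWhitespace);

    // Returns the first variable's name and default value,
    // or null when the input contains no variable.
    // Default is null when the optional ":default" section is absent.
    public static (string Name, string Default)? FindFirst(string input)
    {
        Match match = PropertyFinder.Match(input);
        if (!match.Success)
        {
            return null;
        }

        Group defaultGroup = match.Groups["default"];
        return (match.Groups["varname"].Value,
                defaultGroup.Success ? defaultGroup.Value : null);
    }
}
```

Calling VariablePatternDemo.FindFirst("/products/{id:42}") should yield the name "id" with the default "42", while "{name}" should yield "name" with no default. One thing to watch for with this option: a literal space or # inside the pattern has to be escaped (or placed in a character class), otherwise it is silently dropped or swallows the rest of the line.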
