Refactoring with Iterators: Prime Factors

Andrew Woodward recently posted a comparison of his test-driven Prime Factors solution to one written by Uncle Bob. In the comments, someone suggested that Andrew use an iterator instead so I thought I’d give it a try.

First, let’s repost the original code:

private const int SMALLEST_PRIME = 2;

public List<int> Generate(int i)
{
List<int> primes = new List<int>();
int divider = SMALLEST_PRIME;
while (HasPrimes(i))
{
while (IsDivisable(i, divider))
{
i = AddPrimeToProductsAndReduce(i, primes, divider);
}
divider++;
}
return primes;
}

private bool IsDivisable(int i, int divider)
{
return i%divider == 0;
}

private bool HasPrimes(int i)
{
return i >= SMALLEST_PRIME;
}

private int AddPrimeToProductsAndReduce(int i, List<int> primes, int prime)
{
primes.Add(prime);
i /= prime;
return i;
}

By switching our method to return IEnumerable<int>, we can replace the primes
list with an iterator. We will also remove the AddPrimeToProducts
functionality from that helper method since we don’t have the list any
more:

public IEnumerable<int> Generate(int i)
{
int divider = SMALLEST_PRIME;
while (HasPrimes(i))
{
while (IsDivisable(i, divider))
{
yield return divider;
i = Reduce(i, divider);
}
divider++;
}
}

private int Reduce(int i, int prime)
{
return i / prime;
}

I think this is a good change for three reasons:

  1. There’s nothing about the problem that requires a List<int> be returned, we just want a sequence of the factors.
  2. AddPrimeToProductsAndReduce suggested that it had a side effect, but exactly what wasn’t immediately obvious.
  3. It’s much easier to see what values are being included in the result.

That said, I think we can clean this up even more with a second
iterator. Specifically, I think we should break out the logic for our
candidate factors:

private IEnumerable<int> Divisors
{
get
{
int x = SMALLEST_PRIME;
while (true)
yield return x++;
}
}

Which allows us to separate the logic for generating a divider from the code that consumes it:

public IEnumerable<int> Generate(int toFactor)
{
foreach (var divider in Divisors)
{
if (!HasPrimes(toFactor))
break;

while (IsDivisable(toFactor, divider))
{
yield return divider;
toFactor = Reduce(toFactor, divider);
}
}
}

We should also eliminate the negation by flipping HasPrimes to become IsFactored:

public IEnumerable<int> Generate(int toFactor)
{
foreach (var divider in Divisors)
{
if (IsFactored(toFactor))
break;

while (IsDivisable(toFactor, divider))
{
yield return divider;
toFactor = Reduce(toFactor, divider);
}
}
}

private bool IsFactored(int i)
{
return i <= 1;
}

This does introduce a (very) minor inefficiency in that the Divisors enumerator will MoveNext() one extra time before breaking out of the loop, which could be mitigated by checking IsFactored both before the foreach and after the while loop. Less readable, insignificantly more efficient…take your pick.

The other advantage to breaking out the logic to generate Divisors
is that we can easily pick smarter candidates. One option is to skip
even numbers greater than 2. An even better optimization takes
advantage of the fact that all primes greater than 3 are of the form
x±1 where x is a multiple of 6:

private IEnumerable<int> Divisors
{
get
{
yield return 2;
yield return 3;
int i = 6;
while (true)
{
yield return i - 1;
yield return i + 1;
i += 6;
}
}
}

Implementing this sort of logic in the original version would have
been much more difficult, both in terms of correctness and readability.

Related Articles:

About Keith Dahlby

I'm a .NET developer, Git enthusiast and language geek from Cedar Rapids, IA. I work as a software guru at J&P Cycles and studied Human-Computer Interaction at Iowa State University.
This entry was posted in Iterators, Refactoring. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • BW

    In the optimization at the end I would have used a for loop. Neither here nor there.

  • http://www.lostechies.com/members/dahlbyk/default.aspx Keith Dahlby

    That’s certainly true. For some reason I just don’t think to use for if the loop is infinite. But you do save 2 lines, so I guess it’s better. :)