Why Won't Visual Studio Step Into This Code?

I helped another developer debug an interesting problem this morning. Let's see if you can spot the problem. The code in question looked something like this simplified version containing only enough code to show the problem:

public void Execute()
{
    DoStuff(); // breakpoint 1
}

public IEnumerable<Coordinate> DoStuff()
{
    LaunchMissiles(); // breakpoint 2
    // rest of method here
}


Note that Execute does not use the result of DoStuff. That result exists only for testing purposes; it is essentially a log we use to monitor the changes the method makes to external state. The unit tests in question passed, so it was clear that DoStuff worked correctly, at least in a test context. The problem was that when the code ran outside of a test context (i.e., in the real application), the DoStuff method never ran. In the real application, the debugger stopped at breakpoint 1 but never at breakpoint 2, and attempting to step into DoStuff did not enter the method body. When we debugged the unit tests, the debugger stopped at both breakpoints, and the method worked.

Can you spot the bug?

Perhaps it would help if I showed more of the method:

public IEnumerable<Coordinate> DoStuff()
{
    LaunchMissiles(); // breakpoint 2
    yield return CurrentCoordinates();
}


Now do you see the bug? Remember, the unit tests pass. There is no special knowledge about our application needed to see the problem here; all of the information required to spot the bug is in the code snippets above. The problem is a code bug, not a setup or configuration issue.

Perhaps it would help if I showed you a version of DoStuff which "works."

public IEnumerable<Coordinate> DoStuff()
{
    LaunchMissiles(); // breakpoint 2
    return new List<Coordinate> { CurrentCoordinates() };
}


With this version, both the unit tests and the "real" application work correctly.

The Solution


At first glance, this might seem puzzling. I've changed only the last line, and both of those versions appear to do almost exactly the same thing. Why is the behavior of the breakpoint at the previous line different?

The answer is that using yield return causes the C# compiler to rewrite the entire method, not just that single line. The method becomes an iterator: its body is moved into a compiler-generated state machine, and calling the method merely creates and returns that state machine object. Importantly, the iterator is entirely lazy; none of the method body runs until you start iterating the result. The unit tests presumably iterate the returned sequence in order to inspect it, so in the tests the body runs. But Execute ignores the result, so in the real application the body never runs at all.
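
The difference is easy to reproduce in isolation. The following is a minimal sketch, not the original application code: the LaunchCount counter is a hypothetical stand-in for the LaunchMissiles side effect, and the value 42 stands in for CurrentCoordinates().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyIteratorDemo
{
    // Hypothetical stand-in for the side effect of LaunchMissiles().
    public static int LaunchCount;

    // An iterator method: the body is deferred until enumeration starts.
    public static IEnumerable<int> DoStuff()
    {
        LaunchCount++;      // runs only once someone iterates the result
        yield return 42;    // placeholder for CurrentCoordinates()
    }

    public static void Main()
    {
        DoStuff();                          // result discarded: body never runs
        Console.WriteLine(LaunchCount);     // prints 0

        DoStuff().ToList();                 // enumerating forces the body to run
        Console.WriteLine(LaunchCount);     // prints 1
    }
}
```

The first call mirrors Execute, which throws the result away; the second mirrors a unit test that reads the returned sequence.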

Discussion


Some languages, like Haskell, go to great lengths to segregate expressions and side effects. C# isn't one of them, but even so it's common to try to improve quality by doing so. Eric Lippert, a member of the C# compiler team, once wrote:
I am philosophically opposed to providing [an IEnumerable<T>.ForEach() method], for two reasons.

The first reason is that doing so violates the functional programming principles that all the other sequence operators are based upon. Clearly the sole purpose of a call to this method is to cause side effects.

The purpose of an expression is to compute a value, not to cause a side effect. The purpose of a statement is to cause a side effect.

It is clear that causing side effects could cause an expression to change in mid-computation. This is problematic for debugging and quality, especially if some of the evaluations are lazy. But as this example demonstrates, the opposite is also true: Adding expressions to a computation can change the side effects, too.
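
One way to keep the side effect eager while leaving the log lazy is to split the method in two, so that only the part without side effects is an iterator. This is a sketch of that general pattern and not the fix we shipped (which was the List version shown earlier); LaunchCount, Log, and the value 42 are hypothetical placeholders.

```csharp
using System.Collections.Generic;

public static class EagerSideEffectDemo
{
    // Hypothetical stand-in for the side effect of LaunchMissiles().
    public static int LaunchCount;

    // No yield here, so this is an ordinary method and its body
    // runs immediately when it is called.
    public static IEnumerable<int> DoStuff()
    {
        LaunchCount++;      // side effect happens even if the result is ignored
        return Log();       // only the log itself is produced lazily
    }

    // The iterator contains nothing but the computation of the values.
    private static IEnumerable<int> Log()
    {
        yield return 42;    // placeholder for CurrentCoordinates()
    }
}
```

Note that in this arrangement the logged value is still computed at enumeration time rather than at call time, which matters if the state being logged can change in between.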

Comments

  • Guest
    Sebastian PR Gingter Wednesday, 30 March 2011

    Hi Craig,

    Actually, the yield approach can potentially be a LOT faster than direct execution.

    Imagine you only need the first 10 results, but the method would calculate 10,000. That would generate 9,990 useless results, costing performance and memory, only to be thrown away afterwards. With yield, there is no need to calculate them at all.

    Also, in LINQ to SQL or EF (or any other tool that serializes your expression and sends it over the wire to be executed remotely), this can save a lot of time.

    Or just imagine a potentially endless iterator that returns the next prime number with each yield. Calling ToList() on it would never finish, but you usually only pick a certain number of them.

    Of course you're right that this can (and eventually will) cause some surprises with side effects, but there is a range of very valid reasons for the yield statement and for the fact that the method is only executed when you actually iterate over the sequence.
