Static Analysis and Generated Code

In recent months, I've been thinking about the problem of static analysis in generated code. Static analysis means using tools like FxCop and NDepend (for .NET apps), lint (for C), and CodeHealer (for Delphi) to find potential problems in your source code. Generated code is code written not by a human being but by a tool, such as the generated classes for an Entity Framework or LINQ to SQL model, an import for a COM type, or code created by a form generator. Static analysis is intended to find code which is either incorrect or difficult to maintain. The issue exists in any environment where generated code and static analysis meet, but for me personally it's mainly a problem in C# and Delphi applications.

A Useful, But Imperfect, Tool

Static analysis can be incredibly useful. It finds errors in much the same way that compilers report hints and warnings, but at a significantly deeper level. Compilers typically report a hint or a warning only when the compiler is certain that there is a problem with the code, such as an unused variable. Static analysis tools, on the other hand, can report problems in cases where, for example, the code itself might function, but appears to be too complex to be maintainable. Obviously, there is a gray area between maintainable and unmaintainable, and every static analysis tool I have ever tried has returned false positives. FxCop, for example, will complain that an internal type is never instantiated if the only place it is ever instantiated is inside of a LINQ to Entities query; it doesn't seem to be able to "see into the Expression."

For Code Written By Humans Only

Static analysis is close to useless for generated code, because such code is produced by a tool, tends to function correctly, and is presumably never maintained by humans, at least not directly. Yet generated code tends to set off alarms in static analysis tools, thanks to high line counts, high cyclomatic complexity, long parameter lists, missing unit tests, unusual type and member names, and so on. In other words, in generated code functionality is much more important than maintainability, whereas in non-generated code maintainability is at least as important as functionality. The best solution is to convince your static analysis tool to ignore the generated code altogether, but this is often easier said than done.

When I run FxCop or NDepend against the source code for our commercial applications, for example, they report a large number of errors, with the majority coming from generated code. The reality is that no tool I have used can consistently distinguish between unmaintainable code written by code generators and unmaintainable code written by programmers, automatically. Therefore, in order to get the maximum benefit from static analysis, we have some additional work to do in order to help the tools look at only the code where their analysis will return useful information for developers.

Excluding Generated Code From Static Analysis

The .NET framework includes an attribute, GeneratedCode, which is intended to tell tools such as FxCop to ignore parts of your code. FxCop also honors the SuppressMessage attribute, which instructs the tool not to report individual messages in specific cases, and individual rules can be turned off. CodeHealer can be told to suppress a certain number of messages, and you can turn individual tests on and off.
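As a rough illustration of the two attributes just mentioned, the sketch below marks a stand-in class as generated and suppresses a single rule on a hand-written method. The type names, the tool name "HypotheticalModelGenerator", and the choice of rule CA1822 are assumptions made up for this example. Keep in mind that SuppressMessage is only compiled into the assembly when the CODE_ANALYSIS symbol is defined.

```csharp
using System;
using System.CodeDom.Compiler;
using System.Diagnostics.CodeAnalysis;

// Hypothetical stand-in for a class emitted by a code generator.
// The tool name and version strings are invented for this example.
[GeneratedCode("HypotheticalModelGenerator", "1.0.0.0")]
public class CustomerRecord
{
    public string Name { get; set; }
}

// Hand-written class: suppress one specific rule in one specific place.
public class CustomerFormatter
{
    [SuppressMessage("Microsoft.Performance", "CA1822:MarkMembersAsStatic",
        Justification = "Kept as an instance method for this example.")]
    public string Describe(CustomerRecord record)
    {
        return record == null ? "(none)" : "Customer: " + record.Name;
    }
}

public static class GeneratedCodeCheck
{
    // True when the type itself carries [GeneratedCode].
    public static bool IsMarkedGenerated(Type type)
    {
        return type.IsDefined(typeof(GeneratedCodeAttribute), false);
    }
}

public static class AttributeDemo
{
    public static void Main()
    {
        Console.WriteLine(GeneratedCodeCheck.IsMarkedGenerated(typeof(CustomerRecord)));
        Console.WriteLine(GeneratedCodeCheck.IsMarkedGenerated(typeof(CustomerFormatter)));
        Console.WriteLine(new CustomerFormatter().Describe(new CustomerRecord { Name = "Ada" }));
    }
}
```

The reflection check is only there to show that the marker survives compilation, which is what lets a tool working on assemblies see it. Whether a given analyzer actually skips marked code depends on that tool and its settings.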

Even with these tools, however, the line between generated and non-generated code is not always clear. FxCop, for example, looks only at compiled assemblies, not at source code. If you have a partial class containing both generated code and human-written code, it's important to recall that FxCop cannot tell the difference between the parts of the class which came from the generated code file and the parts which came from the non-generated file. The distinction simply does not exist in the compiled assembly, because "partial" classes are combined at compile time by the C# compiler. So you're left with the choice of putting the GeneratedCode attribute on the entire class, causing static analysis to miss code you wrote, or not putting it on the entire class, resulting in static analysis errors from the generated code. The best solution is probably to keep any code you write as part of a partial class as simple as possible, such as calls into another class which does the real work. Considering that partial classes are typically used for things like forms, or LINQ to SQL types, this is a good practice anyway.
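One way to picture the advice about keeping your half of a partial class thin is the sketch below. All type and member names are hypothetical, and both halves of the partial class appear in one listing only for brevity; in a real project the first half would live in a generated file.

```csharp
using System;
using System.Collections.Generic;

// Half 1: imagine this part lives in a generated file (for example, a designer file).
public partial class OrderForm
{
    private readonly List<decimal> lineTotals = new List<decimal>();

    public void AddLine(decimal total)
    {
        lineTotals.Add(total);
    }
}

// Half 2: the hand-written part only forwards to a separate class.
public partial class OrderForm
{
    public decimal GrandTotal()
    {
        return OrderCalculator.Sum(lineTotals);
    }
}

// Ordinary, non-partial class: the real logic lives here, so it can still be
// analyzed even if the whole partial class is excluded as generated.
public static class OrderCalculator
{
    public static decimal Sum(IEnumerable<decimal> values)
    {
        if (values == null) throw new ArgumentNullException("values");
        decimal sum = 0m;
        foreach (decimal value in values)
        {
            sum += value;
        }
        return sum;
    }
}

public static class PartialClassDemo
{
    public static void Main()
    {
        var form = new OrderForm();
        form.AddLine(1.5m);
        form.AddLine(2.5m);
        Console.WriteLine(form.GrandTotal());
    }
}
```

With this split, marking OrderForm as generated costs very little analysis coverage, because the only hand-written code hidden from the tool is a one-line forwarding call.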

Code generation can also work by producing IL or assembly directly, without a source file at all. Obviously, tools which look only at source code, like CodeHealer or Microsoft StyleCop, will miss this sort of code by their very design (which is a good thing). But tools which reflect into compiled code will see this sort of code as no different from any other code in the assembly, unless they are specifically designed to exclude it.

For example, when you have an anonymous type with a large number of members in C#, the compiler will generate (in IL) a constructor with an equivalent number of arguments. NDepend flags these as excessive numbers of arguments, and, indeed, it would be correct if this were a real method which I had written.
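The effect described above can be observed with a little reflection. The following is a minimal sketch, with hypothetical helper names, that counts the constructor parameters of an anonymous type and checks for the CompilerGenerated attribute that the C# compiler places on such types. It only shows what the compiler emits; it is not a description of how NDepend inspects assemblies.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class AnonymousTypeInspector
{
    // Number of parameters on the first public constructor of the given type.
    public static int ConstructorParameterCount(Type type)
    {
        ConstructorInfo[] ctors = type.GetConstructors();
        return ctors.Length == 0 ? 0 : ctors[0].GetParameters().Length;
    }

    // True when the type carries [CompilerGenerated], as C# anonymous types do.
    public static bool IsCompilerGenerated(Type type)
    {
        return type.IsDefined(typeof(CompilerGeneratedAttribute), false);
    }

    // Returns the compiler-generated type behind an anonymous object with seven members.
    public static Type SampleAnonymousType()
    {
        var row = new { Id = 1, Name = "n", City = "c", Zip = "z", Phone = "p", Email = "e", Active = true };
        return row.GetType();
    }
}

public static class AnonymousTypeDemo
{
    public static void Main()
    {
        Type type = AnonymousTypeInspector.SampleAnonymousType();
        Console.WriteLine(AnonymousTypeInspector.ConstructorParameterCount(type));
        Console.WriteLine(AnonymousTypeInspector.IsCompilerGenerated(type));
    }
}
```

Seven members produce a seven-parameter constructor, which is already over a threshold of five parameters, even though nobody wrote that constructor by hand.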

Improving Heuristics to Avoid False Positives

NDepend offers an additional means of filtering data. The tool has a domain-specific language for querying your code, called CQL, and you can use this to filter the specific areas analyzed by NDepend. I can modify the NDepend rules by editing the CQL, so I can, for example, exclude the cases I've just described (generated constructors with large numbers of parameters) by changing the default rule for "too many arguments," which looks like this:

// <Name>Methods with too many parameters (NbParameters)</Name>
WARN IF Count > 0 IN SELECT TOP 10 METHODS WHERE NbParameters > 5 ORDER BY NbParameters DESC
// METHODS WHERE NbParameters > 5 might be painful to call and might degrade performance.
// You should prefer using additional properties/fields to the declaring type to handle
// numerous states. Another alternative is to provide a class or structure dedicated to
// handle arguments passing (for example see the class System.Diagnostics.ProcessStartInfo
// and the method System.Diagnostics.Process.Start(ProcessStartInfo))
// See the definition of the NbParameters metric here

to this:

NbParameters > 5
BY NbParameters DESC

...and the false positives go away. Unfortunately, this only fixes one test. Similar CQL appears in other default NDepend metrics, like this one:

// <Name>Quick summary of methods to refactor</Name>
WARN IF Count > 0 IN SELECT TOP 10 METHODS /*OUT OF "YourGeneratedCode" */ WHERE

// Metrics' definitions
( NbLinesOfCode > 30 OR //
NbILInstructions > 200 OR //
CyclomaticComplexity > 20 OR //
ILCyclomaticComplexity > 50 OR //
ILNestingDepth > 4 OR //
NbParameters > 5 OR //
NbVariables > 8 OR //
NbOverloads > 6 ) //

// Here are some ways to avoid taking account of generated methods.
!( NameIs "InitializeComponent()" OR
// NDepend.CQL.GeneratedAttribute is defined in the redistributable assembly $NDependInstallDir$\Lib\NDepend.CQL.dll
// You can define your own attribute to mark "Generated".
HasAttribute "OPTIONAL:NDepend.CQL.GeneratedAttribute")

These must all be fixed individually. On the other hand, at least they can be fixed easily by changing the heuristics of the rule, instead of by excluding specific cases, as with many other static analysis tools.

One Time Pain

The real win from static analysis tools comes from integrating them into an automated build process. Tools like FinalBuilder and MSBuild can easily be configured to run all of the tools mentioned so far. This allows you to fail the build if potentially unmaintainable code is checked in. The first time you use any static analysis tool, you will probably need to spend substantial effort to get the project to the point where existing code does not fail the build. This may involve turning off individual rules (you can gradually turn them back on later), fixing specific violations, and excluding code such as generated code. You will then need to integrate the tool into your automated build process. This is, in all honesty, a fair bit of work, but it's really a one-time pain. After that, the analysis will pay you back for your effort, day in and day out.


  • Guest
    Patrick Smacchia Wednesday, 16 September 2009

    We (the NDepend team) have some plans to provide deeper exclusion mechanism with NDepend and CQL.
    Basically, we thought about having a customizable range of dedicated CQL statement like...
    EXCLUDE METHODS WHERE FileNameLike ".designer.cs"
    EXCLUDE METHODS WHERE IsGeneratedByCompiler
    EXCLUDE TYPES WHERE HasAttribute "GeneratedAttribute"
    This will let user finely do the distinction between human and generated code and will put an end to false positive dilemma on generated code.
    This feature should be released in first half of 2010.

  • Guest
    Bryce Tuesday, 24 August 2010

    I do not see a response to Craig's question. I am also interested in the same point.
