Saturday, April 24, 2010

Do Unit Tests and Quality Code Result in Fewer Bugs?

At the end of my team's last release, I set out to prove that unit tests and quality code lead to fewer bugs.

Here are the tools I used.

Jira

The first thing to know is that the team I'm on keeps its features in Jira, and when we commit code to Subversion, we include the Jira ID in the comments.

Below, you'll see how this helped me relate code to features.

The second thing to know is when we find bugs in a feature during its development, in Jira we link those bugs back to the parent feature.

This allowed me to measure how many bugs had been reported for each feature at release.

SvnPlot

There's an open source project called svnplot that allows you to export the information stored in svn to a SQLite database. To do this, follow the instructions in Step 1 of the Quick Start.

You end up with a couple tables that include rows for each revision, including the comment and the files that were added, removed, and changed.

ANTLR

I used ANTLR C# grammar to help map the files changed to classes.

NCover

I used NCover to measure unit test coverage of the added and changed classes.

NDepend

And finally, I used NDepend to measure code quality of those classes, as well as to query the coverage metrics.

Results

This is the part where I'd love to show some fancy graphs proving my point. Unfortunately, the results weren't exactly conclusive.

I ended up dropping the idea of trying to correlate code quality to number of bugs altogether, because I just ended up with a mess.

The results for how unit test coverage related to bugs is sort-of presentable, at least, but they don't show a nice correlation:

Feature LOC Changed / Bug Average Coverage (%)
1 521 65
2 290 38
3 165 0
4 140 54
5 125 21
6 76 25
7 74 77
8 59 30
9 2 80

I'd like to think I'll try again another day, but given those results, I just don't know. I can come up with many reasons why things didn’t come out the way I'd hoped:

  • The amount of change for each feature was very diverse. Some features involved lots of code changes and others involved very few, so that the features may not really be comparable in this way.
  • Some features involved mostly changing UI code, where the coverage can't be high, because we don't currently have a good way to unit test our UI code.
  • Some features, especially Feature 9, had high test coverage, but the unit tests were very low on actual assertions.
  • There's just not enough data.

In the end, if my argument to the business is that unit tests and code quality are important because they reduce the number of bugs, results like these won't help me make my case.

So, if you know of more persuasive results—either your own or stuff out there in the literature—I'd love to hear about it :-)