Rodney Waldhoff
HeyRod.com

Experimenting with Jester

(Back in 2003 I ran a moderately popular tech blog on the Radio UserLand platform. This is an archived version of a post from that blog. You can view an index of all the archived posts.)

Wednesday, 25 June 2003

In a previous post I alluded to the use of mutation testing to evaluate the completeness of a test suite, rather than relying upon pure test coverage metrics. In a comment to that post Adewale Oshineye suggested that I check out Ivan Moore's Jester, a Java/JUnit-based mutation testing tool. I'd seen Jester before, but I'd never used it nor looked at it in much detail. The anecdote about what Jester uncovered in Bob Martin's and Robert Koss's test-driven bowling score calculator example was certainly interesting, so this morning I bit the bullet and downloaded a copy.

Getting Jester up and running was a minor hassle (and it seems like much of that hassle could be alleviated), but I've written my share of open source projects with quirky configuration and installation, so I'll leave that alone. Once I had put my files where Jester expects them and tweaked the Python scripts to run on whatever version of Python I've got on my Red Hat box, Jester ran slowly (waiting on javac) but well. It ran much faster once I figured out how to tell Jester to stop mutating my comments (set shouldRemoveComments=true in jester.cfg). The result was a basic HTML report like this one.

(On an unrelated note, is there something more or less equivalent to python -version?)

As an experiment, I took a small component (196 non-blank, non-comment lines of code spread over 34 methods) I knew to be well tested (100% coverage of statements and conditional expressions) and ran it through Jester. It found 21 mutations total, 2 of which didn't lead to a test failure. Those were (shown here in their mutated form):

if(TRUE || MESSAGE_LOG.isDebugEnabled()) {
   MESSAGE_LOG.debug("Broadcasting " + msg);
}

and

List list = new ArrayList(_listeners.size() + 12);
list.addAll(_listeners);
list.add(...);

In the first case, believe it or not, I actually had unit tests that confirm that MESSAGE_LOG generates log events when set to the DEBUG priority, and no log events when set to higher than DEBUG priority. (I wouldn't normally do that, except this is the single log message in that component, and I really wanted 100% coverage. Besides, it wasn't that hard, I just added a mock Appender to that Category, and checked to see if a message was added or not.) Of course, both of these tests still pass, even without the isDebugEnabled call, since the debug method won't generate a log event when using a higher priority. Adding a test that fails as a result of this mutation isn't particularly useful, but it isn't difficult either--I just pass in a mock instance of the msg object and check whether or not the toString method is invoked. Not invoking toString is indeed the reason for this if(isDebugEnabled()) block, so maybe that's not such an odd test to have after all.
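To make that last test concrete, here is a minimal sketch in present-day Java rather than the code from the component itself. Everything named here (DebugLog, RecordingLog, CountingMessage, Broadcaster, GuardTest) is a hypothetical stand-in invented for illustration; in particular DebugLog replaces the real logging Category so the example runs with no dependencies. The idea is only that a message whose toString counts its own invocations lets a test tell the guarded code apart from the mutated code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real logger: just the two calls the post uses.
interface DebugLog {
    boolean isDebugEnabled();
    void debug(String text);
}

// Test double for the logger. Like a real logger, it drops debug events
// when debug is disabled, so counting events alone cannot catch the mutation.
class RecordingLog implements DebugLog {
    final boolean debugEnabled;
    final List<String> events = new ArrayList<>();

    RecordingLog(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    public boolean isDebugEnabled() { return debugEnabled; }

    public void debug(String text) {
        if (debugEnabled) events.add(text);
    }
}

// Test double for the message: counts how often toString() is invoked.
class CountingMessage {
    int toStringCalls = 0;

    @Override
    public String toString() {
        toStringCalls++;
        return "msg";
    }
}

// Hypothetical component containing the guarded log statement.
class Broadcaster {
    private final DebugLog log;

    Broadcaster(DebugLog log) { this.log = log; }

    void broadcast(Object msg) {
        // The guard exists to skip the concatenation (and so msg.toString())
        // when debug logging is off. Mutating it to "true || ..." removes that.
        if (log.isDebugEnabled()) {
            log.debug("Broadcasting " + msg);
        }
    }
}

class GuardTest {
    // Broadcasts one message and reports how many times toString() was called.
    static int toStringCallsWhenBroadcasting(boolean debugEnabled) {
        CountingMessage msg = new CountingMessage();
        new Broadcaster(new RecordingLog(debugEnabled)).broadcast(msg);
        return msg.toStringCalls;
    }

    public static void main(String[] args) {
        // With debug off, the guard must prevent any toString() call.
        if (toStringCallsWhenBroadcasting(false) != 0) {
            throw new AssertionError("toString() invoked although debug is disabled");
        }
        // With debug on, the message is rendered exactly once.
        if (toStringCallsWhenBroadcasting(true) != 1) {
            throw new AssertionError("expected exactly one toString() call");
        }
        System.out.println("guard tests passed");
    }
}
```

With the guard mutated to always be true, the first check fails, because the concatenation runs and calls toString even though the log discards the result.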

The second case is the kind of thing Ivan Moore describes as a "false positive" in his writeup on Jester. This code initializes that ArrayList to the precise size it knows it will need. Allocating it a little bit bigger or a little bit smaller doesn't alter the functional behavior of the method (although it will be slightly less efficient). Maybe that indicates a premature optimization on my part, but it seems like a pretty small one. In any event I don't see any way to confirm that that List was allocated to precisely the right size without breaking encapsulation profoundly, so I think I'll let that one go.
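A small sketch of why no test can kill this mutant, again in present-day Java with hypothetical names (CapacityMutant, copyAndAdd, and the slack parameter are invented for illustration). The constructor argument to ArrayList is only an initial capacity, and the list interface offers no way to observe it, so every variant below yields an equal list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CapacityMutant {
    // Copies the listeners and appends one more element. The slack parameter
    // stands in for the constant a mutation tool would alter.
    static List<String> copyAndAdd(List<String> listeners, String extra, int slack) {
        List<String> list = new ArrayList<>(listeners.size() + slack);
        list.addAll(listeners);
        list.add(extra);
        return list;
    }

    public static void main(String[] args) {
        List<String> listeners = Arrays.asList("a", "b");

        List<String> exact = copyAndAdd(listeners, "c", 1);   // precisely sized
        List<String> bigger = copyAndAdd(listeners, "c", 2);  // over-allocated
        List<String> smaller = copyAndAdd(listeners, "c", 0); // grows on the last add

        // All three are indistinguishable through the List interface.
        if (!exact.equals(bigger) || !exact.equals(smaller)) {
            throw new AssertionError("initial capacity changed observable behavior");
        }
        System.out.println(exact);
    }
}
```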

I had hoped to run Jester on some larger, more complicated but less well tested code (3,287 non-comment, non-blank lines, roughly 77% coverage) to get a feel for how it works in a more useful scenario, but I've been unable to get it to complete a run on this larger component. I may poke around with something in-between, but 4,000 lines is on the smallish side for the kinds of modules I'd want to run this on. I may have better luck mutating a single class at a time.

In short, I think Jester meets Sam Ruby's criteria for a successful open source project--it's a good idea with a bad implementation. I have some thoughts on how to improve that implementation (mainly obvious ones--e.g., use a ClassLoader and an actual parser, or consider mutating the byte code rather than the source) that maybe I'll cover in a later post. All in all, Jester is an interesting project, and like a lot of things, I wish it worked a bit better.