Unit tests with log statements are a code smell
(Back in 2003 I ran a moderately popular tech blog on the Radio UserLand platform. This is an archived version of a post from that blog. You can view an index of all the archived posts.)
Friday, 2 May 2003
While this is not an earth-shattering realization, I've come to hold the opinion that log statements (log4j, logkit, java.util.logging, commons-logging, what have you) within unit tests are a code smell, perhaps universally.
While I'll sometimes add a few System.out.println calls to a unit test while I'm trying to diagnose a particular failure, configuring a full-blown logging setup within a unit test always seemed like more time and trouble than it was worth. From time to time I'll encounter a heavily logged TestCase in some code base I'm working with. The more I work with such TestCases, the more I find this to be an indication that something is not right.
Here's why:
- I find it hard to imagine a test-first/test-driven development approach that leads to log statements within unit tests (but I can imagine "test last" approaches that will). The presence of logging strongly suggests that the tested code was not developed in a test-driven fashion.
- The role of logging frameworks and that of automated unit testing frameworks are at odds. Logging calls provide persistent, if only intermittently used, diagnostic and informational messages, typically intended for manual inspection. Automated unit tests are meant to be self-interpreting: success or failure should be obvious without manual inspection. The use of diagnostic or informational log messages within unit tests suggests your tests aren't sufficiently self-interpreting.
- Anecdotally, the objects being tested by these cases are brittle in the face of change. This may stem from poor factoring: there are too many subtle and perhaps unplanned interactions between methods, or methods aren't focused enough to allow for orthogonal changes.
- Anecdotally, test failures remain difficult to diagnose and fix despite the log messages. This also stems from poor factoring: the individual test cases and assertions are too coarse-grained to help identify the root cause of a test failure.
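To make the contrast concrete, here is a minimal sketch. The Cart class, the test names, and the hand-rolled assertEquals helper are all hypothetical and not taken from any real code base; the helper merely stands in for a framework assertion such as JUnit's so that the example runs on its own. The first test only logs and so "passes" whatever the code does; the remaining tests each check one behavior and report failure by themselves.

```java
import java.util.ArrayList;
import java.util.List;

public class SelfInterpretingTests {

    // Hypothetical class under test, invented for this illustration.
    static class Cart {
        private final List<Integer> pricesInCents = new ArrayList<>();

        void add(int priceInCents) {
            pricesInCents.add(priceInCents);
        }

        int size() {
            return pricesInCents.size();
        }

        int totalInCents() {
            int sum = 0;
            for (int price : pricesInCents) {
                sum += price;
            }
            return sum;
        }
    }

    // Stand-in for a framework assertion; throws when the values differ.
    static void assertEquals(String message, int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError(
                message + ": expected " + expected + " but was " + actual);
        }
    }

    // The smell: nothing is asserted, so someone has to read the output
    // to decide whether the cart behaved correctly.
    static void testCartLogged() {
        Cart cart = new Cart();
        cart.add(250);
        cart.add(100);
        System.out.println("size=" + cart.size() + " total=" + cart.totalInCents());
    }

    // Self-interpreting alternatives: one narrowly focused behavior per test.
    static void testNewCartIsEmpty() {
        Cart cart = new Cart();
        assertEquals("size of new cart", 0, cart.size());
        assertEquals("total of new cart", 0, cart.totalInCents());
    }

    static void testAddIncreasesSize() {
        Cart cart = new Cart();
        cart.add(250);
        assertEquals("size after one add", 1, cart.size());
    }

    static void testTotalIsSumOfPrices() {
        Cart cart = new Cart();
        cart.add(250);
        cart.add(100);
        assertEquals("total of two items", 350, cart.totalInCents());
    }

    public static void main(String[] args) {
        testCartLogged();
        testNewCartIsEmpty();
        testAddIncreasesSize();
        testTotalIsSumOfPrices();
    }
}
```

If totalInCents were broken, the logged test would still run to completion, while testTotalIsSumOfPrices would fail with a message naming the exact behavior that regressed.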