The trials and tribulations of academic publishing – and fuzz testing

By Karen Barrett-Wilt

The story of fuzz testing begins on “a dark and stormy night. Really,” says UW-Madison Computer Sciences professor Barton Miller, who developed the methodology. Fuzz testing, now an accepted and widely used method of finding bugs that make software crash, had a rocky road to acceptance by the scientific community.

Miller had a flash of inspiration and clarity on that dark and stormy night. He was logged into a Unix system on campus from his home computer during “a wild Midwest thunderstorm pouring rain and lighting up the late night sky.” Since it was 1988, he was using a dial-up modem, and the heavy rain created noise on the phone line. That wasn’t surprising, but the noise was interfering with his ability to type commands to the programs he was running. “It was a race to type an input and hit ‘enter’ before the noise overwhelmed the command,” says Miller. As a result of the noise on the phone line, standard and commonly used Unix programs were crashing. That was surprising. 

“The scientist in me said that we need to make a systematic investigation to try to understand the extent of the problem and the cause,” says Miller. So he did what any good professor would do: he created a project for his graduate Advanced Operating Systems course (CS736) so he and his talented students could work together on the problem. 

To find vulnerabilities in systems, they fed random data, which Miller called “fuzz,” into standard programs on several versions of Unix to try to make them crash. It was a simple and crude method of testing that happened to be remarkably good at finding bugs. And, unexpectedly, it caused a hostile backlash in the scientific community.
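In practice, the idea is as simple as it sounds: generate random bytes, hand them to a program, and watch whether it crashes. The sketch below illustrates that classic black-box approach in Python; the target utility, trial count, and input sizes are illustrative assumptions, not the actual tools or parameters from Miller’s study.

    import random
    import subprocess

    # A minimal black-box fuzzing loop in the spirit described above: pipe random
    # bytes into a stdin-reading Unix utility and record whether it crashes.
    # The target program, trial count, and input sizes are illustrative choices,
    # not the actual tools or parameters from the original study.

    TARGET = ["/usr/bin/sort"]   # hypothetical target; any utility that reads stdin works
    TRIALS = 100

    def random_fuzz(max_len=10000):
        # Generate a buffer of random bytes ("fuzz") of random length.
        length = random.randint(1, max_len)
        return bytes(random.getrandbits(8) for _ in range(length))

    crashes = 0
    for trial in range(TRIALS):
        data = random_fuzz()
        try:
            result = subprocess.run(TARGET, input=data, capture_output=True, timeout=5)
        except subprocess.TimeoutExpired:
            print("trial %d: hang (timed out)" % trial)
            continue
        # On Unix, a negative return code means the process was killed by a signal
        # (e.g. a segmentation fault), which is the kind of failure fuzzing looks for.
        if result.returncode < 0:
            crashes += 1
            print("trial %d: crashed with signal %d" % (trial, -result.returncode))

    print("%d of %d random inputs caused a crash" % (crashes, TRIALS))

A harness in this spirit needs nothing more than the ability to run a target program and inspect how it exited, which is part of what makes classic fuzz testing so easy to apply.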

When Miller and his students submitted their first paper on fuzz testing in early 1989, it was not well received. “We came across strong resistance from the testing and software engineering community. The lack of a formal model and methodology and undisciplined approach to testing seemed to offend experienced practitioners in the field,” says Miller. Reviews of the paper were rude and insulting, including the suggestion that he should leave the field of computer science.

To a new professor, this response was demoralizing and discouraging. Miller was not trying to supplant other methods, but rather to provide a new companion to those existing methods. “My response has always been simple: ‘We’re just trying to find bugs’,” says Miller. He continues, “Fuzz testing is not meant to replace more systematic testing. It is just one more tool, albeit an extremely easy one to use, in the tester’s toolkit.”

Almost two years later, after failed attempts to publish elsewhere, their paper was finally accepted by another journal. Miller and his students were freely sharing their work. “The source code for the tools and scripts, the raw test results, and the suggested bug fixes were all made public. Trust and repeatability were crucial underlying principles for this work.” Fuzz testing was open source before “open source” was a phrase.

Miller kept sharing his results and conducted another study with his students in 1995. Things had not improved; the fuzz techniques showed that, if anything, software reliability had gotten worse. Up to this point, fuzz testing seemed to be mostly ignored by the software community. 

Soon after this, he was on sabbatical at Stanford University and gave talks around Silicon Valley on his methodology and results to large audiences at many of the major computer companies. He saw his audience divide into two groups: craftspeople who, taking pride in their work, embraced this tool as another way to expose flaws and fix them immediately; and a second group who felt fuzz testing just created more work, and that finding these flaws wasn’t their problem. They felt that such issues didn’t need to be fixed until they actually caused problems.

Perhaps these talks were the tipping point. Miller’s work on fuzz testing became more and more accepted, and today, Miller says, “Fuzz testing has grown into a major field of research and engineering, with new results taking it far beyond our simple and initial work. Books have been published on the topic, and there is a steady stream of PhD dissertations coming out with new advances in the area. As reliability is the foundation of security, so has it also become a crucial tool in security evaluation of software.” 

Miller still remembers the sting of those early reviews by his peers. For a young researcher in particular, he says, “Every rejection shatters our dreams of getting tenure. Every publication seems like we might get there. We all hate when our papers get turned down.”

Last fall, Miller again offered fuzz testing as a project option to his CS736 students. He says, “With all the amazing advances that other people have made in the field, I wanted to see if our old-fashioned blackbox version was still effective and relevant. It turned out that it was.” The results showed that they could still find a significant number of real-world bugs in commonly used software. Again, he and his students turned that work into a paper.

“Out of perverse curiosity,” Miller submitted that new paper to IEEE Transactions on Software Engineering (the journal that was the source of that early angst and hostile reviews) to see if they would now accept it. They did, with very pleasant reviews. “There is definitely some satisfaction and a feeling of vindication,” says Miller. 

The paper, “The Relevance of Classic Fuzz Testing: Have We Solved This One?” will appear in an upcoming issue of IEEE Transactions on Software Engineering. It is available for early access download now through IEEE’s Xplore digital library (https://ieeexplore.ieee.org/document/9309406) or through arXiv.org (https://arxiv.org/pdf/2008.06537). 

 

A graphical representation of fuzz testing: “It is hard to even tell that this graph was supposed to be a pie graph, but through the combined efforts of three different rendering bugs, it came out looking like this. My code creates graphs like this one multiple times a second all while sorting and documenting the bugs that it finds. Crashes, empty renderings, timeouts, and even heap buffer overflows are all caught and documented by this application.” (Johnny Rockett on LinkedIn: https://go.wisc.edu/7n430b)