PL Seminar: Data-Delineation in Software Binaries and its Application to Buffer-Overrun Discovery

Monday, May 11, 2015, 4:00pm to 5:00pm
3310 CS

Speaker Name: Evan Driscoll

Speaker Institution: GrammaTech, Inc.

Abstract: Detecting memory-safety violations in binaries is complicated by the lack of knowledge of the intended data layout, i.e., the locations and sizes of objects. We present lightweight, static, heuristic analyses for recovering the intended layout of data in a stripped binary. Comparison against DWARF debugging information shows high precision and recall for inferring source-level object boundaries. On a collection of benchmarks, our analysis eliminates a third to a half of the incorrect object boundaries identified by an IDA Pro-inspired heuristic, while retaining nearly all valid object boundaries.

In addition to measuring the accuracy of these analyses directly, we evaluate the effect of using the recovered data to improve the precision of static buffer-overrun detection in the defect-detection tool CodeSonar/x86. We demonstrate that CodeSonar’s false-positive rate drops by about 80% across our internal evaluation suite for the tool, while our approximation of CodeSonar’s recall degrades by only about 25%.

(Joint work with Denis Gopan, Ducson Nguyen, Dimitri Naydich, Alexey Loginov and David Melski.)