Kendrick Boyd: Precision-Recall Space and Empirical Algorithm Evaluation
Abstract:
ROC curves are widely used to represent the quality of medical screening procedures such as mammography. We show that when screening for low-prevalence diseases, as in mammography for breast cancer, precision-recall (PR) curves have significant advantages over ROC curves. Because of these advantages, PR curves and the areas under them are already the evaluation metrics of choice for other low-prevalence tasks, such as information retrieval.
While PR curves are frequently used as a simple replacement for ROC curves, they have subtleties that must be considered. It is already known that PR curves change as class skew changes. What has not previously been recognized is that a region of PR space is completely unachievable, and that the size of this region depends only on the skew. We precisely characterize the size of the unachievable region and discuss its implications for empirical evaluation methodology in machine learning.
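
To make the unachievable region concrete, here is a minimal sketch based on a worst-case counting argument, not on anything stated in this abstract. It assumes skew means the fraction of positive examples. At a given recall, precision is lowest when every negative example is ranked ahead of the positives, which gives a minimum precision of skew * recall / (skew * recall + 1 - skew). Integrating that bound over recall from 0 to 1 gives the area below it. The function names min_precision and unachievable_area are illustrative only.

```python
import math


def min_precision(recall: float, skew: float) -> float:
    """Lowest precision reachable at a given recall (worst-case ranking).

    Assumption: skew is the fraction of positive examples, 0 < skew < 1.
    Worst case: every negative is a false positive, so
    precision = TP / (TP + FP) with TP = recall * skew and FP = 1 - skew
    (both expressed as fractions of the data set).
    """
    if not 0.0 < skew < 1.0:
        raise ValueError("skew must be strictly between 0 and 1")
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must be between 0 and 1")
    return skew * recall / (skew * recall + (1.0 - skew))


def unachievable_area(skew: float) -> float:
    """Area of PR space lying below the minimum-precision curve.

    Closed form of the integral of min_precision over recall in [0, 1]:
    1 + (1 - skew) * ln(1 - skew) / skew.
    """
    if not 0.0 < skew < 1.0:
        raise ValueError("skew must be strictly between 0 and 1")
    return 1.0 + (1.0 - skew) * math.log(1.0 - skew) / skew


if __name__ == "__main__":
    # The unachievable area shrinks as positives become rarer.
    for skew in (0.5, 0.1, 0.01):
        print(f"skew={skew}: unachievable area = {unachievable_area(skew):.4f}")
```

Under these assumptions the bound depends only on the skew, so areas under PR curves measured on data sets with different skews are not directly comparable without accounting for it.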
