The application of information retrieval, natural language processing, and machine learning to finding documents relevant to lawsuits ("e-discovery") has become a multibillion-dollar industry. Lawyers and judges, traditionally late adopters of technology, have been struggling to adapt. Confusion, charlatanism, and misunderstandings are rampant, including in court filings.
Educational outreach by computer scientists and statisticians can help. However, some things the legal world would like to know are simply unknown. In this talk, I attempt to crystallize several of these issues as challenge problems spanning the areas of information retrieval, natural language processing, machine learning, statistics, human-computer interfaces, and computational privacy. I will briefly discuss work with Doug Oard, William Webber, and Mossaab Bagdouri on two of these problems: sequential dependence in classifier evaluation, and optimal training/test splits. The others are yummy low-hanging fruit.
Dave Lewis, Ph.D. (www.DavidDLewis.com) is a consulting computer scientist and expert witness working in the areas of information retrieval, applied statistics, and the evaluation of complex information systems. He has published more than 75 scientific papers and 8 patents, and is a Fellow of the American Association for the Advancement of Science. He has served as a consulting or testifying expert in patent litigation and on e-discovery issues in civil litigation, including consulting on predictive coding and statistical quality control in the da Silva Moore, Kleen Products, Actos, and FHFA cases, among others.