The Infrastructure for Intelligent Information Systems Department at the IBM Almaden Research Center, San Jose, CA 
has two openings for post-docs. Interested candidates should apply directly by email, with a copy of their current
resume, to Howard Ho (ho@almaden.ibm.com) and Sriram Raghavan (rsriram@us.ibm.com).

** Post-doc in the Search and Analytics Group
The Search and Analytics Group has an opening for a post-doc to join its SystemT research project on declarative 
rule-based information extraction.  We are seeking candidates with a PhD in Computer Science and a proven record 
of scholarly publication in one or more of the following areas: database systems, information retrieval/extraction, 
natural language processing, and machine learning with applications to text analysis. Candidates will be expected
to participate in and contribute to both basic research and system building in areas such as (1) lineage and provenance
tracking over complex extraction rule sets, (2) automatic and semi-automatic refinement of extraction rules,
and (3) scaling complex information extraction tasks to very large corpora using a Hadoop-based compute platform.

** Post-doc in the Information Integration Group
The Information Integration Group has recently started a new project with the goal of extracting, cleansing, and integrating
data from multiple public data sources. We are building a scalable Hadoop-based system with components for
information extraction, data cleansing and normalization, entity resolution, schema mapping and data merging,
and temporal data analysis. We are seeking a one-year post-doc to participate immediately in our ongoing
research in one or more of the following areas: (1) developing efficient algorithms for the
different components (e.g., entity resolution), (2) defining high-level abstractions and tooling for the new 
flavor of integration flow in Midas, (3) building a scalable infrastructure for the entire integration flow, 
and (4) applying and extending the resulting framework and system to various domains.