Biographical Sketch: Michael J. Cafarella

A. Professional Preparation

Brown University Computer Science A.B., 1996

University of Edinburgh Artificial Intelligence M.Sc., 1997

University of Washington Computer Science M.Sc., 2005

University of Washington Computer Science Ph.D., 2009

B. Appointments

Starting December, 2009

Assistant Professor, Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI

C. Publications

Relevant Publications:

1) Data Integration for the Relational Web. Michael J. Cafarella, Alon Halevy, Nodira Khoussainova. PVLDB 2(1), 2009.

2) Uncovering the Relational Web. Michael J. Cafarella, Alon Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu. Proceedings of the Eleventh International Workshop on the Web and Databases (WebDB), June 2008. Vancouver, Canada.

3) WebTables: Exploring the Power of Tables on the Web. Michael J. Cafarella, Alon Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu. Proceedings of VLDB 2008, August 2008. Auckland, New Zealand.

4) Extracting and Querying a Comprehensive Web Database. Michael Cafarella. Proceedings of the Conference on Innovative Data Systems Research (CIDR) 2009. Asilomar, CA.

5) Navigating Extracted Data with Schema Discovery. Michael J. Cafarella, Dan Suciu, Oren Etzioni. Proceedings of the Tenth International Workshop on the Web and Databases (WebDB), June 2007. Beijing, China.

Other Publications:

1) Ontology-driven, Unsupervised Instance Population. Luke K. McDowell and Michael Cafarella. Journal of Web Semantics 6(3): 218-236, 2008.

2) Structured Querying of Web Text: A Technical Challenge. Michael J. Cafarella, Christopher Re, Dan Suciu, Oren Etzioni, Michele Banko. Proceedings of the Conference on Innovative Data Systems Research (CIDR) 2007. Asilomar, CA.

3) Open Information Extraction from the Web. Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, Oren Etzioni. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), January 2007. Hyderabad, India.

4) A Search Engine for Natural Language Applications. Michael J. Cafarella, Oren Etzioni. Proceedings of the 14th International World Wide Web Conference (WWW 2005).

5) Web-Scale Information Extraction in KnowItAll: Preliminary Results. Oren Etzioni, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates. Proceedings of the 13th International World Wide Web Conference (WWW 2004).

D. Synergistic Activities

Michael Cafarella’s main research focus is Web information extraction and integration. His recent WebTables project obtained more than 125M distinct databases by crawling the Web for HTML tables used in a database-like setting; the result was a database corpus more than five orders of magnitude larger than any previous effort. His published Octopus system allows users to easily combine and integrate data from these extracted Web sources. In both cases, his work has solved novel data management problems prompted by the growth of the Web.

Dr. Cafarella is also the co-founder of the Nutch and Hadoop open-source projects. Nutch is an open-source search engine. Hadoop is a suite of cluster-based data management tools, including the most widely used implementation of the MapReduce distributed programming framework. Hadoop is used in both academia and industry, including MIT, Stanford, CMU, Yahoo!, Facebook, and .

E. Collaborators and Other Affiliations

Non-advisor collaborators:

Michele Banko (University of Washington), Nodira Khoussainova (University of Washington), Jayant Madhavan (Google, Inc.), Luke McDowell (US Naval Academy), Christopher Re (University of Wisconsin, Madison), Daisy Zhe Wang (UC Berkeley), Eugene Wu (MIT), Yang Zhang (MIT)

Graduate Advisors:

Oren Etzioni, Computer Science and Engineering, University of Washington

Dan Suciu, Computer Science and Engineering, University of Washington

Alon Halevy, Google Research, Google, Inc.
