Projects at the Legal Information Institute of the Cornell Law School

Contact: Tom Bruce (tom@barratry.law.cornell.edu)

 

The Legal Information Institute at Cornell, a leading provider of public access to important legal materials, is recognized worldwide as a pioneer in the application of technology to the processing, publication, and understanding of legal text.  The LII website offers, among other things, the decisions of the US Supreme Court, the United States Code, and the decisions of the New York Court of Appeals.  At over ten million hits per week and (on rare occasions) as many as 5000 hits per minute, it is a popular and busy site with a large and loyal audience.  During the fall of 2003 we have at least two projects available for M.Eng. students.  We are also interested in hearing proposals for other projects that fit in with our overall program.

 

 

Project 1: Recognition of Legal Citations

 

Recognizing and marking up citing references in legal text is an important technological building block for the LII.  Although citation is (theoretically, at least) formalized and follows known rules, the diversity of works being cited, variation in citation styles, and a certain amount of error on the part of authors pose difficult problems for anyone who tries to extract citations from legal texts and mark them up reliably and consistently.  While many citation-recognition problems can be solved with simple pattern matching, others demand more advanced, NLP-like techniques.  We are interested in developing software that can be applied to many items in our collections, particularly the opinions of the Supreme Court; ultimately, we want a generalized "citation markup library" that can be applied to many different collections of legal text.
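
As a rough illustration of the "simple pattern matching" end of this problem, the following Python sketch recognizes two common citation forms and wraps them in markup.  Everything in it is an assumption made for illustration: the two patterns cover only U.S. Reports cites (such as "410 U.S. 113") and United States Code cites (such as "17 U.S.C. § 107"), and the names find_citations and mark_up and the <cite> tag are hypothetical, not part of any existing LII software.  Real citations vary far more than this.

```python
import re

# Hypothetical, deliberately narrow patterns (assumptions for illustration only).
# U.S. Reports: volume, "U.S.", first page, e.g. "410 U.S. 113".
US_REPORTS = re.compile(r"\b(?P<volume>\d{1,3})\s+U\.\s?S\.\s+(?P<page>\d{1,4})\b")
# United States Code: title, "U.S.C.", section, e.g. "17 U.S.C. § 107".
US_CODE = re.compile(
    r"\b(?P<title>\d{1,2})\s+U\.\s?S\.\s?C\.\s+(?:§|Sec\.)\s*(?P<section>\d+[a-z]?)"
)

PATTERNS = (("us_reports", US_REPORTS), ("us_code", US_CODE))


def find_citations(text):
    """Return (kind, matched_text, fields) tuples in document order."""
    hits = []
    for kind, pattern in PATTERNS:
        for m in pattern.finditer(text):
            hits.append((m.start(), kind, m.group(0), m.groupdict()))
    hits.sort(key=lambda h: h[0])  # order by position in the text
    return [(kind, matched, fields) for _, kind, matched, fields in hits]


def mark_up(text):
    """Wrap each recognized citation in a hypothetical <cite> element."""
    for kind, pattern in PATTERNS:
        text = pattern.sub(
            lambda m, k=kind: f'<cite type="{k}">{m.group(0)}</cite>', text
        )
    return text


sample = "See Roe v. Wade, 410 U.S. 113 (1973), and 17 U.S.C. § 107."
for kind, matched, fields in find_citations(sample):
    print(kind, matched, fields)
print(mark_up(sample))
```

Patterns like these break down quickly on short-form cites ("Id. at 120"), misspelled reporters, and parallel citations, which is where the more advanced techniques mentioned above would come in.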

Project 2: Text Categorization

 

Courts, particularly upper-level appellate courts like the US Circuit Courts of Appeal, rule on many different kinds of matters, ranging from disputes over land to intellectual-property and human-rights cases.  Many downstream services -- especially "legal news" services that are oriented to a particular topic like patent-law cases -- depend on accurate categorization of the texts of judicial opinions.  We would like, for example, to sort out all the copyright decisions handed down this week, or all the insurance cases from last month.  Assigning category metadata to cases also aids in the construction of certain types of finding and navigational aids that make it easier for the public to find and understand the law.

 

When the court is one with a relatively small output of cases -- for example, the Supreme Court, which hands down only about 75 cases per year -- this is not a difficult thing for humans to do.  However, other courts are much, much busier -- issuing tens of thousands of opinions each year, as is the case with the US Circuit Courts of Appeal (USCA).  We would like to apply text-categorization technologies to this problem in three stages of what we imagine to be increasing difficulty:

a) First, we wish to develop a simple current-awareness service or two.  This would involve sorting USCA cases into copyright and non-copyright categories -- a simple binary division.

b) Second, we would like to diversify the current-awareness model to provide services for a number of different topics.

c) Finally, we would like to apply categorization across an entire array of approximately 100-150 categories with the aim of tagging each case with metadata descriptive of its classification.
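
To make stage (a) concrete, here is a minimal Python sketch of one possible approach: a multinomial naive Bayes classifier with add-one smoothing, written with the standard library only.  The choice of naive Bayes is an assumption for illustration, not a technique the project prescribes, and the four training sentences are invented toy data rather than real USCA opinions.  The class and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokenize(text):
                counts[tok] += 1
                self.vocab.add(tok)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data (an assumption; not drawn from real opinions).
TRAINING = [
    ("The defendant infringed the copyright in the plaintiff's musical work", "copyright"),
    ("Fair use of a copyrighted work under the statute", "copyright"),
    ("The insurer denied coverage under the policy", "other"),
    ("The sentencing guidelines were misapplied by the district court", "other"),
]

clf = NaiveBayes()
clf.train(TRAINING)
print(clf.classify("The publisher claims fair use of the copyrighted photograph"))
```

Because the classifier simply picks the highest-scoring label, the same sketch extends to stage (b) by adding more labels to the training data.  Stage (c), with 100-150 categories and cases that may belong to more than one of them, would likely need a different formulation, such as one binary classifier per category.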
