Best Practices: Metadata

David Darst, Director Digital Asset Management, LexisNexis

Mark Wasson, Sr Architect/Research Scientist, Global Solutions Development/New Technology Research, LexisNexis

Metadata is commonly defined as data about other data. When that other data is a text document such as a case opinion, brief or news article, metadata in practice is information about the document, about specific text in it, or text extracted from it, packaged so that computers can easily use it. At LexisNexis we use metadata to support search functionality as well as a number of products and features, ranging from LEXCITE to Search by Topic, from Core Terms to Company Dossiers.

Metadata about a document may include information about the document’s length, who the author is or when the document was written. Document summaries are another type of document metadata. Document sections, or segments, may be annotated with metadata. In case law opinions, segments may include Name, Cite, Court, Writtenby, Opinion and a number of others.

When metadata is used to support search functionality, it speeds up and enriches the search experience. Users often get better search results using metadata than they can from performing more complex searches because metadata helps the user limit the search to the most relevant information in the documents they’re searching.

Scenario 1: Find this case

Search on a collection without metadata

PARENTS INVOLVED IN COMMUNITY SCHOOLS

472,000 results

Search on metadata-enhanced collection

name(PARENTS INVOLVED IN COMMUNITY SCHOOLS)

12 results

Scenario 2: Find this case

Search on a collection without metadata

127 S. Ct. 2738

36,900 results

Search on metadata-enhanced collection

cites(127 S. Ct. 2738)

1 result

Scenario 3: Find cases written by Justice Roberts

Search on a collection without metadata

Judge roberts OR Justice roberts supreme court

251,000 results

Search on metadata-enhanced collection

court(supreme) and writtenby(roberts)

414 results

Note:

o Searches on the collection without metadata were run on Google

o Searches on the metadata-enhanced collection were run on LexisNexis Genfed Courts (2.6 million documents)

We have seen similar results for customers searching NEXIS news content. For example, for users who want only a few solidly on-point news articles about their topic, testing showed that limiting the query to the Lead segment produced answer sets that were on average 75% smaller, while precision (the share of retrieved documents that are relevant) more than doubled.

Segment-level metadata may appear to be rather basic, but it is an easy-to-use yet powerful tool for improving search results.
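The difference between an unrestricted search and a segment-restricted search can be sketched in a few lines of Python. This is only an illustration of the idea, not how the LexisNexis search engine is implemented: the documents, the dictionary layout and the functions full_text_search and segment_search are all invented for this example, and matching is simple case-insensitive substring matching.

```python
# Toy collection (invented data): each document is a dict of named segments.
DOCS = [
    {"name": "Alpha v. Beta", "court": "Supreme Court",
     "writtenby": "Roberts", "opinion": "The judgment is reversed."},
    {"name": "Gamma v. Delta", "court": "Court of Appeals",
     "writtenby": "Smith",
     "opinion": "As Justice Roberts wrote for the Supreme Court, the rule applies."},
    {"name": "Roberts v. Epsilon", "court": "District Court",
     "writtenby": "Lee", "opinion": "The plaintiff Roberts appeals."},
]


def full_text_search(docs, terms):
    """Return documents in which every term appears somewhere in any segment."""
    wanted = [t.lower() for t in terms]
    hits = []
    for doc in docs:
        text = " ".join(doc.values()).lower()
        if all(t in text for t in wanted):
            hits.append(doc)
    return hits


def segment_search(docs, segment, terms):
    """Return documents in which every term appears inside the named segment."""
    wanted = [t.lower() for t in terms]
    hits = []
    for doc in docs:
        text = doc.get(segment, "").lower()
        if all(t in text for t in wanted):
            hits.append(doc)
    return hits


# Unrestricted: "roberts" matches an author, a quoted justice and a party name.
everywhere = full_text_search(DOCS, ["roberts"])

# Restricted, in the spirit of court(supreme) and writtenby(roberts):
by_author = segment_search(DOCS, "writtenby", ["roberts"])
supreme_by_author = segment_search(by_author, "court", ["supreme"])

print(len(everywhere), [d["name"] for d in supreme_by_author])
```

In this toy collection the unrestricted search returns all three documents, while the segment-restricted search returns only the one opinion actually written by Roberts for the Supreme Court.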

Classifying legal documents by topic is an area where we assign metadata at a document level to support improved search. For example, LexisNexis assigns a metadata tag that represents the “insurance law” topic to insurance-related cases. When a user decides to search an Area of Law – By Topic and picks insurance law, the search engine will check for the presence of appropriate metadata in the documents that it retrieves.
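A minimal sketch of a document-level topic filter follows. The data, the "topics" field and the function search_by_topic are assumptions made for illustration only; how LexisNexis actually stores and checks topic metadata is not described here.

```python
# Toy collection (invented data): each document carries a set of topic tags
# assigned at the document level, separate from its text.
DOCS = [
    {"id": 1, "topics": {"insurance law"},
     "text": "The insurer denied coverage under the policy."},
    {"id": 2, "topics": {"contracts law"},
     "text": "The policy of the court favors enforcing coverage terms."},
    {"id": 3, "topics": {"insurance law"},
     "text": "The claimant sought benefits after the accident."},
]


def search_by_topic(docs, topic, terms):
    """Return ids of documents tagged with `topic` whose text contains every term."""
    wanted = [t.lower() for t in terms]
    return [
        doc["id"]
        for doc in docs
        if topic in doc["topics"]
        and all(t in doc["text"].lower() for t in wanted)
    ]


def search_text_only(docs, terms):
    """Return ids of documents whose text contains every term, ignoring topics."""
    wanted = [t.lower() for t in terms]
    return [d["id"] for d in docs if all(t in d["text"].lower() for t in wanted)]


print(search_text_only(DOCS, ["coverage"]))
print(search_by_topic(DOCS, "insurance law", ["coverage"]))
```

Here the word "coverage" alone also matches a contracts document, while the topic tag keeps the result within insurance-related documents.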

LexisNexis can also assign metadata to small pieces of text, such as words, names and citations. For example, for Lexcite we identify all the case reporter cites and their variants in some text, normalize them to a standard format, and associate links with them. When a user runs a search like

lexcite(127 s ct 2738)

the cite is normalized and searched against the normalized citations metadata. These behind-the-scenes matches explain why the matched text in the documents may appear as 127 S.Ct. 2738, 127 S.Ct. at 2764-65, Id. at 2751 or 2007 U.S. Lexis 8670.
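The normalization step can be illustrated with a simplified sketch. The pattern and the canonical form below (volume, reporter letters in upper case with periods and spaces removed, page) are assumptions chosen for this example, not the actual Lexcite format, and the function normalize_cite is hypothetical.

```python
import re

# Simplified pattern: volume number, reporter abbreviation, first page.
CITE_PATTERN = re.compile(r"(\d+)\s+([A-Za-z][A-Za-z.\s]*?)\s+(\d+)")


def normalize_cite(raw):
    """Return a canonical 'VOLUME REPORTER PAGE' key, or None if the text
    does not look like a plain volume-reporter-page citation."""
    match = CITE_PATTERN.fullmatch(raw.strip())
    if match is None:
        return None
    volume, reporter, page = match.groups()
    # Drop periods and whitespace so "S. Ct.", "S.Ct." and "s ct" agree.
    reporter = re.sub(r"[.\s]", "", reporter).upper()
    return f"{volume} {reporter} {page}"


# Index built at load time: normalized cite -> ids of documents containing it.
# The document ids are invented for the example.
index = {}
for doc_id, cite_text in [(10, "127 S.Ct. 2738"), (11, "127 S. Ct. 2738")]:
    index.setdefault(normalize_cite(cite_text), []).append(doc_id)

# The user's query is normalized the same way before lookup.
print(index.get(normalize_cite("127 s ct 2738"), []))
```

This sketch covers only spelling and spacing variants of a full citation. Forms such as 127 S.Ct. at 2764-65 or Id. at 2751 depend on pinpoint handling and surrounding context, and a parallel citation such as 2007 U.S. Lexis 8670 would need a separate table linking it to the same case; none of that is attempted here.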

Whether it is at the document level, the segment level or assigned to individual terms, combining search with metadata helps users more easily find the information they need.
