Thought process - University of Washington




What is information retrieval when documents become dynamic processes?

Anxiety

"What is a document? Rethinking the concept in uneasy times" (Schamber, 1996)

Roots of IR

One might trace the roots of information retrieval to the introduction of vertical files:

Ultimately, vertical filing, first presented to the business community at the 1893 Chicago World's Fair (where it won a gold medal), became the accepted solution to the problem of storage and retrieval of paper documents… The techniques and equipment that facilitated storage and retrieval of documents and data, including card and paper files and short- and long-term storage facilities, were key to making information accessible and thus potentially useful to managers. (Yates, 118-120)

Definition of "document"

"With the appearance of writing, the document also appeared, which we shall define as a material carrier with information fixed on it." (Frants, Shapiro & Voiskunskii, 46)

"Document: a unit of retrieval. It might be a paragraph, a section, a chapter, a Web page, an article, or a whole book." (Baeza-Yates & Ribeiro-Neto, p. 440)

Fundamental Assumptions of IR

A fundamental assumption of information retrieval is that a "document" is a container of words.

Example: "Information retrieval is best understood if one remembers that the information being processed consists of documents." (Salton & McGill, 1983)

Example: "It is here proposed that the frequency of word occurrence in an article furnishes a useful measurement of word significance." (Luhn, 160)

Example: "A problem arising in this connection is the definition of the document to be used. There are many so-called full-text retrieval systems where the full text of the documents is used for indexing purposes….However, the computer storage of the full text of documents is expensive and is rarely possible except as a by-product of automatic typesetting operations. For many practical purposes, it is sufficient to use document excerpts for analysis, such as the titles and abstracts." (Salton & McGill, 1983)
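Luhn's proposal, quoted above, treats a document as a bag of words whose counts signal significance. A minimal sketch of that idea follows, assuming a tiny stop list and a made-up abstract; the names `STOP_WORDS`, `word_frequencies`, and `abstract` are hypothetical and are not taken from Luhn's paper, which describes a more elaborate procedure.

```python
import re
from collections import Counter

# Hypothetical stop list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def word_frequencies(text):
    """Count how often each non-stop word occurs in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


# Made-up abstract standing in for a document excerpt.
abstract = (
    "The document is the unit of retrieval. "
    "Retrieval of a document depends on the words in the document."
)

freqs = word_frequencies(abstract)
# The most frequent remaining words are candidate index terms.
top_terms = freqs.most_common(2)
print(top_terms)
```

The sketch only works because the excerpt is a fixed string of words, which is exactly the assumption these notes call into question.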

Assumption of doing IR on the Web

"The advent of the World Wide Web has increased the importance of information retrieval. Instead of going to the local library to find something, people search the Web." (Grossman & Frieder, ix)

Graphic design as content

"graphic design can be content… users experience a web-site with little or no 'text' per se." (Vartanian, I., 2001)

References:

Baeza-Yates, R. & Ribeiro-Neto, B. 1999. Modern information retrieval. Reading, MA: Addison-Wesley.

Frants, V.I., Shapiro, J., & Voiskunskii, V.G. 1997. Automated information retrieval: Theory and methods. San Diego, CA: Academic Press.

Grossman, D.A. & Frieder, O. 1998. Information retrieval: Algorithms and heuristics. Boston, MA: Kluwer.

Luhn, H.P. April 1958. "The automatic creation of literature abstracts." IBM Journal.

Salton, G. & McGill, M.J. 1983. Introduction to modern information retrieval. New York, NY: McGraw-Hill.

Schamber, L. 1996. "What is a document? Rethinking the concept in uneasy times." Journal of the American Society for Information Science, 47(9): 669-671.

Yates, J. 2000. "Business use of information and technology during the industrial age" in A nation transformed by information: How information has shaped the United States from colonial times to the present, pp. 107-136. Edited by A.D. Chandler & J.W. Cortada. New York, NY: Oxford University Press.

--------------

Code behind example of

IR as ongoing activity.

Focus on one of its assumptions: the Document.

Importance of document: container that is source of index terms.

Document is not a precise term... IFLA concerns.

Focus of this essay on the mechanical imprecision of Document.

XML example. Algorithms, scripts, text obscured in Flash files.

Diagram: User - Interface - query term - index

Index - normalizing process - document
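The two lines of the diagram can be read as one pipeline: documents pass through a normalizing process into an index, and a user's query term is matched against that index. The following is a rough sketch under that reading, assuming a simple lowercase tokenizer and an in-memory inverted index; `normalize`, `build_index`, `search`, and the sample documents are all hypothetical names and data.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each normalized term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in normalize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = normalize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Made-up documents; each is a fixed string available at indexing time.
documents = {
    "d1": "Vertical files made paper documents retrievable.",
    "d2": "Web pages are documents too.",
}

index = build_index(documents)
hits = search(index, "Documents")
print(sorted(hits))
```

The query and the documents go through the same `normalize` function, so the match happens between normalized terms rather than between the user's words and the original text.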

Assumption that document is a persistent resource.

Assumption that dynamic processes have text to normalize.
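One way to see where this assumption can fail is to compare a static page with a page whose words are produced by a script at view time. The sketch below uses Python's standard `html.parser` module to collect text outside `<script>` and `<style>` elements, roughly as a simple indexer might; the `VisibleText` class, `index_terms` function, and both sample pages are hypothetical and stand in for no particular crawler.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that appears outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def index_terms(html):
    """Return the normalized terms an indexer could take from the markup."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z]+", " ".join(parser.chunks).lower())


# A static page: the words are present in the markup itself.
static_page = "<html><body><p>Vertical filing won a gold medal.</p></body></html>"

# A dynamic page: the words exist only after the script runs in a browser.
dynamic_page = (
    "<html><body><div id='app'></div>"
    "<script>document.getElementById('app').textContent = "
    "['Vertical', 'filing'].join(' ');</script></body></html>"
)

static_terms = index_terms(static_page)
dynamic_terms = index_terms(dynamic_page)
print(static_terms, dynamic_terms)
```

Under these assumptions the dynamic page yields no index terms at all, although a reader viewing it in a browser would see the same two words as in the static page.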
