Subject heading lists and thesauri in information …



Controlled Vocabularies

Definition and importance

Subject approaches in the electronic age have become a major way of finding information. With the massive increase in availability of recorded information, it becomes more and more evident that keyword searching alone will not suffice.

Virtually every word in the English language has more than one meaning or sense. Many words can be used as nouns, verbs, adjectives, or adverbs.

Vocabulary control tools are used to control the terms used in indexing and information retrieval. These are natural language tools. A classification scheme uses a system of notation, an artificial language, whereas for vocabulary control we need natural language representation.

Example,

330 (a notation for economy in DDC)

Economy (a subject heading)

A controlled vocabulary is basically an authority list with a specific structure that is designed to:

• Control synonyms, e.g. Cars / Vehicles

• Distinguish between homographs, e.g. Mercury (a planet)

• Link terms based on their meaning.

We can define a vocabulary control tool as an organized list of terms and phrases that can be used to assign subject descriptors to information resources, and also to search a collection by subject terms and phrases.
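The three functions of a controlled vocabulary listed above can be illustrated with a small lookup table. This is only a sketch: the headings, the non-preferred terms, and the `lookup` function are hypothetical and do not come from any real subject heading list.

```python
from typing import List

# Hypothetical authorized headings (illustrative only).
HEADINGS = {"Cars", "Mercury (Metal)", "Mercury (Planet)"}

# Synonym control: non-preferred term -> preferred heading.
SYNONYMS = {"automobiles": "Cars", "motor cars": "Cars"}

# Homograph control: one spelling -> several qualified headings.
HOMOGRAPHS = {"mercury": ["Mercury (Metal)", "Mercury (Planet)"]}


def lookup(term: str) -> List[str]:
    """Return the authorized heading(s) a searcher's term leads to."""
    key = " ".join(term.lower().split())  # normalize case and spacing
    if key in SYNONYMS:
        return [SYNONYMS[key]]
    if key in HOMOGRAPHS:
        return list(HOMOGRAPHS[key])
    return [h for h in sorted(HEADINGS) if h.lower() == key]


print(lookup("Automobiles"))  # ['Cars']
print(lookup("mercury"))      # ['Mercury (Metal)', 'Mercury (Planet)']
```

A term that is neither a heading nor an entry term returns an empty list, which is the point at which an indexer would consider proposing a new heading.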

Tools for Vocabulary control

Subject heading lists and thesauri are therefore used as vocabulary control tools for indexing printed and electronic information resources.

Subject heading lists like the Library of Congress Subject Headings (LCSH) and thesauri like the UNESCO Thesaurus are examples of vocabulary control tools.

A subject heading list is an alphabetical list of terms and phrases, with appropriate cross references and notes, that can be used as a source of headings in order to represent the subject content of an information resource.

LCSH

LCSH is an example of a subject heading list; it is used quite widely as a controlled vocabulary for catalogues and bibliographies. LCSH is the most extensive list of subject headings.

LCSH contains the entry vocabulary of the Library of Congress catalogues. It is available in various formats including hard copy, CD-ROM and web. The latest edition is the 32nd, which contains over 317,000 headings and references.

LCSH is the most widely used tool for assigning subject headings to manual and machine-readable catalogues. It is also now being used to control vocabulary in the virtual library environment, for example by INFOMINE, an academic virtual library located at the University of California, Riverside.

The fundamental principles guiding the development of the LCSH are:

1. User needs: Access points and current usage of terms are the two most important guiding factors.

2. Uniform headings: Each subject is represented by one heading, and this is followed consistently. Synonymous terms and variant forms of the same heading are considered non-preferred terms, and appropriate references are created to facilitate access to the collection by those terms.

3. Specific and direct entry: According to LCSH, the most specific term representing a subject is to be used as its heading; a subject is not entered under the broader or generic term that encompasses it. Example: Infants (not Children)

Each entry may be accompanied by all or some of the following:

• A scope note showing how the term may be used.

• A list of headings to which 'see also' references may be made.

• A list of headings from which 'see' references may be made.

For example,

Computer software

Here are entered works on computer programs

UF Software, Computer

Files, Computer program

RT Computer software industry

SA see also subdivisions "software" under subjects

NT Application software
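The reference structure in an entry like the one above can be held as plain data, and the reciprocal "see" (USE) references can then be generated from the UF list. The sketch below assumes a simple dictionary layout and a hypothetical `use_references` helper; only the first UF term from the example is included, and this is not how LCSH itself is stored or distributed.

```python
# One entry, keyed by heading; field names follow the abbreviations
# used in the example (UF, RT, NT). The layout is an assumption.
entries = {
    "Computer software": {
        "UF": ["Software, Computer"],
        "RT": ["Computer software industry"],
        "NT": ["Application software"],
    },
}


def use_references(entries):
    """Map every non-preferred (UF) term to the heading to USE instead."""
    refs = {}
    for heading, fields in entries.items():
        for variant in fields.get("UF", []):
            refs[variant] = heading
    return refs


for variant, heading in sorted(use_references(entries).items()):
    print(f"{variant}\n  USE {heading}")
```

Generating the USE side from the UF side keeps the two halves of each reference consistent, since they are never entered separately.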

Medical Subject Headings (MeSH)

MeSH is published by the National Library of Medicine as a thesaurus. MeSH is used to provide subject access points on every bibliographic record created at the National Library of Medicine, whether for MEDLINE, the library's catalog, or Index Medicus.

Thesauri

A thesaurus contains a controlled set of terms from a particular area of knowledge, linked by hierarchical or associative relations; it also shows equivalence relations (synonyms) with natural language terms.

A thesaurus is a tool containing a controlled set of terms arranged alphabetically, and various relationships among the terms are shown in order to facilitate indexing and retrieval, with appropriate cross references and notes.

Thesauri have been developed for specific subject fields with a view to bringing together the various representations of a term and mapping that term in the universe of knowledge by indicating its broader, narrower and related terms. There are many thesauri for different subject areas.

Thesaurus of ERIC Descriptors

ERIC is an acronym for the Educational Resources Information Center, which is a national information system designed to provide access to a large body of education-related literature.

ERIC indexes journal articles, research reports, curriculum and teaching guides, instructional materials, computer files, and resource materials. These materials are indexed using terms from the Thesaurus of ERIC Descriptors.

To compare, subject heading lists were initially developed to be used in subject catalogues that could replicate the classified arrangement of library records, whereas thesauri have been developed in specific subject domains to facilitate indexing and retrieval.

General Principles for Controlled Vocabulary Terms

Specific vs. General Terms

The choice between a specific and a general term depends upon the type of users who are intended to use the list, and upon the nature of the information resources.

Synonymous concepts

There are multitudes of synonymous words and phrases, that is, words and phrases that mean the same thing or so close to the same thing that they can be treated as equivalent. A cataloger has to choose one term to represent a subject, and make a cross reference from each of the other equivalent terms.

Singular vs. plural

A major word form difference is singular versus plural. There is no rule on which form to use. Most of the time the plural will have the broadest coverage. For example, we use "students", not "student".

Homographs

Homographs are words that look the same but have very different meanings. "Mercury" can be a liquid metal, a planet, a car; "bridge" can be a game, a structure spanning a chasm, or a dental device.

In a controlled vocabulary there must be some way to differentiate among the various meanings. Two common ways are to use qualifiers, such as Mercury (Planet), or to choose a synonym for the homograph to use as the preferred term.

Abbreviations and Acronyms

Traditionally, abbreviations and acronyms have either been spelled out, or not, depending upon the intended users of the controlled vocabulary and their expected knowledge. E.g. UN or United Nations

Popular vs. technical terms

When a concept can be represented by both technical and popular terminology, the creator of a controlled vocabulary must decide which will be used. For example, Medical Subject Headings (MeSH) uses "Neoplasms" where LCSH uses "Cancer." If the list is intended to be used for information packages that will be used by a specialized audience only, then specialized terminology is justified.

Number of terms assigned

There should be no arbitrary limit on the number of terms or descriptors assigned to a document. For in-depth analysis in particular, as many terms as are necessary to cover all of the concepts should be allowed.

Concept not in controlled vocabulary

If a concept is not present in the controlled vocabulary, it should be represented temporarily by a more general concept. The new concept should be proposed as a new addition to the subject list or thesaurus.

Subdivision of Terms

Subdivisions are used in controlled vocabularies that precoordinate terms. Among the uses of subdivisions are:

• to separate by form (e.g., Chemistry -- Dictionaries)

• to show geographical or chronological limitations (e.g., Education -- Saudi Arabia -- 19th century)
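As a rough illustration, a subject string with subdivisions can be assembled by joining a main heading and its subdivisions with a separator. The `subject_string` function and the choice of " -- " as separator are assumptions made for this sketch, mirroring the examples above.

```python
def subject_string(main, *subdivisions, sep=" -- "):
    """Join a main heading and its subdivisions into one subject string."""
    parts = [main, *subdivisions]
    # Skip empty subdivisions so no dangling separator is produced.
    return sep.join(p.strip() for p in parts if p.strip())


print(subject_string("Chemistry", "Dictionaries"))
# Chemistry -- Dictionaries
print(subject_string("Education", "Saudi Arabia", "19th century"))
# Education -- Saudi Arabia -- 19th century
```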

Precoordination vs. Postcoordination

Index terms can be assigned either in a precoordinated fashion (i.e., the indexer constructs subject strings with main terms followed by subdivisions), or in a fashion that requires the searcher of the system to coordinate the terms (postcoordination).

When terms are precoordinated, whether in the controlled vocabulary itself or by the cataloger or indexer, some concepts, subconcepts, place names, time periods, and form concepts are put together in subject strings.

In true postcoordinated systems, each concept is entered without any stringing together of subconcepts, place names, time periods, or form. Searchers must combine terms using Boolean techniques.
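Postcoordinate searching can be sketched with a small inverted index in which each descriptor points to the records indexed with it, and Boolean AND and OR become set intersection and union. The records, descriptors, and function names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical records, each indexed with single-concept descriptors.
records = {
    1: {"Education", "Saudi Arabia", "19th century"},
    2: {"Education", "Egypt"},
    3: {"Chemistry", "Dictionaries"},
}

# Inverted index: descriptor -> set of record ids.
index = defaultdict(set)
for record_id, descriptors in records.items():
    for descriptor in descriptors:
        index[descriptor].add(record_id)


def search_and(*terms):
    """Records indexed with every one of the terms (Boolean AND)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(*terms):
    """Records indexed with at least one of the terms (Boolean OR)."""
    return set().union(*(index.get(t, set()) for t in terms))


print(search_and("Education", "Saudi Arabia"))  # {1}
print(search_or("Egypt", "Chemistry"))          # {2, 3}
```

In a precoordinated system the combination "Education -- Saudi Arabia" would already exist as a single string; here the searcher builds it at query time.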

Relationships between terms in a thesaurus

There are three general classes of fundamental thesaurus relationships:

1. The equivalence relationship

The equivalence relationship is to be found between preferred and non-preferred terms.

There can be several cases of synonymy:

- terms with different linguistic origins,

- popular names and scientific names,

- variant spellings, such as color/colour

- terms from different cultures, such as flats/apartments

- abbreviations and full names, such as UN or United Nations.

2. The hierarchical relationship

The hierarchical relationship is the basic relationship that distinguishes a broader term (BT) from a narrower term (NT).

Example

Capital markets

BT Financial markets

Financial markets

NT Capital markets

3. The associative relationship

An associative relationship is neither hierarchical nor equivalent, yet the terms involved are associated to such an extent that the link between them should be made explicit in the thesaurus. This relationship is represented by RT.
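Because BT and NT are reciprocal, as in the Capital markets / Financial markets example, only one direction needs to be recorded; the other can be derived. The sketch below assumes each term has at most one broader term, and the "Stock exchanges" entry is a hypothetical addition used to show a two-level chain.

```python
# term -> its broader term (BT). "Stock exchanges" is hypothetical.
BT = {
    "Capital markets": "Financial markets",
    "Stock exchanges": "Capital markets",
}


def narrower_terms(bt):
    """Derive the NT lists by inverting the BT links."""
    nt = {}
    for term, broader in bt.items():
        nt.setdefault(broader, []).append(term)
    return {term: sorted(children) for term, children in nt.items()}


def broader_chain(term, bt):
    """Follow BT links upward from a term to the top of its hierarchy."""
    chain, seen = [], {term}
    while term in bt:
        term = bt[term]
        if term in seen:  # guard against an accidental cycle in the data
            break
        seen.add(term)
        chain.append(term)
    return chain


print(narrower_terms(BT)["Financial markets"])  # ['Capital markets']
print(broader_chain("Stock exchanges", BT))
# ['Capital markets', 'Financial markets']
```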

Display of terms in a thesaurus

Terms and their relationships in a thesaurus are displayed using the following abbreviations:

• SN: scope note

• USE: indicates that the following term is the preferred term

• UF: used for, indicating that the following term is the non-preferred term

• BT: broader term

• NT: narrower term

• RT: related term
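To make the display conventions concrete, the following sketch prints one thesaurus entry using the abbreviations listed above. The `Entry` class, its field names, and the two-space indentation are assumptions made for illustration; real thesauri vary in layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    term: str
    scope_note: str = ""
    used_for: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)
    narrower: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)


def render(entry: Entry) -> str:
    """Format an entry as an alphabetical-display block."""
    lines = [entry.term]
    if entry.scope_note:
        lines.append(f"  SN {entry.scope_note}")
    tagged = (("UF", entry.used_for), ("BT", entry.broader),
              ("NT", entry.narrower), ("RT", entry.related))
    for tag, values in tagged:
        for value in values:
            lines.append(f"  {tag} {value}")
    return "\n".join(lines)


print(render(Entry("Capital markets", broader=["Financial markets"])))
# Capital markets
#   BT Financial markets
```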

Subject heading lists and thesauri in the organization of internet resources

While subject heading lists were primarily devised to assign subject headings in catalogues, many researchers have used them for organizing internet resources. Examples of some such efforts are given below.

INFOMINE

INFOMINE provides access to several thousand web resources including databases, electronic journals, textbooks and conference proceedings. It began in 1994 as a project of the Library of the University of California at Riverside. INFOMINE uses LCSH to index information resources.

Users can simply select a discipline and enter their search terms or phrases to conduct a search. The catalogue can also be browsed by author, title, keyword and subject. There is an option to browse by subject — if this is chosen, the user is taken to an alphabetical list of

Note: the following slides are samples of the ERIC thesaurus, from the electronic sources of the Imam University Library.

[Slide images: ERIC thesaurus samples]
