What is the difference between an information need and a query



|Exam #1 v1.1 |LIBR 202 – Information Retrieval |

|Name: |Wilson |

Instructions:

Below is a set of questions, each contained in a box. Type your answer to each question in its box, expanding the box as necessary. Answers do not need to be double-spaced. This exam is worth 20 points.

Make sure your answers are coherent and in complete sentences. If I can’t understand it, I can’t give you credit. I expect the answers to be in your OWN words, NOT copy/paste from the course materials. I will not give credit for any answer that appears to be copied from any other source.

I like examples, especially ones from your experiences with the assignments; if they help you explain, then use them. Lastly, submit your completed midterm using a filename with last name followed by midterm, e.g., lastname_midterm.doc.

|1. |What is the difference between an information need and a query? Define each, and then explain their relationship. |1 pt |

| |

|An information need may be only a vague idea of what the user is looking for. A query is the exact search: the terms and words the user |

|enters to find the information they need. The two are related because the user turns a vague information need into precise search terms, |

|formulating a query that finds information to suit that need. |

|2. |What is the difference between information storage and information retrieval? Think of these as parts of an IRS and define each.|1 pt |

| |

|Information storage is how the information is organized and how the objects in a collection are represented. Information retrieval is how |

|the objects stored in the collection are located. An information retrieval system (IRS) handles both parts, which makes locating information |

|less complex. |

|3. |In the process of designing an information retrieval system, user needs are probably the most important aspect. Explain how to |1 pts |

| |address this aspect. | |

| |

|When designing an information retrieval system, user needs are important because users are the people who will be using the system. Some ways|

|to address this aspect are to first consider who the users will be, why they will be using the information retrieval system, and |

|what types of questions they will be using the system to answer. By considering the user before beginning work on a database, the designer can|

|better anticipate user needs and solve problems before they occur. |

|4. |What is an inverted file, and what is its function/purpose? |1 pt |

| |

|An inverted file is created automatically as records are added to a database. The inverted file takes most of the words entered into the |

|database and alphabetizes them; this list does not include “stop” words such as the, and, of, an. Databases do this because it |

|is faster to search an already alphabetized list of words than to search each document for a word when an individual searches. This file also |

|keeps track of the number of times each word is used. |
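The answer above can be illustrated with a minimal Python sketch of an inverted file. This is an illustration under stated assumptions, not how any particular database builds its index: the stop list, the sample records, and the function name build_inverted_file are all hypothetical.

```python
from collections import defaultdict

# Assumed stop list for illustration; real systems use their own lists.
STOP_WORDS = {"the", "and", "of", "an", "a"}


def build_inverted_file(records):
    """Map each non-stop word to {record_id: times the word occurs there}."""
    index = defaultdict(dict)
    for record_id, text in records.items():
        for word in text.lower().split():
            word = word.strip(".,;:!?\"'()")
            if not word or word in STOP_WORDS:
                continue
            index[word][record_id] = index[word].get(record_id, 0) + 1
    # Alphabetize the word list, as the answer describes.
    return dict(sorted(index.items()))


# Hypothetical sample records.
records = {
    1: "The history of the library",
    2: "Library science and the library catalog",
}
inverted = build_inverted_file(records)

for word, postings in inverted.items():
    print(word, postings)
```

A search for a word then becomes a single lookup in the alphabetized list instead of a scan of every record.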

|5. |We have talked about the role of language as an elemental part of information retrieval. How does language support and detract |1 pt |

| |from information retrieval? Give one example of each. | |

| |

|Language is important in information retrieval because of the inconsistencies in its use. The use of natural language in an IRS can result in |

|synonymy, where many different words are used to refer to the same concept. This detracts from its usefulness. For example, if a searcher |

|searches for “young adult,” the database would not retrieve documents about teens, adolescents, or minors, all of which refer to the |

|same group of individuals in different ways. Using a controlled vocabulary helps to support information retrieval because |

|synonymy is prevented by using only one term to represent the concept of young adult. |
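As a rough sketch of the controlled vocabulary idea in the answer above, the Python below maps several synonyms to one preferred term before searching. The synonym table, the sample index, and the function names are assumptions made up for this example.

```python
# Hypothetical synonym table: every variant points to one preferred term.
PREFERRED_TERM = {
    "young adult": "young adult",
    "young adults": "young adult",
    "teen": "young adult",
    "teens": "young adult",
    "adolescent": "young adult",
    "adolescents": "young adult",
    "minor": "young adult",
    "minors": "young adult",
}

# Hypothetical subject index: documents are indexed under the preferred term only.
SUBJECT_INDEX = {"young adult": {1, 2, 3}}


def normalize(term):
    """Return the preferred term for a search term, or the term itself if unknown."""
    cleaned = term.strip().lower()
    return PREFERRED_TERM.get(cleaned, cleaned)


def search(term, index=SUBJECT_INDEX):
    """Look up documents by the preferred form of the search term."""
    return index.get(normalize(term), set())


print(search("teens"))
print(search("Young Adult"))
```

With the mapping in place, a search for any of the variants retrieves the same set of documents.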

|6. |In an IRS, we take information-bearing objects and represent them. We have to identify information about that object that we|1 pts |

| |think should be recorded. Pick an object and describe some “useful” information about it, and indicate briefly why that | |

| |information might be important to record and for whom. | |

| |

|Object- Shoes |

|Useful Information- |

|Color: Black |

|Style: Sandals, flip-flops |

|Size: 8.5 |

|Width: Wide |

|Price: $19.95 |

|Material: Leather |

|This information may be important to a shoe store. When an item is purchased or returned, the store needs to keep an accurate |

|record for inventory. |

| |

|7. |What is a surrogate, and what function does it serve? |1 pt |

| |

|Surrogate records represent the items in the collection; they are not the actual item, as it could not be included in the collection. Instead,|

|they are a stand-in for the item. For example, sometimes when searching a database for information, an article appears but the database does |

|not have access to the full-text document, while another database does. This record would be a surrogate leading the researcher to the |

|other database for the entire contents of the record. |

|8. |Define the following terms: database, fields, records, values. |1 pts |

| |

|A database is an electronic method of containing and organizing records of information about objects, using various fields containing object |

|information. |

|Fields are determined by how people will want to find information within a database, then listed as a part of each record so searchers can |

|better locate information. Examples of fields are title, abstract, authors, and publication date. |

|Records are the electronic information, composed of the fields, about a specific object kept within the database. The collection of records, |

|compiled together, represents the objects within the database. |

|Values are the things allowed in each field; for example, a controlled vocabulary is a specific set of values for that particular field. |

|9. |Describe the difference between the responsibilities of a database schema designer and a database indexer. |2 pt |

| |

|The database schema designer is the individual responsible for designing the database. They are supposed to do many things to create a |

|stronger, more reliable database. This individual tries to understand user needs. They determine the data attributes and what rules are |

|necessary for the data attributes. They also identify any controlled vocabularies used for different data values. This individual or team of |

|individuals is also responsible for controlling the cost and performance of the IRS, identifying attributes that will be indexed, and any |

|other tasks related to designing and creating the IRS. |

|The database indexer, also known as the record indexer, is responsible for creating the records within the database. They do this by |

|considering the object and the list of attribute rules and selecting the best descriptor. This individual needs to be accurate in entering |

|records and consistent in implementing the rules, as they affect the user’s search outcome. |

|10. |Define aggregation and discrimination. And describe how you encountered these in the second assignment. |1 pts |

| |

|Discrimination is the process of narrowing the results of a search. This relates to assignment two when measuring the recall of the results |

|when testing the database. |

|Aggregation is the process of expanding the results of a search by broadening the possible attributes. This was seen in assignment two when |

|evaluating the precision of documents, when the individual determined how well the system retrieved relevant documents. Broadening the |

|search terms through aggregation increases the number of documents retrieved, which affects the rate of precision. |

|11. |Define Phrase indexing and word indexing. Also, explain how to decide which to use. |1 pts |

| |

|The terms phrase indexing and term indexing are often used interchangeably. Phrase indexing indexes a phrase as a whole instead of each individual |

|word. This type of indexing is used in database fields such as color, so an individual can search the phrase “blue green” and receive results, |

|rather than just searching the words “blue” or “green.” It is important to remember that in phrase indexing the words are kept together as a |

|single unit. Word indexing is indexing done one word at a time. This allows a user to search fields within a database such as the title or|

|abstract for one particular word. |
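The contrast between the two kinds of indexing can be sketched in Python using the “blue green” example from the answer. The sample records and the function names word_index and phrase_index are hypothetical and only meant to show the difference in what becomes an index entry.

```python
from collections import defaultdict


def word_index(records, field):
    """Word indexing: every individual word in the field is its own entry."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for word in record[field].lower().split():
            index[word].add(record_id)
    return dict(index)


def phrase_index(records, field):
    """Phrase indexing: the whole field value is kept together as one entry."""
    index = defaultdict(set)
    for record_id, record in records.items():
        index[record[field].lower().strip()].add(record_id)
    return dict(index)


# Hypothetical sample records with a color field.
records = {
    1: {"color": "blue green"},
    2: {"color": "blue"},
}

by_word = word_index(records, "color")
by_phrase = phrase_index(records, "color")

print(by_word)    # "blue" and "green" are separate entries
print(by_phrase)  # "blue green" is a single entry
```

In this sketch a word-indexed search for “blue” finds both records, while a phrase-indexed search for “blue green” finds only the record whose whole value matches.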

|12. |Define subject indexing and full-text indexing. What would be a pro and a con for each in supporting findability. |2 pts |

| |

|Subject indexing is indexing using a controlled vocabulary or thesaurus to describe the information object. A pro is that subject indexing allows for |

|clearer results when using Boolean logic terms to search. A con of subject indexing is that it relies on the indexer to have accurately |

|represented the aboutness of the document. |

|Full-text indexing is used for electronic documents containing text. Most of the words in the document are indexed, so a pro is that searchers |

|can locate the document by searching for words used in it. Full-text indexing relies on natural language, not a controlled vocabulary. The con to full-text |

|indexing is that a search using Boolean logic terms matches the text of the document instead of its aboutness. |

|13. |How are the Boolean logic operators AND and OR useful in a query? Define each and explain how they affect the result set. |1 pt |

| |

|Boolean logic operators assist searchers in discriminating and aggregating their search results. The Boolean logic operator “and” |

|narrows, or discriminates, the number of results found. The word “and” between two or more search attributes requires that both or all |

|attributes be present in the results. The Boolean logic operator “or” aggregates, or broadens, the number of results found. The word “or” |

|between two or more search attributes requires that only one of the attributes be present in the results. |
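One way to picture the two operators is as set operations on the lists of records that contain each search term. The Python sketch below assumes a hypothetical postings table; the terms and record numbers are invented for illustration.

```python
# Hypothetical postings: which record numbers contain each search term.
POSTINGS = {
    "teens": {1, 2, 3},
    "libraries": {2, 3, 4},
}


def boolean_and(term_a, term_b, postings=POSTINGS):
    """AND keeps only records that contain both terms (set intersection)."""
    return postings.get(term_a, set()) & postings.get(term_b, set())


def boolean_or(term_a, term_b, postings=POSTINGS):
    """OR keeps records that contain at least one of the terms (set union)."""
    return postings.get(term_a, set()) | postings.get(term_b, set())


and_results = boolean_and("teens", "libraries")
or_results = boolean_or("teens", "libraries")

print(sorted(and_results))  # narrower result set
print(sorted(or_results))   # broader result set
```

The AND result can never be larger than either term's own result set, and the OR result can never be smaller, which matches the narrowing and broadening described in the answer.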

|14. |Describe the hierarchical relationships used in a thesaurus. Be sure to consider the role of specificity in your answer. |1 pts |

| |

|The thesaurus for a database includes the list of controlled vocabulary and its syndetic (interconnected) structure. The syndetic structure allows|

|users to understand how different terms are used in a search system. The different relationships in thesauri include hierarchical, |

|associative, and equivalent. Hierarchical relationships include broader term (BT) and narrower term (NT) relationships. A BT is one that is |

|more general than the entry term. An NT is one that is more specific than the entry term. An example would be the phrase Information Science. |

|The BT for information science is science. The NTs for information science are computer science and library science. |
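The BT/NT example in the answer can be written down as a small data structure. The Python sketch below is only an illustration: the dictionary layout and the function expand_narrower are assumptions, and the terms are the ones used in the answer.

```python
# Each entry term lists its broader terms (BT) and narrower terms (NT).
THESAURUS = {
    "science": {"BT": [], "NT": ["information science"]},
    "information science": {
        "BT": ["science"],
        "NT": ["computer science", "library science"],
    },
    "computer science": {"BT": ["information science"], "NT": []},
    "library science": {"BT": ["information science"], "NT": []},
}


def expand_narrower(term, thesaurus=THESAURUS):
    """Return the term followed by all of its more specific (NT) descendants."""
    terms = [term]
    for narrower in thesaurus[term]["NT"]:
        terms.extend(expand_narrower(narrower, thesaurus))
    return terms


print(expand_narrower("information science"))
print(THESAURUS["information science"]["BT"])
```

Moving up the BT links makes a search more general, and moving down the NT links makes it more specific.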

|15. |What are pre-coordinate and post-coordinate terms? How would you decide to use one over the other? |2 pts |

| |

|Precoordinate terms are subject headings that form a longer expression. Postcoordinate terms are terms that describe the document’s aboutness.|

|These terms are generally short, single words or small phrases. The precoordinate terms help the indexer to determine what the document is |

|about and place the document next to similar ones on a bookshelf. The postcoordinate terms are ones that the searcher is likely to use when|

|attempting to locate the information. The indexer or information professional might use precoordinate terms to find similar information within|

|a database. When originally searching by a topic, however, they are more likely to search a postcoordinate term until they find information |

|similar to what they are looking for. |

|16. |Define recall and precision assuming a closed system, and explain their relationship to each other in the result set. (Why |2 pts |

| |assume a closed system?) | |

| |

|Precision is the capacity with which the pre- and postcoordinate vocabulary rejects non-relevant search results (Su, 1994). If the |

|results are few and are relevant to the search query, then the vocabulary used can be considered adequate. Recall measures the |

|capability of the pre- and postcoordinate vocabulary provided by the database to retrieve all relevant articles (Su, 1994). If there are |

|various relevant results, then the vocabulary provided is capable of aggregating like articles. For a closed system, calculations would be |

|possible. In an open system that is constantly changing, it would be impossible to calculate the recall and precision of materials retrieved.|
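For reference, the calculation that a closed system makes possible can be sketched in Python using the commonly used set-based definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. These are the standard textbook formulas rather than the vocabulary-based wording of the answer above, and the document numbers are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant documents."""
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical closed collection: the full set of relevant documents is known.
retrieved = {1, 2, 3, 4}
relevant = {2, 4, 5, 6, 7}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

The recall figure needs the complete set of relevant documents, which is only knowable when the collection is closed; in an open, changing collection that set cannot be counted.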

| |
