6 search engines




The Web is a wild, disorganized (some would say verging on chaotic) place. You may have the address (URL) of a specific site where you want to look for information, but in many cases you will need to search for information, much as you would in a traditional library, where you would go to a catalogue and search by title, subject or author. This chapter introduces search engines, probably the most powerful tool you can use to find information on the WWW, and explains how they work. However, before launching into an actual search it is very important to plan and organize it. This will make your searching much more effective and save valuable time.

Planning the Search

In the process of getting information it is important to have a strategy. The tools for finding information on the Internet, referred to as search engines, will not guarantee a thorough search. As with most searches, there is an element of trial and error with a search engine and you may require a more rigorous approach. Bear in mind that research is not synonymous with “surfing or browsing the web”.

Before going online, it is necessary to clarify:

1. What you want, and,

2. How you plan to get it.

Identifying the proper sources of information for your search can clarify whether or not the Internet is the right tool for you. The Internet is appealing in that it is readily accessible and has something to say about pretty much everything. If you require a thorough knowledge of all available material on a subject, it is advisable to make the Internet one part of a much more comprehensive search that includes specialized libraries and experts in the field.

To avoid the simple trial and error approach to research, be prepared before going online: purpose should come before action. On the Internet, as with other sources of information, your point of entry into the system will be determined by your prior knowledge of the subject. The more you know about what you need, the closer you get to the information online.

Some basic principles to follow in organising your search are:

1. Have a focus: know what you are looking for. If you don't know whether the information is out there, at least know the most efficient and effective ways to use your chosen search engines.

2. Formulate a strategy for your search. Seasoned researchers often use the most uncommon word in their search topic. Try different search engines until you find one you are comfortable with. Be prepared to switch to a different search tool if results are unsatisfactory. If you find yourself getting lost in the net, consider putting a time limit on your Internet search.

3. Don't hesitate to ask for help: librarians at public and specialized libraries are information professionals, and more and more of them are skilled at searching databases, including the Internet.

Brief History of Search Engines

Search engines grew out of gophers, which first emerged at the University of Minnesota in 1991 and were named after the university's mascot. Gophers were not HTML-based and contained indexes of file titles and very brief descriptions or abstracts. Veronica was a program that searched all of "gopherspace". Gophers were about two years old when the WWW came about, bringing hyperlinks, full-text searching, graphical browsers and Web search engines. WebCrawler, developed at the University of Washington in 1994, is considered to be the WWW's first search engine. Within three years Lycos, Infoseek, OpenText, AltaVista, Excite, Yahoo!, HotBot and Northern Light all emerged. In addition, many of the ISPs and larger commercial providers have their own search engines (e.g., AOL, Netscape, Microsoft Network) or provide links to others. All the search engines are competing for users, and you will find that the engines, while free to use, come loaded with advertising and add-ons (e.g., offers of free e-mail and home pages). Most Web search engines have been designed for the casual user and, according to at least one author, very little advancement has been made in upgrading search capabilities.

How Search Engines Work

There are four major functions which search engines perform:

1. "Spiders" or "Crawlers" - go out and actually find web sites and pages on the WWW. The more popular sites and those that have more links are crawled more thoroughly and more frequently than less popular sites. A search engine can be programmed for breadth (main sites only), depth (main sites and subsidiary sites) or both.

2. Indexing Program - indexes what the spiders find, along with other sites submitted directly to the search engine by web-page creators. Some engines purport to index all the words from every page while others have "stop words" which they ignore (e.g., small, common words - "a", "the", etc.). Some leave out words like "Web" or "Internet", and many leave out numerals, making it difficult to search on numbers. Most engines search on titles and URLs.

A search engine's database is the organized and indexed information about sites, collected by the spiders and from web pages submitted directly.

3. Retrieval Engine - algorithm and associated programming, devices, etc., which, upon a search request or query, retrieve material from the index. The retrieval engine searches an engine's database to identify and deliver records that match your query. A retrieval "algorithm" (formula) identifies the matching records and then arranges the retrieved items in a particular order for display to the user.

4. Graphical Interface - gathers information from the user to feed to the retrieval engine using HTML (HyperText Markup Language). It also provides space for advertisers and links to add-ons, Help pages and other information.

Add-ons are additional features that appear on the page of the search engine that are not part of the search functions. These can include Web directories, featured sites and other tools.
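The indexing and retrieval functions described above can be illustrated with a minimal sketch. It is not how any particular engine is built: the stop-word list, the page texts, the example.org addresses and the ranking rule (count of matching query words) are all assumptions made up for illustration. The tokenizer keeps only letters, which mirrors the point above that numerals are often left out.

```python
import re
from collections import defaultdict

# Assumed stop-word list; real engines choose their own.
STOP_WORDS = {"a", "an", "and", "of", "the", "in"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic words (numerals are dropped)."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    """Indexing step: map each non-stop word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            if word not in STOP_WORDS:
                index[word].add(url)
    return index

def retrieve(index, query):
    """Retrieval step: rank pages by how many query words they contain."""
    scores = defaultdict(int)
    for word in tokenize(query):
        if word in STOP_WORDS:
            continue
        for url in index.get(word, ()):
            scores[url] += 1
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Made-up pages standing in for what a spider would fetch.
PAGES = {
    "example.org/a": "Restoration of salmon habitat in the Fraser River",
    "example.org/b": "Habitat loss and the spotted owl",
    "example.org/c": "The history of the Internet",
}

INDEX = build_index(PAGES)
print(retrieve(INDEX, "the salmon habitat"))
# ['example.org/a', 'example.org/b']
```

Page a ranks first because it contains both "salmon" and "habitat"; the stop word "the" plays no part in the match.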

Conducting a Search

In developing a good technique for research it is useful to know how the information is put together in your source and what kinds of parameters are useful. Different search engines on the Internet ask questions in different ways and are organised in different ways.

Few users will rely on just one search engine in their research, especially if a search comes up with no matches. Each search engine is put together differently and may emphasize certain topics in maintaining its inventory.

Search engines search on "keywords". To use a search engine effectively, draw on the vocabulary of your field of interest, which is your guide to which words to try. Often the most obscure or uncommon term will lead you to the best shortlist of possible items, and a combination of uncommon words is even better. For example, if you are looking for information about old-growth forests in southwestern British Columbia, you might search for “spotted owl” and “British Columbia”. This will cull out sites in other areas, minimizing your exploring time.

To make your search even more specific, most search engines allow the use of special characters or operators. The most common ones are Boolean logic, Nesting (parentheses), Wildcards (truncation) and Nearness.

Boolean logic - is the capability of using operators such as AND, OR, and NOT to retrieve only those records that include a certain combination of terms. Many Boolean operators are similar for each search engine, but there may be some differences, so check out the Advanced Search or Help features for each search engine. Also remember to use CAPS when inserting Boolean operators.

AND - to specify that both words must be present. For example:

habitat AND restoration

will retrieve only those pages where both these words are present on the same page (same record).

OR - to specify that either word can be present. For example:

habitat OR restoration

would retrieve all of those pages (records) that have the word "habitat" plus all those pages that have the word "restoration".

NOT - used to exclude a word. For example:

habitat NOT aquatic

would retrieve all those pages that have the word "habitat" except those that have the word aquatic. All those with "aquatic" would be excluded.
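One way to picture the three operators is as set operations on the pages that contain each word. The sketch below assumes a tiny made-up index in which each word maps to a set of page numbers; the names INDEX and pages_with are illustrative only.

```python
# Hypothetical index: each word maps to the set of page numbers containing it.
INDEX = {
    "habitat": {1, 2, 3},
    "restoration": {2, 3, 4},
    "aquatic": {3, 5},
}

def pages_with(word):
    """Return the set of pages that contain the word (empty if the word is unknown)."""
    return INDEX.get(word, set())

both = pages_with("habitat") & pages_with("restoration")    # habitat AND restoration
either = pages_with("habitat") | pages_with("restoration")  # habitat OR restoration
without = pages_with("habitat") - pages_with("aquatic")     # habitat NOT aquatic

print(sorted(both))     # [2, 3]
print(sorted(either))   # [1, 2, 3, 4]
print(sorted(without))  # [1, 2]
```

AND narrows the result to the overlap, OR widens it to the union, and NOT removes every page that contains the excluded word.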

Nesting capabilities (using parentheses)

For example:

(habitat OR ecosystem) AND aquatic

would retrieve all the records that have the word "aquatic" and also have either the word "habitat" or "ecosystem". Note that you would not necessarily get the same result by omitting the parentheses.
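A small sketch shows why the parentheses matter. The page numbers are made up, and the second reading assumes an engine that applies AND before OR when no parentheses are given; engines differ on this, so check the Help pages.

```python
# Made-up sets of page numbers containing each word.
habitat = {1, 2}
ecosystem = {3, 4}
aquatic = {2, 3, 5}

# (habitat OR ecosystem) AND aquatic
nested = (habitat | ecosystem) & aquatic

# habitat OR ecosystem AND aquatic, read as habitat OR (ecosystem AND aquatic)
unnested = habitat | (ecosystem & aquatic)

print(sorted(nested))    # [2, 3]
print(sorted(unnested))  # [1, 2, 3]
```

Without the parentheses, page 1 slips into the results even though it never mentions "aquatic".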

Other Boolean Characters

To include words: +habitat +restoration +aquatic

To exclude words: +habitat -aquatic

To group words: "habitat restoration", habitat-restoration, habitat;restoration
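The plus and minus signs can be read as "must contain" and "must not contain". The following sketch assumes that terms are separated by spaces and that a page is simply a string of words; parse_query and page_matches are hypothetical helper names, not part of any engine.

```python
def parse_query(query):
    """Split a query into required (+), excluded (-) and plain terms."""
    required, excluded, plain = [], [], []
    for term in query.split():
        if term.startswith("+") and len(term) > 1:
            required.append(term[1:].lower())
        elif term.startswith("-") and len(term) > 1:
            excluded.append(term[1:].lower())
        else:
            plain.append(term.lower())
    return required, excluded, plain

def page_matches(page_text, query):
    """True if the page has every required term and none of the excluded terms."""
    words = set(page_text.lower().split())
    required, excluded, _ = parse_query(query)
    return all(term in words for term in required) and not any(
        term in words for term in excluded
    )

print(parse_query("+habitat -aquatic"))
# (['habitat'], ['aquatic'], [])
print(page_matches("forest habitat restoration", "+habitat -aquatic"))  # True
print(page_matches("aquatic habitat survey", "+habitat -aquatic"))      # False
```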

Wildcards (truncation)

To search with wildcards:

- use an asterisk * to truncate a search phrase

Example: Mexic*

Note: at least three preceding letters are required. The asterisk will match from 0 to 5 additional lower case letters, but not capital letters or numerical digits. A search for engin* will therefore not retrieve engineering, which has six letters after "engin". The asterisk * can also be used in the middle of a word (lab*r will retrieve both labor and labour).
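The truncation rules in the note can be expressed as a regular expression. This is only a sketch of the stated rules (at least three letters before the asterisk, then zero to five lower case letters), not the code of any search engine; the function names are made up, and matching here is case-sensitive.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a pattern such as 'lab*r' into a compiled regular expression."""
    star = pattern.find("*")
    if 0 <= star < 3:
        raise ValueError("at least three letters must precede the asterisk")
    # Each asterisk stands for zero to five lower case letters.
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("[a-z]{0,5}".join(parts))

def wildcard_match(pattern, word):
    """True if the whole word fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None

print(wildcard_match("Mexic*", "Mexico"))       # True
print(wildcard_match("Mexic*", "Mexican"))      # True
print(wildcard_match("engin*", "engine"))       # True
print(wildcard_match("engin*", "engineering"))  # False: six extra letters
print(wildcard_match("lab*r", "labour"))        # True
```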

Nearness (proximity)

Near (used in AltaVista) - specifies that the two terms must occur within ten words (default) of each other:

fish NEAR habitat

fish NEAR/5 habitat (would retrieve records containing fish and habitat no more than five words apart)
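A proximity test can be sketched by comparing word positions. The example below assumes that "within ten words" means the positions of the two terms differ by at most ten; the exact counting rule of a given engine may differ, and the function name near is illustrative.

```python
import re

def near(text, first, second, max_distance=10):
    """True if the two words occur within max_distance word positions of each other."""
    words = re.findall(r"[a-z]+", text.lower())
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )

TEXT = "Fish need clean gravel and cold water in their spawning habitat"

print(near(TEXT, "fish", "habitat"))                  # True: ten positions apart
print(near(TEXT, "fish", "habitat", max_distance=5))  # False
```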

Popular Search Engines

There are many search engines, each slightly different from the others, and most regular Internet users have their own preferences. You may find that some search engines are better suited to specific tasks. Some simply search URLs; some “drill down” through several linked pages; some are “full-text” indexers; some index only the Web while others include Usenet and other parts of the Net. You will also have your own personal search style, and you need to find the search engines that are the best fit for you. To find the search engines that best meet your needs, you might try the following exercise:

Go to the search engine page of a browser such as Netscape (try the “Net Search” button on the toolbar) and link onto one. Or try this directory of search engines plus tips on how to use them: engines

1. Choose one search engine and have a look at its home page and features. See if you like it but do not do a search yet.

2. Go back to the previous screen and link onto another search engine, then do the same for several more.

3. Once you have seen several search engines, visit each one again and search several topics of interest to you. At least one of these topics should be one you know something about to see if the search engine is missing important sources.

AltaVista -

• searches the greatest number of records (>140 million)

• has an advanced search mode

• searches to find matches and ranks them for output

HotBot -

• large database (>110 million Web pages)

• advanced search features

• lots of add-ons

Northern Light -

• >120 million Web-pages and >4 million proprietary articles (Special Collections)

• covers proprietary publications (Special Collections) as well as the Web

• organizes results into customized folders

Excite -

• indexed >60 million Web-pages

• full Boolean capabilities

Infoseek -

• >30 million Web-pages

• minimal Boolean functions

• relevance ranking

Lycos -

• indexed >35 million Web-pages

• Lycos Pro Search - more options (including full Boolean and proximity (nearness) features)

WebCrawler -

• 2 million Web-pages

• full Boolean features

Yahoo! -

• indexes >0.5 million Web-pages

• considered to be a directory rather than a true search engine

• automatic link to AltaVista and other search engines

Others:

Google –

Snap! - home.

Magellan - Magellan.

World Wide Web Virtual Library - vl/

Meta-Search Engines

Meta-search engines search on a number of individual search engines for results. Some of the more popular ones include:

• Dogpile -

• Inference Find -

• The Big Hub -

• MetaCrawler -

• ProFusion -

• SavvySearch -

• Ask Jeeves -

• Mamma -
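The idea behind a meta-search engine can be sketched as sending one query to several engines and merging the ranked lists. In the sketch below the two "engines" are stand-in functions returning made-up addresses, and the merge rule (pages found by more engines first, then by best rank) is an assumption; actual meta-search engines use their own rules.

```python
def meta_search(query, engines):
    """Send the query to every engine and merge the ranked results without duplicates."""
    merged = {}
    for name, search in engines.items():
        for rank, url in enumerate(search(query), start=1):
            entry = merged.setdefault(url, {"engines": [], "best_rank": rank})
            entry["engines"].append(name)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # Pages returned by more engines come first, then pages with a better rank.
    return sorted(
        merged,
        key=lambda url: (-len(merged[url]["engines"]), merged[url]["best_rank"], url),
    )

# Stand-in engines with made-up results; a real engine would be queried over the Web.
def engine_a(query):
    return ["example.org/salmon", "example.org/habitat"]

def engine_b(query):
    return ["example.org/habitat", "example.org/rivers"]

ENGINES = {"A": engine_a, "B": engine_b}

print(meta_search("salmon habitat", ENGINES))
# ['example.org/habitat', 'example.org/salmon', 'example.org/rivers']
```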

Searching Tips Summary

1. Plan your strategy before picking your engine

2. Decide what features would be helpful:

- Boolean operators?

- parentheses?

- nearness?

- truncation?

- phrases?

3. Find out which engines provide the features you want

4. Start specific, then move broader as needed. Get a feel for what's there, then modify your search strategy appropriately

5. Don't hesitate to try different engines and approaches. Try at least two engines unless the first one gives you exactly what you need. If you are looking for a specific fact or a specific page, use one search engine after the other until you find it or decide it's time to give up. If you are looking for background material and are not sure what you exactly want, use at least two engines.

6. For searching across several engines:

• use copy and paste to save and re-use your query (on PCs: Control-C to copy, Control-V to paste)

• when using Boolean operators use UPPER CASE characters

• when searching for names, Capitalize

module 6


key terms and concepts

1. information versus knowledge

2. search strategy

3. key word search

4. Boolean Logic

5. Meta-Search Engines


study questions

1. Why would you use more than one search engine to look for information online?

2. Which search engine do you feel is the best to look for information about the environment? Give examples of the best search engines for different subject areas such as water quality, land use and habitat restoration.

3. You are doing a study of pollution in Burrard Inlet. Suggest four keywords that you might use for a search.

4. What Boolean operators could you use to find information on water quality in the Fraser River?


assignment

1. Do a search for salmon habitat using the three different search engines listed below. Use different combinations of Boolean operators and other search techniques. Check the “search tips” or similar section for each engine you visit. Prepare a brief report about your results: how many “hits” did you get with each search engine (this is usually printed at the top of the results list)? How on-target did the first page of results seem? Try refining your search by using more terms, and tell us what terms you tried.

Type in the following addresses to try these search engines:

AltaVista:

Yahoo!

Google:
