A Test of Internet Search Engines: once again, with new players!

Bipin C. Desai

Department of Computer Science, Concordia University

1455 de Maisonneuve Blvd. West, Montreal, Quebec, Canada H3G 1M8.

email: bcdesai@cs.concordia.ca

Figure 1: Summary. Search: Bipin (AND) Desai. Panels for AltaVista, Google, Hotbots and Lycos plot the number of URLs (hits, duplicates, mis-hits) against the page number of the search results (1 to 20); a summary panel plots the number of URLs (hits, duplicates, misses) for each search engine.

The Discovery Problem

In recent years, the Internet has become extremely popular throughout most of the world. Computers, along with network facilities, have found their way into many aspects of our lives, and the Internet is becoming a well-accepted repository of information. As such, an increasing number of research institutes, universities and business organizations are currently providing their reports, articles, catalogs and other information resources on the Internet using the WWW (World Wide Web) ([BLT90], [BLT93]). The Web has become the accepted means of disseminating and sharing information resources as hypermedia.

0.1 First Generation Search Engines

Between June 3 and June 15, 1995, the pioneering search engines ALIWEB [ALIW], DACLOD [DACL], EINet Galaxy [EINE], GNA Meta-Library [GNAM], Harvest [HARV], InfoSeek [INFO], Lycos [LYCO], Nikos [NIKO], RBSE [RBSE], World Wide Web Catalog [WWWC], WebCrawler [WEBC], WWWW [WWWW] and Yahoo [YAHO] were used in a series of tests to find URLs pertaining to the author of this paper. Since the search engines provided no means of making a context-sensitive search, the searches were made using the target search string Bipin (AND) Desai. At that time, there were 24 known URLs containing the string; these URLs are listed in [BCDT1]. The results obtained are given in Table 1, which lists the number of hits, duplicates, mis-hits and misses for each system. The misses for the manual systems, some of which depend on the registration of resources, indicate that the resources had not been registered. Results may not be identical if the tests are repeated, owing to the possible discovery or registration of the missing documents in the meantime. All the documents in the list above existed well before the test date.
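The columns of Table 1 are related by simple arithmetic. On our reading (an assumption; the text does not spell it out), the number of distinct relevant URLs an engine found equals its hits less its duplicates and mis-hits, and the number missed is the 24 known URLs less that figure. The minimal Python sketch below checks this reading against every system that returned results; the names `relevant_found` and `expected_missed` are hypothetical, and a "-" entry is taken as zero.

```python
# Size of the known target set in June 1995 (from the text).
KNOWN_URLS = 24

# (hits, duplicates, mis-hits, missed) from Table 1; "-" entries taken as 0.
TABLE_1 = {
    "EINet": (6, 0, 4, 22),
    "InfoSeek": (7, 0, 0, 17),
    "Lycos": (231, 2, 222, 17),
    "RBSE": (8, 0, 8, 24),
    "WebCrawler": (7, 3, 0, 20),
    "WWWW": (2, 0, 0, 22),
}


def relevant_found(hits: int, duplicates: int, mis_hits: int) -> int:
    """Distinct relevant URLs returned: hits minus duplicates and mis-hits."""
    return hits - duplicates - mis_hits


def expected_missed(hits: int, duplicates: int, mis_hits: int) -> int:
    """Known URLs the engine did not return, under the reading above."""
    return KNOWN_URLS - relevant_found(hits, duplicates, mis_hits)


for engine, (hits, dups, mis, missed) in TABLE_1.items():
    found = relevant_found(hits, dups, mis)
    print(f"{engine:<10} found={found:>2} missed={missed:>2}")
    # Every row of Table 1 should agree with the derived figure.
    assert expected_missed(hits, dups, mis) == missed
```

Under this reading, Lycos returned 231 hits but only 7 distinct relevant URLs, the same number InfoSeek found with 7 hits.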

Many of the pioneering indexing systems that existed in mid-1995 were no longer accessible when a second series of tests was tried in the fall of 1997. In the meantime, a number of new systems, such as AltaVista, OpenText and Hotbots, had emerged. Many workers in the domain of digital virtual libraries feel that these newer systems have addressed many of the issues we raised in the Cindi System (faculty/bcdesai/cindi-system1.1.html), which is still under active development at Concordia.[1]

Search System    Number of   Number of    Number of   Number
                 Hits        Duplicates   Mis-hits    missed
Aliweb           none        -            -           24
DA-CLOD          none        -            -           24
EINet            6           0            4           22
GNA Meta Lib.    none        -            -           24
Harvest          none        -            -           24
InfoSeek         7           0            0           17
Lycos            231         2            222         17
Nikos            none        -            -           24
RBSE             8           -            8           24
W3 Catalog       none        -            -           24
WebCrawler       7           3            0           20
WWWW             2           0            0           22
Yahoo            none        -            -           24

Table 1: Search statistics for the search term Bipin (AND) Desai

The second series of tests was done from September through October 1997 to find the number of
relevant documents that these current search engines could locate, and to evaluate the usefulness of the index entries so retrieved. The relevance of a document can be judged easily once the target set (faculty/bcdesai/search-oct97/whereisDesai.html) is known. We repeated the test performed in 1995 with the same search words. At the time of the test, some 325 URLs were known to contain the words "Bipin" and "Desai"; these represent Web documents pertaining to the author. The complete list of these URLs is given in faculty/bcdesai/searchoct97/whereis-Desai.html.

The first test, given in Table 2, was done on the following search engines: AltaVista [ALTA], Excite [EXCT], Infoseek [INF1], Lycos [LYC1], Hotbots [HOTB], OpenText [OPEN] and Yahoo [YAH1].

For Web search, Yahoo appeared to use the AltaVista engine and its database and produced almost identical results; hence we give a single result for both search systems in Table 2.

As in the 1995 series of tests, we report the results by noting the number of hits produced, the number of duplicates, the number of mis-hits and the number of relevant documents not listed in the result; we have also included a column for the number of URLs that are no longer valid. The duplicates are either the same document served from two sites or the same document listed twice. The latter errors seem to have been corrected in most search engines, which now eliminate such obvious duplicates.
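The bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's procedure: each engine's result list is taken to be available as (URL, page content) pairs, with content `None` for a URL that no longer resolves; a duplicate is a URL already seen or page content already seen at another URL (the same document served from two sites); a mis-hit is any other result outside the known target set; and a missed document is a target URL the engine never returned. The `tally` function, the `Tally` record and the example.org URLs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Tally:
    hits: int = 0        # every entry the engine returned
    duplicates: int = 0  # repeated URL, or same content at another URL
    mis_hits: int = 0    # valid, non-duplicate entries outside the target set
    invalid: int = 0     # URLs that no longer resolve (content is None)
    missed: set = field(default_factory=set)  # target URLs never returned


def tally(results: Iterable[tuple[str, Optional[str]]], targets: set[str]) -> Tally:
    """Classify one engine's result list against the known target set (hypothetical helper)."""
    t = Tally()
    seen_urls: set[str] = set()
    seen_content: set[str] = set()
    for url, content in results:
        t.hits += 1
        if content is None:
            t.invalid += 1
        elif url in seen_urls or content in seen_content:
            t.duplicates += 1
        elif url not in targets:
            t.mis_hits += 1
        # Anything else is a distinct relevant document.
        seen_urls.add(url)
        if content is not None:
            seen_content.add(content)
    t.missed = targets - seen_urls
    return t


# Hypothetical target set and result list.
targets = {"http://a.example.org/desai", "http://b.example.org/desai",
           "http://c.example.org/desai"}
results = [
    ("http://a.example.org/desai", "page A"),
    ("http://mirror.example.org/desai", "page A"),  # same document, second site
    ("http://a.example.org/desai", "page A"),       # same URL listed twice
    ("http://x.example.org/other", "other page"),   # outside the target set
    ("http://b.example.org/old", None),             # no longer valid
]
print(tally(results, targets))
```

Detecting a copy served from a second site by comparing page content is one simple choice; a real evaluation might need a looser notion of "same document" than exact equality.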

The documents missed could be due to the approximations used by engines such as AltaVista when they find a large number of hits. However, the fact that these search engines could not locate all the documents indicates the inherent problem of isolated URLs, that is, documents not linked from other pages and hence not reached by a crawler.

The bigger problem is the lack of selectivity, and of a measure of the usefulness of the documents found by the search engines. We have collated the results by following the trail of "next" sets of URLs; these can be viewed by selecting the number of hits for each search engine in Table 2. A glance at the abstract or summary presented by the search engines shows that they are not very revealing; except for the most pedestrian needs, following the pointers would drain the searcher's time.
