


The Re:Search Engine – Helping People Return to Information on the Web

Jaime Teevan and David Karger
Massachusetts Institute of Technology, CSAIL
32 Vassar Street, G472
Cambridge, MA 02139, USA
Tel: 1-617-253-1611
{teevan, karger}@mit.edu

ABSTRACT

Re-finding information is consistently cited as a problem on the Web [5]. One reason re-finding on the Web is difficult is that while people rely on a considerable amount of context to return to information (e.g., the original path taken to it), the Web makes no guarantee that this context will remain static. Online news sites change when new stories are written, Web sites change as their hosts edit them, and search results change as search engines update their indices to reflect the current state of the Web. This paper describes the Re:Search Engine, a system built to help people return to information in the dynamic environment of the Web. The Re:Search Engine makes it easy to find previously located search results, even when the result list for a query has changed. Because people tend not to remember much of what is presented in a search result list, when a person repeats a query the Re:Search Engine is able to preserve what the user remembers about the original results while still presenting new information.

ACM Classification: H5.2 [Information interfaces and presentation]: User Interfaces – Graphical user interfaces. H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Search process, Relevance feedback.

General terms: Design, Experimentation, Human Factors

Keywords: Re-finding, personalization, information management, implicit feedback, user profiling, dynamic information.

People Rely on Consistency When Searching

People rely on consistency in their information environment when searching for information, and are particularly likely to expect consistency when searching for previously encountered information. Consider the example search shown in Figure 1. If Connie, while looking to purchase a Global Positioning System, found several systems she liked through a search for “GPS”, she would expect to be able to use the same query to locate the exact same systems again.

The importance of consistency has been revealed through several previously reported motivating studies conducted in order to gain insight into how people return to information. One, a modified diary study of fifteen computer science graduate students performing personally motivated searches in their email, their files, and on the Web, found that even among this technically savvy population, participants preferred navigating to what they were looking for along known paths over jumping directly to it [16]. Similarly, the other study, a naturalistic study analyzing instances of re-finding mined from Web pages containing the phrase “Where’d it go?”, found that when people expressed difficulty re-finding information, they often explained what they were looking for by describing the path they took to originally find it, but were relatively unlikely to describe the temporal aspects of their original encounter with the information [18]. The results of these studies are consistent with what has been found by others [4].

…But the Web Changes

Despite the importance of consistency in re-finding, information on the Web frequently changes. Search results, often an important step in the original access path to a piece of information, change regularly as search engines update their indices to reflect the current state of the Web. This can be illustrated through analysis of the top ten results for ten queries issued to Google and tracked over the course of a year. On average, only 2.7 of the results remained in the top ten after a year; three results disappeared from the list within the first month. Thus, if Connie wanted to revisit two of the top ten GPS systems she found during her original search for “GPS”, she would have a 51% chance of not being able to locate one of them after a month. Attempts to improve retrieval quality through personalization or collaborative filtering are likely to increase the frequency of search result changes.

Although Web search engines have traditionally sought to return the search results that are the most relevant to a query without consideration of past user context, some recent search systems, such as A9 [1], allow users to mark pages of interest to return to later. However, people are unlikely to employ keeping strategies that require active involvement [9]. Some search engines have also allowed people to explicitly search information they’ve seen before [1, 6], but these systems do not maintain consistency in search result presentation, requiring the user to take a different path to the same information. Information systems that do maintain consistency allow their users to interact with a cached version of their information space [7, 13]. While employing similar methods to keep the results for repeat queries static would make re-finding simpler, it would deny users the opportunity to discover new information. For example, if Connie re-issues her “GPS” search, in addition to re-finding the systems she liked before, it is possible she would also be interested in learning about newly available systems. Even though changes to search results associated with a query can potentially hinder returning to previously viewed information, they benefit users by providing new information.

This paper presents a solution to the problems people encounter while interacting with dynamic information: the Re:Search Engine. The architecture of the Re:Search Engine is described and motivated by a study of what people remember about the search results they interact with. The resulting system allows people to interact in a dynamic information world without losing the context they develop through such interaction.

Solution – the Re:Search Engine

The Re:Search Engine addresses the problem of search result consistency by seamlessly integrating old relevant information with new relevant information. The engine consists of a Web browser toolbar plug-in that interfaces with a preexisting search engine (e.g., Google or Yahoo!). When a person issues a query that they have issued before, the Re:Search Engine fetches the current results for that query from the underlying search engine and merges the newly available information with what the user remembers about the previously returned search results in a seamless fashion. Because people tend to remember very little about the search result list they originally saw, it is possible to preserve the salient features of the old results while still presenting new information.

Consider as an example Connie’s search. Recall that Figure 1 showed the results returned when Connie first searched for “GPS”. Later, when she re-performed the same query, the results had changed to include several new GPS systems, as shown in Figure 2(a). Instead of directly returning these new results, which could be disorienting, or the original results, which might omit items Connie would want to see, the Re:Search Engine merges the two to produce the results shown in Figure 2(b). The merging preserves memorable aspects of the original results, such as links that have been followed (italicized), anomalous results (“Geological and Planetary Sciences”), and the ordering of the first and last results seen, while also including new results and an updated result summary.

An earlier exploratory paper prototype study [17] of people interacting with lists of document summaries suggests that many changes, such as changes to the summary wording and to the document ordering, go unnoticed, even when the changes occur as the person interacts with the information. The challenge in designing a system that takes advantage of the fact that people do not notice all changes is to identify which aspects of the information a person interacts with are memorable (and thus should only be changed with care), and which are not (and can change as needed).

The Re:Search Engine Architecture

The architecture of the Re:Search Engine is shown in Figure 3. The design of the system is influenced heavily by a study, described later in this paper, of what 119 people found memorable about search result lists.

The user issues a query to the Re:Search Engine through their browser (either through a toolbar plug-in or a Web page). The query is then passed through an index of past queries that the user has issued. The index returns queries that are similar to the one just issued and a score for each past query that represents how similar it is to the current query. These queries are then used to retrieve from a cache the results the user viewed when searching for each past query. This set of results is merged with the live results for the current query from the underlying search engine, using the query scores to weight how important each result set is. The merged list of search results is then presented to the user. Finally, the query the user issued is added to the index of past queries and the merged result list is added to the result cache. Each of these components is described in greater detail below.

All of the data the Re:Search Engine collects resides locally on the user’s machine. This has the disadvantage of tying the use of the Re:Search Engine to a particular machine, but such a design decision ensures that the relatively large amount of personal information that the Re:Search Engine stores will remain private.

Index of Past Queries

As mentioned earlier, when a user first issues a query to the Re:Search Engine, the query is passed through an index of past queries the user has issued in order to determine if the user intends for the current search to retrieve previously viewed information, and, if so, what queries were previously used to retrieve such information. The index of past queries functions in a similar manner to a traditional document index used in information retrieval, except that the “documents” that are indexed are past query strings.

Each past query (pq) is given a score based on how closely it matches the current query (cq). The score is computed using a standard information retrieval scoring function known as tf.idf [14]:

score(cq, pq) = Σ_{t ∈ cq ∩ pq} log(N / n_t)

where N is the number of past queries the user has issued, and n_t is the number of past queries in which term t occurs. This scoring function reflects the fact that past queries that match on terms the user searches for rarely are stronger evidence of a repeated search than past queries that match on terms the user searches for often.
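As a rough illustration, this matching might be sketched in Python as follows. The whitespace tokenization and the idf smoothing used here are illustrative assumptions, not the engine's actual implementation:

```python
import math

def score_past_query(current_query, past_queries, target):
    """Score one past query against the current query using an
    idf-style weight: terms that appear in few past queries count
    for more than terms the user searches for often."""
    n = len(past_queries)                          # N: number of past queries
    cur_terms = set(current_query.lower().split())
    tgt_terms = set(target.lower().split())
    score = 0.0
    for t in cur_terms & tgt_terms:                # terms shared by both queries
        n_t = sum(1 for q in past_queries if t in q.lower().split())
        score += math.log((n + 1) / (n_t + 1))     # smoothed idf weight
    return score
```

With this weighting, a match on a rarely used term (e.g., “flamenco”) scores higher than a match on a term that appears in many of the user's past queries (e.g., “GPS”).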

Note that not all queries with similar text are repeat queries. In fact, if a user is in the middle of a search session, it is likely that if she issues several variants of the same query she actively wants to see new results with each variant. For example, if Connie thought the results she received for her query for “GPS” returned too many expensive systems, she might try searching for “GPS, cheap”. This search should not merge in the results for the query issued immediately prior. For this reason, past queries that are similar but that occurred recently are ignored.

Once queries that are similar to the current query have been gathered, the current query is added to the past query index. The query is stemmed, and stop words and capitalization are removed. The use of an index to match queries was chosen both because it runs quickly and because it accurately reflects how people remember past searches, based on the study described later in this paper. Exactly what people remember about past queries is discussed in greater detail below.

Result Cache

If the query the user issued is determined to be related to one or more previous searches run by the user, the results corresponding to the previous searches are fetched from a result cache using the previous queries returned by the past query index. The result cache is a straightforward cache that maps an exact query string to the search result list presented to the user for that query. Only the most recently viewed set of results for a particular query is stored in the cache. For example, when Connie issued the query “GPS” a second time, the results shown in Figure 2(b) replaced the results shown in Figure 1 in her result cache.
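A minimal sketch of such a cache, assuming results are stored as simple lists keyed by the exact query string:

```python
class ResultCache:
    """Maps an exact query string to the result list most recently
    shown for that query; storing a new list replaces the old one."""

    def __init__(self):
        self._cache = {}

    def store(self, query, results):
        self._cache[query] = list(results)   # replaces any earlier list

    def fetch(self, query):
        return self._cache.get(query, [])    # empty list for unseen queries
```

Storing the merged list after every search means the cache always reflects what the user last saw, which is what the merge algorithm needs to preserve.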

User Interaction Cache

Once past results that might be relevant to the user’s current query are fetched, they are merged with the live search results to produce a result list consisting of old and new information. Because the merge algorithm is designed to help users take advantage of the context they built on the query topic during past searches, it requires an understanding of how memorable the past results are. Implicit measures of attention, like those discussed by Kelly and Teevan [10], suggest good cues as to what a user might have paid attention to during previous searches. Many of these cues require the passive collection of data about the user’s interaction with past search results. The cues are collected by instrumenting the user’s browser to observe the user’s interactions with previous result sets. The interactions are stored in a user interaction cache.

Currently the user interaction cache only stores past results that the user clicked on, but there are many other implicit cues that could be used to determine which results are memorable to the user. Possible cues worth investigating include dwell time on the result’s Web page, the number of times a particular result is accessed, and more sophisticated measures such as mouse tracking or eye tracking. Additionally, active information could be gathered. For example, the system could be extended to allow the user to mark results that they believe are worth remembering.

Merge Algorithm

When the result lists are merged, the goal is to preserve what is memorable about the past queries while incorporating relevant information newly available in the current results retrieved from the live search engine. To do this, each old result is given a memorability score based on past user interactions with the result and static information about the result, such as its rank in the result list. The memorability score was developed by studying what is memorable for users and is described in greater detail later in the paper. A quick overview is given in Table 1. The result’s memorability score is based on whether or not it was clicked and its location in the result list. It is then weighted by the query relevance score returned by the index of past queries, since even if a result is generally very memorable, it is not likely to be remembered unless it was encountered during a similar query. Further, the query’s score is discounted by the amount of time that has elapsed since the query was last issued, reflecting the fact that people forget past queries as time elapses. Because each past result list that a result occurs in increases its likelihood of being remembered, a result that occurs in multiple lists receives a memorability score equal to a weighted sum of its scores in the individual lists.
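As a rough sketch, a memorability score along these lines might be computed as follows. The specific weights for clicks, rank, and the end-of-list bump are illustrative stand-ins for the values summarized in Table 1:

```python
def memorability(result_url, past_lists):
    """Combine a result's memorability evidence across past result
    lists.  Each entry in past_lists is (query_similarity, ranked
    result list, set of clicked results); a list's contribution is
    weighted by its query similarity.  Weights are illustrative."""
    total = 0.0
    for query_sim, results, clicked in past_lists:
        if result_url not in results:
            continue
        rank = results.index(result_url)
        base = 1.0 / (rank + 1)            # primacy: top results remembered best
        if rank == len(results) - 1:
            base += 0.3                    # recency bump for the last result seen
        if result_url in clicked:
            base += 1.0                    # clicked results are far more memorable
        total += query_sim * base          # weight by query similarity
    return total
```

A result that appears in several past result lists accumulates evidence from each, matching the weighted sum described above.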

Changing the presentation of an old result incurs a cognitive cost for the user, and this is represented by assigning a cost of change to the various types of changes that can be made to a previously viewed result. The cost of change function is summarized in Table 1. For example, changing the order of a search result incurs a small cost, while removing a search result from the result list incurs a large cost. Like the memorability score, the cost of change is based on actual user behavior, and is discussed further later in the paper. Because changes to results that are remembered well are likely to incur a greater cognitive cost than changes to results that are hardly remembered at all, the cost of a particular result list is based both on the memorability of the results and on the cost of the changes made to produce that list. Note that even results that do not occur in the list can affect its cost, because there is a cost associated with not displaying memorable results.

While preserving what is memorable to a user is important, it is also important to provide new relevant information to the user. For this reason, each result in the new result list returned for the current query is given a benefit of new information score based on the expected benefit the result will provide to the user. If scoring information is available from the live search engine, that score can be used to represent the expected benefit. However, because scoring information is often not available, the Re:Search Engine uses the result’s rank as a proxy. Because beneficial results are most likely to be seen if they occur high in the returned result list, the benefit of a result list is based both on each individual result’s benefit and its location in the list. See Table 1 for details. A search result from an old result list receives a benefit score only if it occurs in the new search result list.

When the old result lists are merged with the new result list, all permutations of possible final result lists are considered. The result list with the highest total benefit minus memorability-weighted cost is selected and returned to the user. Although naively considering all permutations is expensive, the merge can be computed efficiently in polynomial time with respect to the number of results being merged by representing the problem as an assignment problem [3].
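The brute-force formulation can be illustrated with the following sketch, which scores every permutation directly. The benefit, movement-cost, and omission-cost weights are illustrative stand-ins for Table 1, and a real implementation would use the assignment-problem formulation rather than enumerating permutations:

```python
from itertools import permutations

def merge(old, new, mem_scores, k=5):
    """Brute-force merge: score every k-length ordering of candidate
    results and keep the one maximizing the benefit of new information
    minus the memorability-weighted cost of changing the old list."""
    candidates = list(dict.fromkeys(old + new))   # dedup, preserve order
    size = min(k, len(candidates))

    def benefit(result, pos):
        # new results are most beneficial near the top of the list
        return 1.0 / (pos + 1) if result in new else 0.0

    def cost(result, pos):
        # moving a memorable old result away from its old rank is costly
        if result not in old:
            return 0.0
        return 0.1 * abs(old.index(result) - pos) * mem_scores.get(result, 0.0)

    def omission_cost(shown):
        # dropping a memorable old result entirely is the costliest change
        return sum(mem_scores.get(r, 0.0) for r in old if r not in shown)

    best, best_score = None, float("-inf")
    for perm in permutations(candidates, size):
        score = sum(benefit(r, i) - cost(r, i) for i, r in enumerate(perm))
        score -= omission_cost(perm)
        if score > best_score:
            best, best_score = list(perm), score
    return best
```

With a memorable old result (e.g., one the user clicked), the selected list keeps that result while still admitting new results; with no memorability evidence, the new results simply win.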

Understanding What People Remember

Because the Re:Search Engine is designed to help people interact better with dynamic search results, designing the system required an understanding of what people actually remember about the search results they receive when searching. For this reason, a study was conducted in which participants were initially asked to interact with a search result list and then later asked to recall the information with which they interacted.

To capture as naturalistic a search behavior as possible, the initial query was self-selected by the participant to be a query of interest, and issued to an Internet search engine via whatever method the participant usually used to access the Web. Participants were presented with between 8 and 12 search results for their query, and asked to interact with the results as they normally would. The results that participants clicked on during this period were logged.

After an interval of a half hour to an hour, participants in the study were emailed a brief survey, a portion of which is shown in Figure 4, asking them to remember the results they received earlier without referring back to them. The survey asked the participant to remember the query they originally entered, the number of results the search engine returned, and basic information about each search result in the list, including its title, snippet, URL, whether the result was clicked on, and, if so, what the Web page was like.

Approximately half of the 245 people who issued an initial query completed the follow-up survey. Removing survey responses that were clearly erroneous (e.g., the remembered query and results didn’t match the initial query and results at all, or the titles, snippets and URLs were remembered exactly, suggesting the participant referred back to the result list while completing the survey) left 119 responses for analysis. Respondents were relatively evenly split between men (52%) and women (45%). Most (64%) were between the ages of 25 and 39, but 18% were over 40, and 15% under 25. Ninety-seven percent reported using a computer daily. Only 27% of respondents were affiliated with MIT. Typically, the follow-up survey was completed within a couple of hours of the initial search. Sixty-nine percent of all responses were received within three hours of the initial search, and all but five were received within a day.

Much of the search behavior observed in the study was similar to behavior commonly observed with Web search. The average initial query length was 3.2 words, comparable to what has been found by others through Web query log analysis [15, 20]. When interacting with search results, participants on average clicked on 1.9 results, and this number is comparable to the 1.5 clicks per result page observed by others on considerably larger data sets [20].

The data collected through this study was analyzed to provide insight into how to identify repeated queries, how to recognize memorable results, and how to understand the relative cost of various different types of changes that could be made to a result list.

Identifying a Query as a Repeated Search

Identifying when a person has expectations based on past experience about the result set returned for a query can be difficult because when a person repeats a past search, the query issued is not necessarily the same as the initial query. For example, although Connie’s initial search was for “GPS”, she might at a later date use the query “GPS system” or even “Global Positioning System” in an attempt to find the same information. The Re:Search Engine is designed to include a broad concept of query matching, recognizing query strings that overlap or produce similar result sets as possible repeat searches. The level of certainty the system has that a search is a re-search is used to weight how much of the original search result information is preserved in the result ordering presented to the user.

The Types of Changes People Made

Even though the time that elapsed between when participants in the study initially entered a query and when they were asked to remember it was relatively short, the original query was misremembered by 28% of the participants. A majority of the differences between the original query and the remembered query were straightforward, and fell under the following categories (the percentage of misremembered queries with the error being discussed is listed in parentheses):

Capitalization (31%). A common change between the initial query and the remembered query was that the words in one query would be capitalized differently than they were in the other query. For example, the query “Buddha belly” became “Buddha Belly” an hour later.

Word form (28%). Another common change observed was in word form. For example, a query term that was originally listed as plural might be remembered as singular, as in the case where “sample television scripts” became “sample television script”.

Word ordering (28%). The order individual terms occurred in a query also often changed. For example, the query “porsche 356” was remembered as “356 Porsche”.

Phrasing (31%). The helper words used to place the primary query terms in context often varied. An example of this is that one participant originally queried, “I’m looking for a Burberry Scarf” but remembered the query as “Where can I find Burberry Scarves?” Unimportant terms like “for” and “where” are commonly referred to as stop words.

Accounting for Query Errors in the Re:Search Engine

All of the differences described above are easily accounted for in the Re:Search Engine’s past query index. Word ordering does not matter in an index, so changes in word ordering do not affect a query’s similarity to past queries. Further, at index time, query terms are stemmed and converted to lower case, which avoids missing matches due to changes in capitalization and word form. Stop words are not indexed, which handles changes in phrasing.
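A minimal sketch of this normalization, with a deliberately crude stemmer and stop-word list standing in for what a production index would use:

```python
# Illustrative stop-word list; a real index would use a fuller one.
STOP_WORDS = {"a", "an", "the", "for", "of", "i", "i'm", "where", "can", "what"}

def normalize(query):
    """Normalize a query the way the past-query index does: lowercase,
    drop stop words, apply a crude plural stemmer, and ignore word
    order by returning a set of terms."""
    terms = set()
    for word in query.lower().split():
        word = word.strip("?.,!\"'")          # shed trailing punctuation
        if not word or word in STOP_WORDS:
            continue
        if word.endswith("s") and len(word) > 3:
            word = word[:-1]                  # crude plural -> singular stemming
        terms.add(word)
    return terms
```

Under this normalization the study's observed variants collapse together: “Buddha belly” matches “Buddha Belly”, “sample television scripts” matches “sample television script”, and “porsche 356” matches “356 Porsche”.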

Other observed differences were more complex. For example, longer queries were more likely to change over time. The average length of a correctly remembered query was 2.4 words, compared with an average length of 5.2 words for misremembered queries. Further, remembered queries were often shorter than the original (e.g., “whats the best pricing available for a Honda Pilot or Accura MDX ?” became “best pricing for Accura MDX”). If the original query and the remembered query differed, the remembered query was, on average, a whole word shorter than the actual query. The use of an index allows such differences in length to be handled gracefully. Reissued queries are not penalized for being shorter than the corresponding original query. Further, longer queries are more likely to receive a higher score, even if they are slightly different from the original query, reflecting the fact that matches between long queries are less likely to be mere coincidence.

Even richer differences could be captured through a more semantically rich index, but these are not currently accounted for. All observed queries included at least some overlap in terms, and for now, the Re:Search Engine matches queries based only on the query text. However, the information collected in the study about repeated queries is based on how people remember queries repeated within the same day, and often within the same hour. It is likely that with a longer elapsed time between the initial query and the follow-up query, more drastic changes will occur than those observed here.

One way to address large changes in a query would be to match not just on query terms but also on the query’s result set. It is a simple extension to the Re:Search Engine’s current design to match the titles and snippets of results in the new result set against the titles and snippets of results in past result sets. Sophisticated query matching like this is similar to work by Wen et al. [19], where the authors investigate, in the context of log analysis, considering queries to be the same if they contain the same terms or lead to the same documents.

Care should be taken, however, not to over-match queries. Some of the participants’ queries were observed to clearly evolve as a function of what was learned during the initial query. For example, the query “Edward V of England” became “King Edward IV”, presumably because the searcher figured out during her search that she really wanted information on the earlier king.

The Re:Search Engine could also support recognition of past queries rather than requiring recall. In the study presented in this paper, queries were recalled without the help of query completion, and form auto-fill-in was intentionally disabled. To support recognition, however, the Re:Search Engine could present its user with a list of the related past queries returned by the past query index when it presents the search results. Support for recognition of past queries could also be provided while the query is actively being issued, through some form of automatic form fill-in. The suggested matches could be made more sophisticated than those currently offered by form auto-fill-in by accounting for the observed changes in word order, capitalization, and helper words. However recognition of past queries is implemented, when a user actively selects a past query instead of issuing a new one, the selection provides added certainty as to the proper query match and confirms that the user is interested in using past information.

Calculating Memorability

In addition to providing insight into how queries change when remembered, the study also provided insight into how to measure how memorable a result is. Recall that old search results are given a memorability score that is used to determine how likely the user is to remember them. It was observed that most results were not remembered at all. On average, participants only described 2.1 of the 8 to 12 results with which they interacted. They were most likely to describe some aspect of the result summary (67% of the time a result was described, the description included information about its summary), followed by its title (57% of the time it included information about its title), whether it was clicked on (51%), and the Web page (48%). They were least likely to remember anything about the URL (41%).

Because the exact rank of a result might be misremembered in a participant’s description, it was necessary for analysis to match each remembered result with the actual result. This matching was often straightforward, particularly when some aspect of the URL was remembered, as URLs tended to be fairly unique identifiers. The record of which results each participant had clicked on also served as a useful guide when something was remembered about the Web page associated with the result or the “I clicked on this result” checkbox was marked. However, some results were impossible to match. For example, one result merely had a summary description of “something shakespeare” when the query was “shakespeare sites”, and all results in the result list were about something Shakespeare. Fifty-nine descriptions were discarded because they could not be easily matched, and the remaining 206 memorable results were analyzed.

Memorability as a Function of Click Through

Two main factors emerged from the data as affecting a result’s memorability: whether it was clicked on and where in the result list it was ranked. A result was significantly more likely to have something remembered about it if the user clicked on it initially. Although only 19% of the results displayed to participants were clicked on, 54% of the results that were remembered had been clicked. This disproportionate memorability of clicked results can be seen graphically in Figure 5.

Memorability as a Function of Rank

Results were also memorable as a function of how highly they were ranked in the result list. Ordering effects are well known in the psychology literature to have an effect on memory [2, 8, 12], and the tendency for the first items in a list to be well remembered is called the primacy effect [12]. In the case of search results, the primacy effect is probably exaggerated because the top ranked results also receive significantly more attention than lower ranked results. Figure 5 shows how many results had something remembered about them as a function of rank.

There is also a bump in both the “Clicked” and “Not Clicked” curves in Figure 5, indicating that results near the end of the result list are also more likely to be remembered. Like the primacy effect, this phenomenon has been commonly observed in the psychology literature [8]. In addition to boosting the memorability score of highly ranked results, the Re:Search Engine therefore boosts the score of results that occur near the end of the list.

Other Factors that Affect Memorability

A common comment accompanying results about which a lot of information was remembered was that the result was a Web page the participant visited often. For example, one participant remembered the exact URL for a flamenco Web site, and her ability to remember it is explained in her description of the site: “good flamenco website I have used before.” This suggests that repeated visits are closely tied to memorability, making visit frequency a logical additional piece of information for the Re:Search Engine to incorporate into its model of memorability, although it currently does not.

Interestingly, results were sometimes remembered that did not occur in the original result list. For example, a participant remembered a result from the domain “” as appearing in his results for the query “pittsburgh pirates”. However, no such result occurred. He probably remembered a Web page that he saw via some other means as occurring in the result set because conceptually it seemed to belong to the same set. The Re:Search Engine might actually help deal with such false memories because it collects results from multiple queries, as well as new results that did not occur in previous result sets but that the user might expect to have seen there.

Although the study only looked at what people remembered over a short period of time, what a person remembers when re-issuing a search is likely to depend on the time elapsed since the initial search. The Re:Search Engine accounts for, and even takes advantage of, this. Like a person, the Re:Search Engine “forgets” previous searches by decreasing the memorability score as time passes. This enables it to present more new information for the same query as more time elapses. In the unlikely event that storage space becomes an issue, the fact that people forget information over time could also be used to relieve the burden of storing every search result a user has ever seen.
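This forgetting behavior might be sketched as a simple decay applied to a stored memorability score. The inverse-log form below follows the shape of the engine’s scoring functions, but the guard clause and the exact behavior are assumptions for illustration:

```python
import math

def decayed_memorability(base_score, elapsed_hours):
    """Decay a result's memorability score as time passes, so that
    repeated queries surface more new information. A 1/log decay is
    one plausible choice; the guard below is an assumed detail that
    keeps very recent searches at full strength."""
    if elapsed_hours <= math.e:  # log would be <= 1 and inflate the score
        return base_score
    return base_score / math.log(elapsed_hours)

# A result seen an hour ago keeps its full score; after a week it has
# decayed substantially, so new results can displace it.
recent = decayed_memorability(1.0, elapsed_hours=1)
week_later = decayed_memorability(1.0, elapsed_hours=24 * 7)
```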

Calculating the Cost of Change

Analysis of the differences between what was remembered about a search result and what the result actually looked like provides insight into the cost of changing different aspects of a result. There are three types of change that a result in the result list a participant originally saw could undergo when the search was repeated: it could experience a change in description, a change in rank, or a change in appearance. What the study revealed about the cost of these three types of change is summarized in Table 1, and discussed in further detail below.

Change in Description

Often, very little was remembered about the title and summary of a result, particularly in terms of specific details. The most common remarks related to how the result pertained to the participant’s query. Results in unusual languages (“entirely in French”) or with unusual file types (“pdf”) had those aspects mentioned regularly in the result description. This suggests that non-standard aspects of results are particularly memorable, and although they are not currently accounted for, it might be worth placing special emphasis on not changing such features.

The source of the result page also appeared to be an important aspect of the description. The domain associated with the result’s URL was the most commonly remembered thing about the URL. Within the Re:Search Engine, URLs are used to uniquely identify a result, meaning that results for similar queries with exactly the same content but a different URL were treated as different results. Given the importance of source observed in the study, this seems to be a reasonable design decision.

In the Re:Search Engine, the cost of changing the title or summary was set to be lower than the benefit of displaying a new title and summary. In practice, this means that if a new title and snippet for the same result occurs in the result list returned by the live search engine, it replaces the old title and snippet. Otherwise, the title and snippet used for a result are taken from the result list associated with the past query that received the highest score from the past query index.
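The selection rule just described can be sketched as follows. The data shapes (dictionaries keyed by URL, and a scored list of cached past-query results) are assumptions for illustration, not the engine’s actual data structures:

```python
def choose_description(url, live_results, cached_queries):
    """Pick the title/snippet to display for a result: a fresh
    description from the live engine wins; otherwise fall back to the
    cached description from the highest-scoring past query.

    live_results: {url: (title, snippet)} from the live engine.
    cached_queries: list of (query_score, {url: (title, snippet)}).
    """
    if url in live_results:
        return live_results[url]
    # Search cached result lists from highest past-query score down.
    for _, results in sorted(cached_queries, key=lambda q: -q[0]):
        if url in results:
            return results[url]
    return None

# Hypothetical example: one fresh result, one only seen in the past.
live = {"http://a.example/": ("A (new)", "fresh snippet")}
cached = [
    (0.4, {"http://b.example/": ("B (older)", "older snippet")}),
    (0.9, {"http://b.example/": ("B", "old snippet")}),
]
fresh = choose_description("http://a.example/", live, cached)
fallback = choose_description("http://b.example/", live, cached)
```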

Change in Rank

Investigating how well a participant remembered the rank of a result provides insight into how costly a change in rank is. Mistakes in remembering a result’s rank occurred often (64% of the time). Accordingly, a change in rank does not incur a large cost of change in the Re:Search Engine.

Although mistakes were common, if a result was ranked first in the result list the participant initially saw, it was very unlikely the participant would misremember its rank. In these instances participants were correct about the result’s rank 97% of the time. This accuracy dropped off quickly as rank dropped, as can be seen graphically in Figure 6. This figure plots a result’s actual location in the result list displayed to the participant against the result’s remembered location. The size of each point represents the number of participants who remembered that particular actual location–remembered location combination. As can be seen, for highly ranked results most of the weight occurs along the identity line, but as rank drops, so does the correctness of the participants’ memories. This might suggest that moving a result from the number one position should incur a considerably higher penalty than moving a lower ranked result. However, because the greater importance of highly ranked documents is already captured by their higher memorability scores, the cost of changing the rank of a result is set to be a function only of how large the change is.

Figure 6 also illustrates another trend in the data. The weight of the data occurs to the right of the identity line. Remembered results were much more likely to be remembered as having been ranked higher than they actually were. Results moved up in the participants’ memories 25% of the time, and moved down only 11% of the time. This is probably because remembered results were more likely to be relevant to the participant’s information need and thus “should have been” ranked more highly than they were. In the Re:Search Engine, moves up in the result list cost less than moves down.
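The asymmetric rank-change cost might be sketched as follows. The 0.5 discount for upward moves is an illustrative constant, not a value from the engine; what matters is only that moves up the list cost less than moves of the same size down the list:

```python
def rank_change_cost(old_rank, new_rank, up_discount=0.5):
    """Cost of moving a remembered result from old_rank to new_rank.
    Moves up the list (toward rank 1) are discounted relative to moves
    down, reflecting that people tend to misremember results as having
    been ranked higher than they actually were."""
    distance = abs(old_rank - new_rank)
    if new_rank < old_rank:           # moved up: cheaper
        return up_discount * distance
    return float(distance)            # moved down, or unchanged
```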

Swaps in result ordering occurred in 13% of the responses where a swap was possible (i.e., a participant’s remembered results mapped to at least two real results). Although some psychology literature suggests that relative ordering is important for recall [8], because swaps did occur in the data and because allowing for dependencies among search results increases the complexity of the merge algorithm, the Re:Search Engine does not penalize ordering swaps.

Change in Appearance

The third type of change a result could experience is removal from the result list. A result that fails to appear but that is memorable incurs a higher cost to the user than one that is not memorable. Because the memorability score is intended to represent how likely the result is to be remembered, the Re:Search Engine assigns a constant cost to a result for failing to appear when it once did. The cost of not appearing at all is set considerably higher than that of a change in ordering or a change in description.
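Taken together, the three penalties might combine roughly as sketched below. The constants (a removal cost of 20, a description-change cost of 0.1, and a rank cost equal to the distance moved) follow the relative magnitudes described in this section, but the combination is an illustration, not the engine’s exact scoring function:

```python
def cost_of_change(was_shown, is_shown, desc_changed, old_rank, new_rank):
    """Combine the three change penalties: removal is far costlier
    than a description change or a modest rank change. Constants are
    illustrative."""
    if was_shown and not is_shown:
        return 20.0                    # removed from the list entirely
    cost = 0.0
    if desc_changed:
        cost += 0.1                    # description changes are cheap
    cost += abs(old_rank - new_rank)   # rank changes scale with distance
    return cost
```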

Other Measures of Cost and Memorability

There were many other trends in the data that suggest more sophisticated measures of memorability and cost of change, which have not yet been accounted for in the design of the Re:Search Engine. For example, the number of results presented to the user appears to have been relatively unmemorable. Although only 16 of the participants received result sets containing exactly 10 results, 59 of the participants remembered receiving 10 results. People clearly have a strong expectation as to the number of results in a result set, but do not notice small variations in size. Thus it might be possible for the Re:Search Engine to present more results when there are a large number of memorable results to display, by additionally assigning a cost to presenting more or fewer results than the expected ten.

Generalizing the Solution

Although the Re:Search Engine focuses on maintaining consistency between old search result lists and new, other types of information people commonly interact with also change. For example, online news sites change when new stories are written, and Web sites change as their hosts edit them. The growing ease of electronic communication and collaboration, the rising availability of time-dependent information, and even the introduction of automated agents suggest information is becoming ever more dynamic. As stated by Levy, “[P]art of the social and technical work in the decades ahead will be to figure out how to provide the appropriate measure of fixity in the digital domain” [11]. Understanding how people interact with search results on the Web while re-finding will shed light on how people return to information in dynamic environments in general, and we look forward to applying what we learn from the Re:Search Engine to other problems in the domain.

ACKNOWLEDGMENTS

This work is supported by the NTT-MIT Alliance, the Oxygen Project, and the HP-MIT Alliance. We are grateful to Mark S. Ackerman, Susan T. Dumais and Robert C. Miller for their thoughtful advice and to Pawan Deshpande for his help implementing the Re:Search Engine.

REFERENCES

1. A9,

2. Asch, S.E. Forming Impressions of Personality. Journal of Abnormal and Social Psychology, 41 (1946), 1230–1240.

3. Ahuja, R.K., Magnanti, T.L. and Orlin, J.B. Network Flows: Theory, Algorithms, and Applications, Prentice Hall, New Jersey, 1993.

4. Capra, R.G. and Pérez-Quiñones, M.A. Re-Finding Found Things: An Exploratory Study of How Users Re-Find Information. Technical Report cs.HC/0310011, Computing Research Repository (CoRR), 2003.

5. Graphics, Visualization, and Usability Center. GVU’s Tenth WWW User Survey, October 1998.

6. Dumais, S.T., Cutrell, E., Cadiz, J.J., Jancke, G., Sarin, R. and Robbins, D.C. Stuff I’ve Seen: A System for Personal Information Retrieval and Re-Use. In Proceedings of SIGIR’03, 2003, pp. 72–79.

7. Hayashi, K., Nomura, T., Hazama, T., Takeoka, M., Hashimoto, S. and Grudmundson, S. Temporally Threaded Workspace: A Model for Providing Activity-Based Perspectives on Document Spaces. In Proceedings of HyperText’98, 1998, pp. 87–96.

8. Henson, R. Short-Term Memory for Serial Order: The Start-End Model. Cognitive Psychology, 36 (1998), 73–137.

9. Jones, W., Bruce, H. and Dumais, S.T. How Do People Get Back to Information on the Web? How Can They Do It Better? In Proceedings of INTERACT’03, 2003, pp. 793–796.

10. Kelly, D. and Teevan, J. Implicit Feedback for Inferring User Preference: A Bibliography. SIGIR Forum, 37, 2 (Fall 2003), 18–28.

11. Levy, D. Fixed or Fluid? Document Stability and New Media. In Proceedings of European Conference on Hypertext, 1994, pp. 24–31.

12. Murdock, B.B. The Serial Position Effect of Free Recall. Journal of Experimental Psychology, 64 (1962), 482–488.

13. Rekimoto, J. Time-Machine Computing: A Time-Centric Approach for the Information Environment. In Proceedings of UIST’99, 1999, pp. 45–54.

14. Salton, G. Automatic Text Indexing Using Complex Identifiers. In Proceedings of the ACM Conference on Document Processing Systems, 1988, pp. 135–144.

15. Spink, A., Wolfram, D., Jansen, B.J. and Saracevic, T. Searching the Web: The Public and Their Queries. Journal of the American Society for Information Science, 52, 3 (2001), 226–234.

16. Teevan, J., Alvarado, C., Ackerman, M.S. and Karger, D.R. The Perfect Search Engine is Not Enough: A Study of Orienteering Behavior in Directed Search. In Proceedings of CHI’04, 2004, pp. 415–422.

17. Teevan, J. Displaying Dynamic Information. In Proceedings of CHI’01 (Extended Abstract), 2001, pp. 417–418.

18. Teevan, J. How People Re-Find Information when the Web Changes. MIT AI Memo AIM-2004-012.

19. Wen, J.-R., Nie, J.-Y. and Zhang, H.-J. Clustering User Queries of a Search Engine. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining, 2002.

20. Xue, G.-R., Zeng, H.-J., Chen, Z., Yu, Y., Ma, W.-Y., Zi, W. and Fan, W. Optimizing Web Search Using Web Click-Through Data. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining, 2004, pp. 118–126.

-----------------------

[pic] [pic]

(a) Current Web results (b) Results shown to user by Re:Search Engine

Figure 2: An example of the Re:Search Engine in action. Figure 1 shows the search results when Connie first searched for “GPS” (visited links are italicized). Figure 2(a) shows the results when the query is next performed, and Figure 2(b) shows how the Re:Search Engine combines what Connie is likely to remember from Figure 1 with what is new in Figure 2(a).

[pic]

Figure 6: The location in the list as the participant remembered it, compared with its actual location. The size of the point represents the number of people remembering that combination.

[pic]

Figure 5: The number of results where something was remembered about the result goes down as a function of rank. Clicked-on results were more likely to have something remembered about them.

Score_pq = ∑(t ∈ cq) pqt log(N/nt)


[pic]

Figure 4: A follow-up survey that asked participants what they remembered about the previously viewed search results.

[pic]

Figure 3: The architecture of the Re:Search Engine. The user’s current query is matched to past queries, and the results for the past queries are retrieved from a cache. These results are then merged with the live search engine results based on how memorable the results are, and the resulting result list is presented to the user.

[pic]

Figure 1: Connie’s initial results for the query “GPS”.


Table 1: Scoring functions used by the Re:Search Engine.

Benefit of New Information:
    0 if result is not in new result set
    + (11 − ri) × (11 − rn)
    − [1 if result description changes]

Memorability:
    0 if result is not in an old result set
    + [0.5 if result was clicked]
    + 0.5 × 2^ri / 2^10
    + 0.1 × 2^(11−ri) / 2^10
    × query score
    × 1/log(elapsed hours)

Cost of Change:
    20 if result is removed from result list
    + [0.1 if result description changes]
    + abs(ri − rn)
    + [2 if ri ...
