Visualizing Someplace to Eat: A Comparative Experiment of Three Interfaces for Searching a Restaurant Database

IS 271: Quantitative Research Methods for Information Management

Fall 2000

Instructor: Rashmi Sinha

Kim Garrett

Sacha Pearson

Jennifer English

Abstract
Motivation for the Study
Previous Research
Preparation for the Experiment
User Needs Assessment
Applying our findings to the Experiment
Experimental Method
Introduction and Hypothesis
Dependent Variables
Confounding Variables
Equipment
Tasks
Materials
Participants
Pre Experiment Procedures
Training Procedures
Pilot Study Results
Changes to the Interfaces
Changes to the Test
Changes to the Instruments
Qualitative Results
Post Interface and Made-up Task Questionnaires
General Feedback ~ Post Test Questionnaire
Quantitative Results
Testers
Results
Design Recommendations
Attribute Explorer
Dynamic Query
Proposed Implementation
Conclusions
Changes to Experiment for Future Iterations
Acknowledgements
References
Appendix
Appendix A: Problems with Visual Attribute Explorer
Appendix B: Instructions for Using the Attribute Explorer (see attached)
Appendix C: Tasks (see attached)
Appendix D: Test Key (see attached)
Appendix E: Sample Metrics Sheet (see attached)
Appendix F: Script and setup instructions (see attached)
Appendix G: Sample Pre Test Questionnaire (see attached)
Appendix H: Sample Dynamic Query Tasks and Questions (see attached)
Appendix I: Sample Attribute Explorer Tasks and Questions (see attached)
Appendix J: Sample Forms Tasks and Questions (see attached)
Appendix K: Sample Made Up Tasks and Questions (see attached)
Appendix L: Sample Post Test Questionnaire (see attached)
Appendix M: Old Instruments (see attached)

Abstract

Traditional, form-based queries do not give users any information about what the database can offer them or how their query constraints limit the data. When user goals are relatively fluid, it is helpful to have information about how loosening constraints will affect the returned set. For instance, if spending $5 per night more on a hotel room means twice as many hotels to choose from, the user would like to have that information. Techniques that provide immediate feedback may allow users to see how their constraints limit the data and help them make more informed choices in their query specifications.

In this experiment, we compared an interface that uses visualization with two forms-based interfaces, one flat and one that dynamically shows query results. The tasks involved choosing a set of restaurants. We wanted to find out how the interfaces compared in terms of task completion time, quality of query results, and user confidence. In addition, we collected user satisfaction information.

We also wanted to determine whether textual training had any impact on success with the visualization interface.

Motivation for the Study

This experiment was conducted as part of a larger graduation project at the School of Information Management and Systems.

TraveLite is a web-based, customized travel guide publisher. It allows travelers to sort through a database of travel content and choose only what they decide they need or want. Through a series of tasks, users create a customized guide, which they can later download to a PDA or other portable format. In creating guides based on their own interests and needs, travelers can purchase a guide tailored to them, rather than a static, bland product designed around a generalized perception of what a generic traveler in a region may need. One of the major design hurdles for the project, however, is how to support user queries over the vast amount of travel information that is available.

One of the primary tasks we will need to support in the prototype of TraveLite is building a guide online using a web-based interface. In order to build a guide, users will need to use some type of tool to sort through the large amounts of nominal and ordinal data available, and filter out the elements of interest specific to their needs. The search/filter task can be daunting given a large database of content, and the possibility for failed queries (0 hits) is high as more constraints are added to queries.

Traditional, form-based queries do not give the user any information about what the database can offer them and how their query constraints limit the data. When user goals are relatively fluid, it is helpful to have information about how loosening constraints will affect the returned set. For instance, if spending $5 per night more on a hotel room means twice as many hotels to choose from, the user would like to have this information. It could be that using techniques that provide immediate feedback will allow the user to see how their constraints limit the data and will help them to make more informed decisions regarding what to collect for their customized guides.

We expect that a visualization tool such as the Visual Attribute Explorer will enable users to interact with the database in an intuitive manner that facilitates exploring, searching and selecting sets of data based on attributes relevant to the individual's travel needs.

In order to determine the appropriateness of using the Visual Attribute Explorer, we conducted an experiment to evaluate its usability relative to a traditional, form-based query interface and a dynamic query interface. The task was choosing a set of restaurants to include in a guide.

In the final prototype, the Visual Attribute Explorer would have to be redesigned to function in a web interface. Before investing time in such a redesign, we wanted to determine whether such an interface would help or hinder users in their tasks.

In addition, since our eventual audience will be Internet users, we sought to make the experimental conditions as close as possible to those our eventual users would face. For example, the training we provided to some of the users was text based rather than provided by a live teacher.

Previous Research

As the amount of information we deal with on a daily basis increases, we need easier ways to manage and filter that information. Visualization tools are a way to represent data that takes advantage of our visual cognitive skills. Humans can recognize and understand shapes and colors much faster than we can process text. Furthermore, dynamic query and visualization tools allow the user to manipulate datasets through a graphical interface. These systems all incorporate:

▪ Rapid incremental and reversible operations with immediate visual feedback on each action

▪ Smooth graphical feedback of results

▪ Continual visual representation of the dataset

▪ Physical interaction with the data, usually through sliders or buttons, that allows the user to form and develop queries

▪ Further details on demand

▪ A layered approach to learning that allows both naïve and experienced users to use the tool

▪ Elimination of the zero-hits problem: if a query returns zero hits, the user simply sets the results back to the previous state [1]

A main benefit of these systems is that they enable the user to reduce the found set to a manageable size (based on desired attributes/constraints) and then allow for deeper exploration. Furthermore, the sense of actual control over the data and query process bolsters user confidence in the results of the search. [2]
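To make the dynamic query idea concrete, the sketch below shows the core filter-and-refresh loop such systems implement: every change to a constraint immediately re-filters the dataset and redraws the result list, so the user never ends up stuck at zero hits. This is a minimal Python illustration under our own assumptions (a small in-memory list of restaurant records with hypothetical attribute names), not code from the Attribute Explorer or any of the systems cited here.

    # Illustrative sketch of the dynamic query loop: each constraint change
    # immediately re-filters the dataset, so the user never hits an
    # unrecoverable "zero hits" state; they simply relax the last constraint.
    # Attribute names and sample data are hypothetical.
    restaurants = [
        {"name": "Cafe A", "price": 1, "cuisine": "Italian", "rating": 4},
        {"name": "Bistro B", "price": 3, "cuisine": "French", "rating": 5},
        {"name": "Diner C", "price": 2, "cuisine": "American", "rating": 3},
    ]

    def apply_constraints(records, constraints):
        """Return only the records that satisfy every active constraint."""
        return [r for r in records
                if all(predicate(r) for predicate in constraints.values())]

    # Active constraints, keyed by attribute; adjusting a "slider" just replaces
    # the predicate for that attribute and re-runs the filter.
    constraints = {
        "price": lambda r: r["price"] <= 2,    # price slider at "Moderate"
        "rating": lambda r: r["rating"] >= 4,  # rating slider at 4 stars
    }

    found_set = apply_constraints(restaurants, constraints)
    print([r["name"] for r in found_set])      # immediate feedback: ['Cafe A']

    # Loosening the rating constraint widens the found set right away:
    constraints["rating"] = lambda r: r["rating"] >= 3
    print([r["name"] for r in apply_constraints(restaurants, constraints)])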

Much research has been performed on visualization tools using these dynamic query principles. We will summarize a few of the most closely related below. First we will look at two systems that allow exploration of travel related information. FareBrowser is an interactive tool for finding and comparing airfares with a visual element displaying results of the query. The Restaurant Finder is a dynamic query system that allows users to explore the metadata and preview the refined data set.

In the FareBrowser study, the authors developed a tool for searching for airfares using a visual display of the results. The FareBrowser, like TraveLite, was developed with the goal of allowing users to access the system via the Internet; it therefore needed to be simple enough for the average user to learn and use, but also able to handle complex queries based on many specific constraints. The user can change constraints, including destination and flight type, to locate the flights that best match their needs. The system also enables the user to manipulate the time scale using sliders. The study compared FareBrowser with Travelocity's FareFinder, a text-based search tool. The results of the experiment showed that non-technical users felt that the FareBrowser was too complex, relied too heavily on the user's ability to interpret the graphs, and was thus too complicated to learn relative to the task. Those with a more technical background appreciated the ability to see all details in one screen and to have direct control over the data. Experienced users were also able to complete the more complicated tasks in less time with FareBrowser. [3]

In this second example, Catherine Plaisant, et al. were interested in using dynamic query and query preview techniques to explore very large datasets over slow network connections. The system allows the user to explore the dataset through its metadata, combining browsing and querying, until the user arrives at a more usable set of data that they can then request and explore in depth. To explore the possibilities behind this type of system, the authors developed example applications including the Restaurant Finder, an interface to EOSDIS (NASA's Earth Observing System Data and Information System), and a film database. They performed a user experiment on the film database and found that users of the query preview system were twice as fast in searching for items as those using a form fill-in interface. The query preview system also scored higher on user satisfaction. [4]

Octavio Juarez evaluated the performance of visualization tools based on the task they are intended to support. The experiment was a between subjects test using two tools: one using an interactive table system and one using an interactive visualization. Both groups looked at a set of environmental data and the experiment captured two measures: time to perform the task and the quality of the results. The researcher discovered that most users only needed to look at the data on a summary level. Unless they actually needed the more complex data feedback, the visualization tool was too complex and unfamiliar to the users for exploring the data at this level. However, users that did work with the visualization tool were more efficient in completing the tasks. [5]

The HomeFinder employs dynamic query and direct manipulation with immediate feedback to enable users to rapidly explore real estate listings. The system uses sliders to change the range of the dataset based on several criteria. To submit a query, the user changes a range on the sliders and the system automatically updates the found set. The tool facilitates multiple and reversible queries; there are no errors because the user can always return to a previous found set. [6] Christopher Williamson and Ben Shneiderman performed a study comparing the Dynamic HomeFinder with two other interfaces, a natural language query system and traditional paper listings. The experiment measured time to find the correct answers on a series of increasingly complex tasks and the user's subjective satisfaction in using the interface. The results of the experiment showed that the dynamic query interface provided the best search results overall and also scored highest on satisfaction. [7]

The Alphaslider is an interface element employed by visualization tools like these to quickly and graphically move the user through a data set. In this study, the authors evaluated different designs of sliders for selecting text from a list using RSVP (rapid, serial, visual presentation of text) in dynamic query systems. The experiment compared four designs in a controlled environment, measuring time to locate an item and subjective satisfaction. Each interface employed a different method of allowing the user to indicate the desired granularity of a search. [8]

Comparative evaluations of interfaces using visualization seem to fall into two general categories: either the visualization interface is compared to an older technology and is clearly superior (as in the case of the HomeFinder experiment), or it is compared to technology better suited to the task, in which case the visualization is preferred by expert users but thought too complicated by novice users. By comparing three interfaces, one that provides no pre- or post-query information about data density across attributes, one that provides post- but not pre-query information, and one that provides both pre- and post-query information, we can elicit which parts of the visualization paradigm are useful and which are overwhelming to users.

Preparation for the Experiment

User Needs Assessment

In order to determine user needs for the task of building a customized travel guide online, we held a focus group. We invited eight experienced travelers who use a guide when traveling. Six were graduate students from the UC Berkeley Computer Science Department and the School of Information Management and Systems; two were professional travel writers.

All of the participants travel at least once a year to locales that are unfamiliar. All the participants use the Internet and other technical tools prior to and during travel to research, purchase and communicate. This group roughly approximates the target customer of TraveLite - web savvy, frequent travelers who are accustomed to researching and purchasing travel related information and services from the Web.

While the focus group covered a wide range of topics, we focus here on the information gathered specifically for the experiment. The most important aspect of the product to determine up front was whether users wanted to be able to filter information before they traveled. As we expected, they liked the idea of a web interface that allowed them to eliminate content they knew they would not use (e.g., hotels in a price range they could not afford, restaurants in cities they would not visit, etc.). Given that users would like to be able to choose a group of restaurants to include in their guides, we explored further to determine what sorts of metadata are important to them in choosing restaurants.

At the end of the focus group session, we asked users to rate the features they had brainstormed in order of importance to them. This analysis helped us to design the tasks for the experiment that is the focus of this report. The results were as follows:

Dimensions:

▪ Price (Categories)

▪ Cuisine

▪ Location (Neighborhood)

▪ Food Rating

▪ Smoking/non

▪ Credit cards accepted

We also researched restaurant listings already in existence to see what attributes they allowed users to search over. From these we gleaned several more attributes for our test interfaces, including:

▪ Open Sunday

▪ Open Monday

▪ Meals served (breakfast, lunch, dinner, etc.)

▪ Noise level

▪ Appropriate Attire

▪ Service Rating

▪ Open after 10

Some attributes which are specific to the TraveLite product were included as well:

▪ Content provider (because TraveLite will aggregate content from many providers)

▪ When updated (to inform users about the currency of the information).

Applying our findings to the Experiment

Part of the goal in using a visualization tool to present data such as ours is that users can quickly process and understand the meaning of their search in the context of the overall data.

After considering the results from this focus group session and how to apply them to choosing attributes for the experiment, we realized that there is a distinction between the metadata a user will want to search over and the information they want to know about a restaurant. This will also differ depending on the destination and the type of trip the user is planning. Therefore we realized that, in the eventual implementation, we will need to allow the user to choose which metadata to search over. This is particularly important when considering how to apportion screen space. In order to be readable, these visualizations need to be of a certain size, so the number that can be employed in the system is limited.

Experimental Method

Introduction and Hypothesis

Is visualization useful for the task of selecting content to include in a customized travel guide, specifically for searching over a database of restaurant information?

We compared an alpha version of the Visual Attribute Explorer (available from IBM) to two versions of form-based queries, one flat and one with dynamic feedback. Users performed three tasks across the same data set of restaurant information using three different user interfaces for filtering the data. Each task required the user to interact with three, six or nine attributes.

Database content was taken from Lonely Planet and Zagat guides for San Francisco. We also collected qualitative information about a "made-up task," which required users to make up a query. We recorded what changes they made to their queries with the feedback from each interface. In addition, we collected subjective information about user satisfaction with the interface following each interface session and following the entire test.

We used a 2 x 3 mixed design, combining between- and within-subjects conditions. The levels for the within-subjects independent variable, Interface, were the three interfaces (Forms, Dynamic Query, Visual Attribute Explorer). The levels for the between-subjects independent variable, Training, were the presence or absence of text-based training. The dependent variables were composed of a series of measures outlined below. The table summarizes the design.

|Training (between-subjects independent variable) |Dependent Variables |Interface (within-subjects independent variable) | | |
| | |Forms-Based Interface |Dynamic Queries |Attribute Explorer |
|Training |Time to completion | | | |
| |Quality of results | | | |
|No training |User satisfaction | | | |

Dependent Variables

Time:

Measure 1: Task completion time

This measure allowed us to determine which interface let users complete their tasks in the least amount of time.

Recording Method

We started timing each task when the interface had been cleared and was ready for the query to be specified. We stopped timing when the user said they were ready to print their results page.

Measure 2: Exploration Time

In addition to task completion time, we recorded Exploration Time, the time users took to explore the Attribute Explorer interface prior to beginning their first task. We wanted to allow users to "poke around" the interface until they felt comfortable with it. Recording the variable served two purposes: first, we wanted to know how long it took people to feel comfortable enough with the interface to begin their first task; second, we wanted to know whether success (as measured by task completion time, recall, precision and confidence) was correlated with having spent more time exploring the interface.

Recording Method

We started timing as soon as the user was shown the interface. We stopped timing when the user indicated they were ready to begin their first task.
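As an illustration of the second purpose, the sketch below shows one way an exploration-time correlation could be checked once the per-user measures are entered into lists. The variable names and values are hypothetical; this is not the analysis we actually ran.

    # Hypothetical check of whether exploration time correlates with task success.
    # Values are invented for illustration; scipy's pearsonr returns the
    # correlation coefficient and its two-tailed p-value.
    from scipy.stats import pearsonr

    exploration_time = [45, 120, 80, 200, 60, 150]     # seconds, one value per tester
    first_task_time  = [210, 180, 240, 150, 260, 170]  # seconds on the first AE task

    r, p = pearsonr(exploration_time, first_task_time)
    print(f"r = {r:.2f}, p = {p:.3f}")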

Quality of results:

We collected Recall and Precision as a measure of result quality. We expect these rates to be high if users understand the interface. Since the user task is outlined explicitly, there is not much chance for error in formulating the query incorrectly. If, however, a user does not understand how to specify a query with the interface, these measures should be significantly affected.

Measure 1: Recall

This measurement is commonly used to evaluate information retrieval systems. It shows how successful the system is at retrieving all the information that is relevant to a specific query. Since we can calculate what this measure should be ahead of time, it will be easy to compare a tester's query Recall to ideal Recall. (Recall = Retrieved Relevant Listings/All Listings Relevant to the Query Task). Example: Ratio of restaurants found to restaurants available for the specific filtering task.

Measure 2: Precision

This measurement is commonly used to evaluate information retrieval systems. It is a measure of how successful the system is at retrieving only the information that is relevant to a specific query. Since we can calculate what this measure should be ahead of time, it will be easy to compare a given tester's query Precision to ideal Precision. (Precision = Relevant Retrieved Listings/All Listings Retrieved). Example: Percentage of restaurants found that meet all the criteria.

Recording Method

For each task, we recorded the found set from the query specification. For the forms interfaces, we printed the results page after each task. For the Visual Attribute Explorer, we took screen shots after each task to record the query specification (in the Charts view) and the results page (in the Details view). From these results, we were able to calculate the Recall and Precision (comparing the results to the perfect found set).
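For clarity, a minimal sketch of how these two measures reduce to set arithmetic follows; the restaurant names are hypothetical, and in practice the ideal found set came from the test key.

    # Minimal sketch of the recall and precision calculations.
    # "relevant" is the ideal found set from the test key; "retrieved" is the
    # found set recorded from the tester's query. Names are hypothetical.
    relevant  = {"Cafe A", "Bistro B", "Diner C", "Grill D"}
    retrieved = {"Cafe A", "Bistro B", "Taqueria E"}

    true_positives = relevant & retrieved

    recall = len(true_positives) / len(relevant)      # retrieved relevant / all relevant
    precision = len(true_positives) / len(retrieved)  # retrieved relevant / all retrieved

    print(f"recall = {recall:.2f}, precision = {precision:.2f}")  # 0.50, 0.67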

User satisfaction with the tool:

Measure 1: Confidence

Recording Method

We asked the user to rate their confidence in the returned results after each task on each interface. The question was on the task sheet for the user to fill in.

Measure 2: Ease of understanding

We asked the user to rate whether they found each interface easy or hard on a 7-point scale.

Measure 3: Usefulness

We asked the user to rate the usefulness of each interface on a 7-point scale.

Measure 4: Did the interface give expected results

We asked the user to rate whether the interface gave them the results they expected on a 7-point scale.

Recording Method

These questions were part of a post test questionnaire for each tester.

Confounding Variables

We expected that the following variables could affect the results of the experiment. The table outlines the possible effects and the measures we took to control for the variables.

|Variable |Possible Effect |Method of Control |
|1. User familiarity with interfaces similar to those in the test (users will most likely be more familiar with forms-based interfaces than with the Attribute Explorer) |Users might be more comfortable with the forms-based interfaces |Collect information about experience in the pretest survey |
|2. Reading comprehension |Could slow interaction with the interfaces, but would probably affect all three the same way |Testers will be from a university setting, so reading skill should be high |
|3. Familiarity with databases and querying |Those who have used databases in the past will have a better cognitive model for the tools, which might affect each interface equally |Collect information about experience in the pretest survey |
|4. Task order |Learning effects |Task order was varied among the participants |
|5. Observer presence during testing |Could make users nervous or self-conscious |All tests will be conducted with an observer present and reading from a script, so this will be held constant across the three interfaces |

Equipment

Tests were conducted on Windows NT 4.0 workstations with 21" monitors set at 1280 x 1024 pixels in 24-bit color. The browser used for the web-based form interfaces was Microsoft Internet Explorer version 5.0.

One of the interfaces used was a single-page form interface built with Cold Fusion and pulling from a flat file in Microsoft Access. Another was a dynamic query-style interface built with Cold Fusion and pulling data from the same flat file in Microsoft Access. The third interface was a Java program available for free download called Visual Attribute Explorer. This visualization software pulled its data from a flat file exported from Microsoft Access. The file was exported from the same flat table accessed by the other two interfaces for the purposes of the experiment.

The attributes in the Attribute Explorer were displayed in the same order in which they were displayed in the other two interfaces. The screens were set up so that the charts view appeared in the left part of the screen (filling all but the rightmost three inches of the monitor). The detail view of the restaurant data was displayed in the remaining three inches of the monitor screen. The restaurant names were moved to the leftmost column so that users could see the names of the restaurants that had been returned by their query. The default colors for the histograms were used (blue and white). In pretests, we found that all color schemes we tried confused users, so we stayed with the default colors chosen by the software designers (see the screen shot of the Attribute Explorer layout).

Tasks

We framed the experiment around four tasks. The first three tasks were straightforward query tasks where the user was given several attributes over which to search. Because we wanted to test the quality of results as the tasks grew more complex, we gave each user a three-, then six-, then nine-dimension task on each interface. To prevent the possibility of a learning effect regarding the contents of the query and results, we created three tasks for each complexity level and assigned a different set of three tasks to each interface for each user. In essence, each user performed all nine tasks, but we randomized the tasks so that each task-interface pair was equally distributed. See the Test Key for the assignment of tasks and interfaces to testers.
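As an illustration of this kind of counterbalancing (the authoritative assignment is in the Test Key), the sketch below rotates task sets across interfaces in a Latin-square fashion; the task-set labels are hypothetical.

    # Illustrative Latin-square-style rotation of task sets across interfaces.
    # Each tester sees every complexity level on every interface, but the
    # specific task set assigned to an interface rotates from tester to tester.
    # Labels are hypothetical; the actual assignment is recorded in the Test Key.
    interfaces = ["Forms", "Dynamic Query", "Attribute Explorer"]
    task_sets = ["Set 1", "Set 2", "Set 3"]  # each set has a 3-, 6-, and 9-dimension task

    def assignment_for(tester_index):
        """Rotate the task sets by the tester index so task-interface pairs balance out."""
        offset = tester_index % len(task_sets)
        rotated = task_sets[offset:] + task_sets[:offset]
        return dict(zip(interfaces, rotated))

    for tester in range(3):
        print(tester, assignment_for(tester))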

Each level of task was designed to be roughly equivalent in difficulty, so that, for example, all 6-dimension tasks had two Boolean attributes and four multiple choice attributes.

The fourth task was a free form query task where the user was asked to formulate a query, note the parameters of the query, and then record the changes made to the query as they began to use the tool.

Materials

Materials used in this experiment can be found in the Appendices.

Pre Test Questionnaire. This questionnaire was completed by participants prior to their test, and was designed to capture individual data, including information about confounding variables, prior to the experiment.

Interface Tasks and Post-Interface Questionnaires. Each participant was provided with a 9-page document, arranged by interface in order of appearance, each section of which was composed of selected task scenarios and follow-up questions, followed by the post-interface questionnaire for each interface. For more detail on the selection and order of interfaces and tasks, see the Test Key.

Post Test Questionnaire. This questionnaire was administered after the participant had completed all nine tasks across the three interfaces. This questionnaire asks participants their comparative evaluations of the three interfaces, addressing subjective measures of usefulness and ease of understanding.

Metrics Sheet. The metrics sheet was designed to record all measures for a given test participant. Test administrators recorded all quantitative measures on this document, as well as any qualitative information gathered during the observation.

Training Document. This document was provided to test participants in the Training Group. Three pages in length, the Training Document explains how the Attribute Explorer interface works, how to manipulate the interface and interpret the results, including example screen shots of the interface used to search over restaurant data.

Participants

Participants were drawn first from a list of students who had expressed interest in being test subjects for interface testing. As more participants were needed, they were drawn from the SIMS masters and PhD students. All participants were volunteers, but all were paid $12 for their time from funds provided by the SIMS administration.

In the end, all participants were graduate students, equally divided between men and women and ranging in age from 19 to 49. All of the participants had experience searching with forms. Experience with visualization tools was evenly split: half the users had seen or used a visualization tool at least once, and half either did not know what a visualization tool was or had heard of, but never seen, a visualization tool.

Pre Experiment Procedures

Prior to each test, the test administrator printed the appropriate materials required during the test. Each interface was set up on the test machine prior to the participant's arrival, with the Attribute Explorer interface set up with the identical attribute order as the two forms-based interfaces.

Training Procedures

Training Group:

Participants were given the Training Document at the same time that the exploration time for the Attribute Explorer began. They had access to the training document throughout the three tasks on the Attribute Explorer. No verbal instructions or physical demonstrations were given in an effort to mimic the environment of a user using the interface over the web.

No-training Group:

Some instructions were printed on the task sheet at the beginning of each task, regarding the use of slider bars to select and deselect attributes and the meaning of the color scheme of the interface (white representing the found set and colored sections representing eliminated restaurants). The observer pointed out the instructions to each tester. No verbal instructions or physical demonstrations were given in an effort to mimic the environment of a user using the interface over the web.

Pilot Study Results

We conducted two pilot tests, one with a second-year SIMS Masters student and the second with an undergraduate student. These two participants represent the range of individuals we expected to test in the experiment: one very experienced with web-based interfaces and the second only slightly familiar with them.

We made several changes to the experiment based on our pretests:

Changes to the Interfaces

One problem we discovered was inconsistency between the forms-based interfaces and the Attribute Explorer in the order of the labels. The Attribute Explorer does not recognize ordinal data and, furthermore, pulled the data into the interface in a very inconsistent manner. For example, with Food Rating, the histogram labels read, in order, "5 stars, 1 star, 2 stars, 3 stars, 4 stars". Users found this confusing and it simply added to the amount of time each person needed for these tasks.

One might think that the Attribute Explorer would pull categorical data in either alphabetically or in the order in which it appears in the underlying data set. This is not the case, however. We were unable to determine what sort of algorithm the Attribute Explorer uses for reading in categories, so in the end we decided to simply translate the data from ordinal to continuous data where possible. So, for example, with attributes such as Price Range, we changed phrases such as "1-Budget" (originally done this way to force a ranking of the values in all the interfaces) to simple numbers, such as "1". We then changed the label on the histogram to explain the ranking, for example, "Price Range (1=Budget; 2=Moderate; 3=High; 4=Expensive)".
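A minimal sketch of this translation step, assuming the flat file had been loaded as a list of dictionaries (field names are hypothetical):

    # Hypothetical sketch of the ordinal-to-numeric translation described above.
    # The label ordering is moved into the column header rather than into each
    # value, so the Attribute Explorer can order the histogram correctly.
    price_codes = {"1-Budget": 1, "2-Moderate": 2, "3-High": 3, "4-Expensive": 4}

    rows = [
        {"Name": "Cafe A", "Price Range": "1-Budget"},
        {"Name": "Bistro B", "Price Range": "4-Expensive"},
    ]

    new_field = "Price Range (1=Budget; 2=Moderate; 3=High; 4=Expensive)"
    for row in rows:
        row[new_field] = price_codes[row.pop("Price Range")]

    print(rows)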

The nominal data such as Neighborhoods and Cuisines appears in alphabetical order by default in the two forms based interfaces. For Attribute Explorer, we simply left them in the order the program pulled them from the data. Although this adds to the user's cognitive load, we were unable to create a better alternative within the capabilities of the software.

In addition, we removed some of the values included in the Neighborhoods attribute so that it was readable in the Attribute Explorer. As this interface pulls each value into a histogram bar, it provides a descriptive label under the bar. When more than ten or eleven values are present, the font size of the labels is decreased to accommodate all of them, and this decreases readability.

We also discovered some minor matching issues between the tasks required of the user and the interfaces and underlying data and we fixed all of those errors.

In a separate informal pretest, we tested the gradients of colors used to represent the found set and excluded values. All color gradients confused users, so we used the default colors used by the Attribute Explorer (blue and white).

Changes to the Test

We added a new variable testing the effects of training versus not training the users. We are concerned that the Attribute Explorer is intuitive to use but not intuitive to learn, especially because many individuals will probably not be familiar with this type of data visualization tool. Attribute Explorer is unusual enough to most users that it takes some explanation before the user 'gets it'.

We will divide the participants into two groups, one with training and the other without. The non-trained participants are given as long as they like to "play" with the interface to become familiar with it before they start their tasks. They are also told how to reset all the constraints, that the sliders can be used to select and deselect attributes and that white represents values that meet their criteria.

For the trained user, we will provide detailed written instructions that they can use to play with the interface (teach themselves) and as reference during the tasks. This experiment is testing the usability of each interface with the intention of using it on a web-based application; to simulate a real-world experience, we need to test each participant without verbal prompting or instructions. For this reason, we will stick to the written instructions for the training group.

We also realized that in letting the users 'train' themselves on the Attribute Explorer, we could time this period and use it to further determine how intuitive the interface is. By capturing how long it takes a user to figure out this more unfamiliar interface, with or without prior training, we can judge whether the training makes a difference or whether users still need a certain amount of time to accustom themselves to using it. If the learning curve is long both with and without training, then we cannot consider this a very usable interface.

We found that users were having trouble interpreting what we meant by satisfaction in the post task questions, so we changed the question to ask for a level of confidence with the results.

The test ran about an hour, as we expected.

Changes to the Instruments

We found that testers were confused by the physical format of the test tasks, so we arranged them with one task per page. We also clearly separated tasks from questions.

We also developed a script so that all three testers would deliver exactly the same instructions to the participants.

Qualitative Results

Post Interface and Made-up Task Questionnaires

Attribute Explorer

On the Attribute Explorer, most users were very confused about the distinction between the select and deselect mechanisms (to remove items from the found set, the user clicked on a criterion to deselect it). This was further compounded by the fact that most users were also confused by the color coding employed in the system. All found results appeared in white, while the deselected items were gray, with items growing successively darker as they matched more of the deselected criteria. Because the users were not initially comfortable with this functionality, they also spent more time checking through their selections to ensure that their results were correct.

In the post-interface satisfaction survey, most participants felt that they developed a sense of the entire data set, although most responded that they were initially confused. For example, some individuals did not initially realize that the attributes appeared on more than one screen, and others did not initially understand which graphical elements were important to their tasks. One person responded that she felt it was easy to understand but "some details bothered me, such as the yellow line in the bar so it takes a while to know what is really important information." Furthermore, most of the participants responded that they did sense how changing their constraints affected the queries, but again this took some time to understand; for example, one tester responded, "yes, once I figured out that you need to click on what you don't want (sorta counter-intuitive)."

With the made-up task on Attribute Explorer, again users were somewhat confused by the tool. In response to the effect the tool had on creating, expanding or limiting their search, one user wrote, "I saw I needed to be less restrictive, I knew that before selecting an attribute because I could see in advance there were no more white restaurants." However, most users did not feel that this visualization changed their query significantly. Most users indicated that they were satisfied with the results although only half were confident that they had found all the restaurants that matched their criteria. Basically, most users were simply confused by this interface and were not sure how to use it and/or how to understand the results.

Static Form

With the Static Form interface, most users could quickly perceive the available constraints to choose from; however, a few found that the long form and the amount of scrolling made it difficult to track all the possibilities. Most responded that they could somewhat understand how the constraints affected the query but would have preferred to have the feedback immediate or more easily accessible rather than having to return to the previous page. In fact, one participant responded that "this kind of task should be simple and quick, lack of immediate feedback made me less certain about what I'm doing. And lack of adequate visual cues made the form harder to navigate."

In the questions following the made-up task, the response to whether the tool affected their decision to change the query was fairly divided. Half the participants responded no; the other half did change their constraints to attempt to change their results, but expressed a desire for more control while using the interface. One tester responded that "I almost wanted to expand the query just so that I would feel like I was doing something more," while another replied that "This interface made it much more difficult to see the new results."

Dynamic Query Form

Participants liked the Dynamic Query interface the best of the three. When first interacting with the tool, most users were pleasantly surprised and pleased when they realized that changing constraints immediately resulted in a changed results set on the same page. All users could sense what they could choose from, although some felt that it took some getting used to the interface. One participant replied that "It took a little getting used to, but after trying a couple of things - especially combination check marks in a particular category and comparing answers with each answer set. I felt comfortable with the answer," while another said that "[I] especially like the way each choice would immediately affect the results in the frame."

When asked whether users could sense how the query constraints affected the query, all users said yes, and one noted that "I was also looking at - and found helpful - the debugging notes." [Note: we have since incorporated this facility into our design as a form of feedback.] Likewise, all users felt that both of these qualities mattered with the Dynamic Query interface. In fact, after using this interface, two users realized the importance of how the constraints affected the query; one replied that "yes - actually I think in the real world, I'll put more emphasis on the constraint of things more while conducting a real search task."

The Dynamic Query interface also changed how users searched when they had an opportunity to create their own task. One user, in a response typical of most users, replied that "it helped me play around with different possibilities." Another user explained that, "It made me see clearly what the effect of my adding or subtracting choices. I can use the information to know what is the real meaning." All users expressed satisfaction with the results although not all were as confident about finding all the possible results.

General Feedback ~ Post Test Questionnaire

Users were only somewhat sure about how the interfaces worked. The Attribute Explorer, as mentioned above, confused most users, and a few had a difficult time with the other two interfaces as well. Most users felt that their search did not affect which interfaces they found most useful, although a few said that the Dynamic Query form helped in altering their queries, especially on the more complex searches. One user felt that "The more constraints, the simpler the interface should be so I can focus on the search and not the additional [questions of]: is the tool working, am I making mistakes?"

In all three interfaces, a few participants felt uncertain about the logic of the flow or placement of the attributes themselves, making the tools difficult to use in a search. Some also felt that all constraints needed to fit on one page in order for them to easily understand the range of possibilities and to avoid excessive scrolling. Most participants did not like the Attribute Explorer. They felt that the interface had too many screens and that the distinction between the two colors was confusing. They also disliked selecting attributes they wanted to omit, finding this action counter-intuitive. Some also disliked the Static Form interface because it did not show immediate feedback on how the restrictions affected the results.

In response to the question of whether the Attribute Explorer provided a sense of context or was just confusing, many responded that it was confusing, "too much information or too much visual" in the words of one person. One person replied that "Histograms were ok. They're not very useful for searching (as opposed to browsing) where they are much more useful, as they would be for qualifying a search) for just executing a given task, the dynamic query is better." Another person felt that "it got easier as I used it." Also, in response to whether any of the interfaces frustrated them, almost half responded with the Attribute Explorer and some also felt frustrated with the Static Form for all the same reasons discussed above.

And last, one tester wrote that "the AE has real possibilities for comparison of attributes after results are retrieved," while another felt that the Attribute Explorer was useful "maybe for those who don't really know what they want and just want to wander around to see what is a good choice." Overall, 10 participants chose the Dynamic Query as their favorite interface, with one person each choosing the Attribute Explorer and the Static Form.

Quantitative Results

Testers

We tested the interfaces with thirteen testers. We kept the results from twelve of those tested (one user's screen shots were not recorded correctly). Testers were taken first from a pool of UC Berkeley students who had expressed interest in participating in interface testing. The second round of testers were masters and PhD students from the School of Information Management and Systems. The testers were all graduate students and ranged in age from 19 to 49. Half were men, half were women. There was a range of experience with visualization software: three users did not know what visualization software was, six had used visualization software at least once, and two had heard of visualization but had never used a tool. None of the users considered themselves experienced users of visualization tools.

TraveLite's intended audience is tech savvy. They are comfortable researching and purchasing online. They are interested in downloading software and content to a portable device. We do not, however, expect that they will be familiar with visualization tools. If anything, then, our testers were more experienced and skilled than we expect our customers to be. If the interfaces we tested were confusing to our testing audience, then we would certainly expect them to be as or more confusing to our commercial audience.

Results

Between Subjects: Training Effects

Given that the Attribute Explorer is a less-familiar interface, we considered whether or not training would have an effect on users' performance or evaluation of the tool.

Attribute Explorer Task Completion Time,

Non-Training and Training Groups

[pic]

3-dimension task time: t(7.384) = 1.452, p=0.188 2-tailed

6-dimension task time: t(10) = 1.039, p=0.323 2-tailed

9-dimension task time: t(10) = -.279, p=0.786 2-tailed

While it appears that training had an effect on reducing the variance of the three-dimensional task time using Attribute Explorer, the differences in task times between the training and non-training groups are not significant. It is interesting to note that aggregating exploration time plus the first 3-dimensional task time results in a measure that is approximately equivalent regardless of training, indicating that a consistent amount of time using the interface may be more important than training for initial comprehension.
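For readers who want to reproduce this style of comparison, the sketch below shows an independent-samples t-test in Python. The fractional degrees of freedom reported above suggest an unequal-variance (Welch) correction, which scipy supports via equal_var=False; the data values here are invented for illustration and are not our measurements.

    # Hypothetical sketch of the between-groups comparison: an independent-samples
    # t-test on task times with Welch's correction (equal_var=False), which
    # yields fractional degrees of freedom like those reported above.
    from scipy.stats import ttest_ind

    training_times    = [95, 110, 130, 105, 120, 100]  # seconds, hypothetical
    no_training_times = [150, 90, 210, 80, 230, 120]   # seconds, hypothetical

    res = ttest_ind(training_times, no_training_times, equal_var=False)
    print(f"t = {res.statistic:.3f}, p = {res.pvalue:.3f} (two-tailed)")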

Combined exploration and first task time

using the Attribute Explorer interface

[pic]

AE 3-dimension task time: t(7.384) = 1.452, p=0.188 2-tailed

AE exploration time: t(4.713) = -1.382, p=0.229 2-tailed

Combined (exploration + task time): t(10) = -0.392, p=0.703 2-tailed

Likewise, the presence or absence of training demonstrated no effect on recall, precision, or confidence using the Attribute Explorer, regardless of task complexity.

Recall, Precision, and Confidence,

Attribute Explorer Non-Training and Training Groups

[pic]

As we could detect no significant differences between the two groups on any of the variables, the remainder of the analysis was completed as a within subjects analysis over the entire sample.

Within Subject Effects

Time

Overall, tasks using the Attribute Explorer took more than twice as long to complete. To focus attention on the differences between the three interfaces and normalize for task difficulty, we compared the mean task time for an n-dimensional task overall to the mean task time for the same n-dimensional task on each interface. The difference in performance between the Attribute Explorer and the two form-based interfaces is dramatic, even when normalizing for n-dimensional task difficulty.

[pic]

 

It is important to note that this is a conservative estimate of task completion time, as it does not include any exploration time, which would significantly lengthen time measures for the Attribute Explorer interface.
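To make the normalization concrete, here is a minimal sketch of the ratio we computed, using illustrative numbers rather than our actual means: each interface's mean time on an n-dimension task is divided by the overall mean time for that task complexity, so a value above 1.0 means slower than average for tasks of that difficulty.

    # Hypothetical sketch of the task-difficulty normalization described above.
    # Numbers are illustrative only.
    mean_times = {  # seconds, keyed by (interface, task dimensionality)
        ("Forms", 3): 60, ("Forms", 6): 75, ("Forms", 9): 90,
        ("Dynamic Query", 3): 55, ("Dynamic Query", 6): 70, ("Dynamic Query", 9): 80,
        ("Attribute Explorer", 3): 160, ("Attribute Explorer", 6): 150, ("Attribute Explorer", 9): 200,
    }
    interfaces = ("Forms", "Dynamic Query", "Attribute Explorer")

    for dim in (3, 6, 9):
        overall = sum(mean_times[(i, dim)] for i in interfaces) / len(interfaces)
        for interface in interfaces:
            ratio = mean_times[(interface, dim)] / overall
            print(f"{dim}-dimension, {interface}: {ratio:.2f}x overall mean")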

Overall, the main effect of interface on task completion time was significant, with the differences between the Attribute Explorer and the two forms-based interfaces the primary source of the effect.

Within Subject Effects of Interface on Task Time

| |Pillai's Trace |F |df |error df |significance level |
|Multivariate |0.694 |3.716 |6 |42 |0.005 |
|Univariate [1] (Greenhouse-Geisser) |F |df |error df |significance level |
|3-dimension task |7.240 |1.109 |12.204 |0.017 |
|6-dimension task |12.227 |1.258 |13.835 |0.002 |
|9-dimension task |20.427 |1.404 |15.449 |0.000 |

[1] Univariate tests using a repeated measures ANOVA of the interface factor,

including three measures of task complexity
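For readers wishing to run a comparable analysis, a minimal sketch follows using the third-party pingouin package, which reports a Greenhouse-Geisser-corrected repeated-measures ANOVA. The long-format data frame and its column names are our own illustration, and this is not the tool used to produce the tables in this report.

    # Hypothetical sketch of a one-way repeated-measures ANOVA on task time with
    # a Greenhouse-Geisser correction, using pingouin (pip install pingouin).
    # The data are invented for illustration.
    import pandas as pd
    import pingouin as pg

    data = pd.DataFrame({
        "subject":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
        "interface": ["Forms", "Dynamic Query", "Attribute Explorer"] * 4,
        "task_time": [62, 55, 170, 70, 60, 140, 58, 52, 210, 75, 66, 155],
    })

    aov = pg.rm_anova(data=data, dv="task_time", within="interface",
                      subject="subject", correction=True, detailed=True)
    print(aov)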

 

[pic]

As expected, we note that as task complexity increases within a particular interface, mean task time increases. We speculate that the improvement in mean task time between the Attribute Explorer three-dimensional and six-dimensional tasks is due primarily to the extreme variance in the Attribute Explorer three-dimensional task mean, variance which lessens in subsequent tasks. It is important to note that the differences between the forms-based interfaces and the Attribute Explorer interface become more significant as task complexity increases, indicating a widening gap in task time.

Task complexity demonstrated a greater impact on task completion time when using the Attribute Explorer, with the mean task time increasing by a greater amount as task complexity increased. Measures of task time on the forms-based interfaces demonstrated less sensitivity to task complexity.

Task Time by Task Complexity

[pic]

Confidence

Additional evidence that users were less effective using the Attribute Explorer is found in the subjective measure of confidence in the retrieved results for each query. Overall, users were less confident in the results retrieved using the Attribute Explorer interface than with either of the forms interfaces.

Confidence in Retrieved Results, All Tasks

[pic]

 

Task complexity demonstrated a significant effect on measures of confidence in results.

We see what may be a learning effect as users of the Attribute Explorer report greater confidence in results as they continue to use the interface while task complexity increases. It is interesting to note that the Dynamic Queries interface demonstrated practically no variance in confidence measures, indicating that all testers reported confidence in their results from the start. It is also interesting to note that mean confidence levels on the Forms interface fell as task complexity increased. We feel this has to do with the lack of feedback and direct manipulation provided by the Forms interface, the impact of which is felt more as task complexity increases.

Task Confidence Level by Task Complexity

[pic]

Within Subject Effects of Interface on Confidence

| |Pillai's Trace |F |df |error df |significance level |
|Multivariate |0.910 |5.846 |6 |42 |0.000 |
|Univariate [1] |F |df |error df |significance level |
|3-dimension task (Greenhouse-Geisser) |13.596 |1.171 |12.878 |0.000 |
|6-dimension task (Sphericity Assumed) |9.439 |2 |22 |0.001 |
|9-dimension task (Sphericity Assumed) |4.086 |2 |22 |0.031 |

[1] Univariate tests using a repeated measures ANOVA of the interface factor,

including three measures of task complexity

Recall and Precision

Interface also demonstrated a small effect on both recall and precision, indicating that the Attribute Explorer interface impeded users' ability to specify queries accurately. Using the Attribute Explorer interface, measures of both recall (ratio of relevant retrieved items to all relevant items) and precision (ratio of relevant retrieved items to all retrieved items) were 13% lower than with either of the forms-based interfaces tested. While the differences in means between the interfaces for recall and precision are not statistically significant, it is still noteworthy that there are differences between the interfaces.

Recall, All Tasks

[pic]

 

Precision, All Tasks

[pic]

Recall and Precision by Task Complexity

[pic]

Within Subject Effects of Interface on Precision

| |Pillai's Trace |F |df |error df |significance level |
|Multivariate |0.308 |1.273 |6 |42 |0.290 |
|Univariate [1] (Greenhouse-Geisser) |F |df |error df |significance level |
|3-dimension task |3.660 |1.000 |11.000 |0.082 |
|6-dimension task |1.000 |1.000 |11.000 |0.339 |
|9-dimension task |1.571 |1.690 |18.586 |0.234 |

[1] Univariate tests using a repeated measures ANOVA of the interface factor,

including three measures of task complexity

Within Subject Effects of Interface on Recall

| |Pillai's Trace |F |df |error df |significance level |
|Multivariate |0.197 |0.763 |6 |42 |0.603 |
|Univariate [1] |F |df |error df |significance level |
|3-dimension task (Greenhouse-Geisser) |0.604 |1.825 |20.080 |0.542 |
|6-dimension task (Sphericity Assumed) |1.207 |2 |22 |0.318 |
|9-dimension task (Greenhouse-Geisser) |1.527 |1.531 |16.836 |0.244 |

[1] Univariate tests using a repeated measures ANOVA of the interface factor,

including three measures of task complexity

Because the tasks were so rigidly controlled (the tasks were designed to test the specification of queries, rather than the formulation of queries), problems with recall and precision are more likely effects of the user's understanding of the interface than of their understanding of the queries. Perhaps a more telling number than the means of recall and precision is the raw number of errors committed on each interface: the number of imperfect scores using the Attribute Explorer was eight, while the number on each of the other two interfaces was three. (We speculate that had the sample size been larger, we would have seen more significant differences, even accounting for the issue of query specification.) We think this signals a real problem for the Attribute Explorer's ability to assist the user in forming accurate queries that return useful results.

Means of precision and recall ranged from 0.79 to 1.0 for the three interfaces across the three task dimensions. Examining the mean recall and precision measures by interface and task dimensionality, we discovered that the Dynamic Queries interface's performance strictly increased as task dimensionality increased (precision remaining relatively constant while recall increased), while the Forms and Attribute Explorer interfaces showed more mixed results.

Precision vs. Recall (All interfaces, All task dimensions)

[pic]

 

Post-interface evaluation

Upon completing three tasks with an interface and prior to moving on to the next one, users were asked to evaluate the interface overall by completing a short survey. Results from these surveys indicate that users preferred the Dynamic Queries interface overall, giving it the highest marks among the three interfaces on the measures of ease of understanding, usefulness, and expected results.

Attribute Explorer fared particularly poorly when compared to the other two interfaces on the measure of ease of understanding, supporting our contention that the Attribute Explorer interface was particularly confusing for users.

Ease of Understanding, All Interfaces

[pic]

The forms-based interfaces also scored better on the measure of expected results:

Expected Results, All Interfaces

[pic]

It is interesting to note that on the question of usefulness, users rated higher those interfaces that provided immediate feedback and more direct manipulation (Attribute Explorer and Dynamic Queries), indicating that these have significant value to users, despite the complications with the Attribute Explorer interface.

Interface Usefulness

[pic]

Within Subject Effects of Interface on

Ease of Understanding, Usefulness, and Expectations regarding results

| |Pillai's Trace |F |df |error df |significance level |
|Multivariate |1.268 |12.135 |6 |42 |0.000 |
|Univariate [1] (Sphericity Assumed) |F |df |error df |significance level |
|ease of understanding |54.810 |2 |22 |0.000 |
|usefulness |6.361 |2 |22 |0.366 |
|expectations regarding results |8.608 |2 |22 |0.439 |

[1] Univariate tests using a repeated measures ANOVA of the interface factor,

including three subjective measures (ease of understanding, usefulness, and expected results)

Between Subject Effects

While we were able to isolate a significant main effect of interface, we also noted significant individual differences within our sample. The only significant covariate for variance in task time using the Attribute Explorer versus the other two interfaces was prior experience with, or exposure to, visualization tools. Unfortunately, our data set was too small and the internal variance too great to use regression to see whether prior visualization tool exposure could predict task time, recall, precision, or confidence with the Attribute Explorer interface. We can state, however, that the significant main effect of interface on task time seen earlier is overtaken by the interaction effect of interface and prior visualization experience when prior visualization tool experience is accounted for as a covariate. Looking at the 3-dimensional task, where the differences seem most significant, particularly on the measures of task time and confidence in the retrieved results, we discovered that the interaction effect of the covariate of prior experience with, or exposure to, visualization tools overtook the main effect of interface.

Within Subject Effects of Interface on Three-dimensional Task Time and Confidence,

Visualization Experience as a covariate

|Univariate [1] (Greenhouse-Geisser) |F |df |error df |significance level |
|Interface: task time |0.270 |1.146 |11.450 |0.645 |
|Interface: confidence |0.905 |1.172 |11.717 |0.377 |
|Interaction (Interface * Viz Experience): task time |3.659 |1.146 |11.450 |0.044 |
|Interaction (Interface * Viz Experience): confidence |0.368 |1.172 |11.717 |0.697 |

 

[1] Univariate tests using a repeated measures ANOVA of the interface factor,

including three two measures (task time, confidence level) and visualization experience as a covariant.

This indicates that a primary contributing factor to variance in the Attribute Explorer measures is prior exposure to, or experience with, visualization tools of some kind.

To control for this, we divided our sample into two groups based on whether or not they reported having used a visualization tool before, and looked for effects separately within each group on the two significant measures of task time and confidence. Unfortunately, we were unable to revisit training effects in the analysis of the role of visualization experience, as our sample was too small to divide into the four groups such an analysis would require.

 

Boxplots: Task Time and Confidence by Prior Visualization Experience

|[pic] |[pic] |
|[pic] |[pic] |

A series of t-tests indicated no statistically significant differences between the two groups. The boxplots show similar patterns in both groups, but the variance appears to be affected by the level of experience with visualization tools, and in the opposite direction from what might be anticipated. With regard to task time, the effect is toward longer task times for those with prior visualization experience, suggesting that they were "playing with" the interface, examining its behavior more closely, or finding it harder to use because of expectations they brought from other visualization tools. The effect appears to disappear after the first, three-dimensional task; the six- and nine-dimensional tasks display no such variance, indicating that testers with visualization experience had either learned how to use the interface or satisfied their curiosity about it.
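The group comparison described above amounts to independent-samples t-tests on task time and confidence for participants with and without prior visualization-tool experience. A minimal sketch, using placeholder values rather than the study's data:

    from scipy import stats

    # Task times (in seconds) split by reported visualization experience.
    # These values are placeholders for illustration only.
    time_with_experience = [210, 245, 198, 260, 230]
    time_without_experience = [190, 205, 215, 200, 185]

    t_stat, p_value = stats.ttest_ind(time_with_experience, time_without_experience,
                                      equal_var=False)   # Welch's t-test; no equal-variance assumption
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")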

Query Formulation: Made-up Task

Analysis of the made-up task component proved difficult. We had hoped to determine whether use of a particular interface prompted users to create or revise their queries in different ways, but the design of the task impeded this analysis. (A proposed redesign of this task component is discussed later in this report.)

Overall, the Attribute Explorer interface performed poorly when compared to the two forms-based interfaces, particularly when compared to the Dynamic Queries interface. More specifically, the interface had a significant negative effect on both time and subjective confidence in the returned results, indicating that, for these tasks, the Attribute Explorer is not an effective interface for a general audience.

Design Recommendations

Attribute Explorer

Users had three main difficulties in interacting with the interface: confusion about the meaning of the default colors, the counterintuitive click-to-deselect behavior for removing a value from the query, and the inability to change the size of the individual graphs.

Color

When we conducted our pilot test of the experiment, we realized that most of the participants were confused about the difference between the two colors. They had a difficult time understanding that white indicated the total of their found set and that the graded lilac-to-gray shading indicated that an item had been removed from the set, and by how many criteria. We experimented with changing these colors, running informal user evaluations of several possibilities, but most of the feedback was contradictory: one person would read one color as the found set, and another person the other.

For the experiment, we chose to keep the default colors rather than spend additional time determining an optimal color scheme that would make sense to the most participants. We recommend that the AE developers run a small usability study to identify one or more easily understandable color schemes and then offer these as template gradients.

Select/Deselect

Users were also confused by the select/deselect behavior of the interface. They found clicking to deselect an attribute value from a query counterintuitive, and indicated that they would have felt more comfortable if clicking added an attribute value to their query. This probably also compounded the confusion over the colors: we think users would have preferred a darker shade of gray to indicate that a restaurant matched more of their chosen search criteria (and was therefore a better choice), rather than to indicate that it had been eliminated by more criteria than the other items.

Graph Size

Last, users should be given control over the size of the histograms so they can scale them to a more usable size. Some of our attributes had many possible values (for example, Neighborhood), while others were binary and allowed only a yes/no choice. If the system allowed users to scale the graphs, they could, for example, place one large Neighborhood graph alongside four small yes/no graphs. This would make more efficient use of screen space and reduce scrolling on data sets with a very large number of attributes.

Dynamic Query

Scrolling

Users indicated that scrolling through available attribute values was tiresome in the Dynamic Query interface (as it was in the Forms interface). We knew this was a factor, but we left scrolling in these two interfaces to keep them comparable to the Attribute Explorer, which also requires scrolling. For the final implementation of this interface, however, we recommend that attributes be clustered to minimize scrolling.

Proposed Implementation

In an attempt to incorporate the best of these interfaces, we propose a potential iteration of the favored Dynamic Queries interface to include a pop-up window with an Attribute Explorer-like view of the data.

[pic] [pic]

Advantages:

▪ Allows individual users to opt-in/out of the Attribute Explorer-style view

▪ Allows form-based selection of items, along with a dataset overview and query feedback/formulation help via an Attribute Explorer-style popup view

▪ Labels in the AE-view give local feedback about selected query constraints, appearing darker when they constrain the query and gray otherwise

Potential Issues:

▪ Direct manipulation takes place in the form, but removing direct manipulation from the AE-view may make it more difficult to interpret.

▪ Need to link feedback from the AE popup view directly into the results frame of the form, possibly using brushing and linking by color-coding text to correspond, or by having detail information popup when an instance is selected in the AE popup window.

▪ Further testing required to determine which aspects of the AE-view people find more helpful: the histogram presentation, giving overview of data distribution (assistance in initial query formulation); or the color gradient (feedback on where/why a query fails, assistance with query revision)

▪ Long textual labels are problematic in the smaller AE-view. This could possibly be resolved by setting them vertically (Example 2), but this raises the issue of appropriate color selection for the text labels (so they remain visible across all result color gradients). Additionally, we are concerned that vertical text placement will impede scanning of the results, especially in cases of long text labels placed over short histogram bars.

In our proposed implementation, we feel it is important to let users choose the particular metadata that appears in the histograms in the pop-up window. We also plan to constrain the number of histograms to between two and four; more than that simply clutters the screen space. Furthermore, since the purpose of a visualization such as the Attribute Explorer is to display dense data in a small space, we think more visualization elements than this would defeat that purpose. In addition, more histograms would require more than one screen (if they are to be readable), and scrolling would impede the user's ability to quickly understand the context and the effects of changing their search criteria.

Conclusions

In conclusion, we have determined that using the Attribute Explorer for exploring our data set primarily confused our participants. Of the three interfaces, the Dynamic Query form rated as the most popular interface. Users liked being able to see and understand how changes in their query constraints affected their results.

Users on the Attribute Explorer took significantly longer to complete comparable tasks, even with a period of "play time" prior to starting the task.

Both precision and recall were lower with the Attribute Explorer than with the other two interfaces. We believe this indicates that confusion about the interface affected users' ability to accurately specify their queries, causing errors and hence lower values on these measures. Furthermore, users reported lower confidence in the results they retrieved from the Attribute Explorer than in those from the other two interfaces. Together, these measures indicate that users were not able to concentrate on the task at hand - finding restaurants - because they were too preoccupied with the interface itself.

The purpose of this experiment was to evaluate the usefulness of a visual tool in assisting users to find restaurants to include in a customized travel guide. We plan to deploy this application over the web and therefore the tool will need to be easy to understand and use by even the most novice user. This experiment was conducted with the participation of a group of relatively technically savvy individuals, many of whom have had some prior experience with similar visualization tools. Given that Attribute Explorer did not perform very well with this population, it is simply not feasible to deploy it as the primary means of interacting with our system for the average Internet user.

Although this visualization tool confused users somewhat, we believe the Attribute Explorer might be useful as a complementary view alongside the Dynamic Query form. By showing information about data density and how changing the constraints affects the results, a histogram view can assist users in formulating, modifying, and evaluating their queries. Users would receive quick feedback on their actions and also be able to see where their result set falls in the context of the entire dataset.

Changes to Experiment for Future Iterations

In future iterations of this experiment, we would like to restructure the Made-Up task to capture better, more quantifiable results about how the interface affects the task of modifying a query that returns no hits. Our proposed design for this task involves creating a query that we know will return zero hits and then asking the user to relax the constraints.

In this modification phase of the task, we will specify that the participant continue exploring until they retrieve at least 10 hits. We will record how they modified the query through a talk aloud protocol and also by capturing, through observation, the route they stepped through to achieve this goal. We will also record the time to completion of the task, in addition to the number of modifications they made before retrieving the required number of results.

We would also like to capture more details of participants' past experiences with other visualization tools in the pre-test survey and more quantitative measures regarding post interface satisfaction.

Acknowledgements

We would like to thank all of those who participated in our tests. We look forward to the opportunity to return the favor. We would also like to thank Rashmi Sinha for her guidance in designing the experiment and analyzing the data.

References

[1] Christopher Williamson and Ben Shneiderman. The Dynamic HomeFinder: Evaluating Dynamic Queries in a Real-Estate Information Exploration System. Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1992, pp. 338-346. Available in the ACM Digital Library.

[2] Williamson and Shneiderman, The Dynamic HomeFinder (see [1]).

[3] Bill Shapiro and Hench Qian. FareBrowser: An Interactive Visualization Tool for Finding Low Airline Fares. Project report for CMSC 828S/838S: Information Visualization.

[4] Catherine Plaisant, Ben Shneiderman, Khoa Doan, and Tom Bruns. Interface and Data Architecture for Query Preview in Networked Information Systems. ACM Transactions on Information Systems 17(3), July 1999, p. 320. Available in the ACM Digital Library.

[5] O. Juarez, C. Hendrickson, and J. H. Garrett, Jr. Evaluating Visualizations Based on the Performed Task. Proceedings of the IEEE International Conference on Information Visualization, 2000, pp. 135-142. Available in the IEEE Digital Library.

[6] Ben Shneiderman. Proceedings of the 1997 International Conference on Intelligent User Interfaces, 1997, pp. 33-39. Available in the ACM Digital Library.

[7] Williamson and Shneiderman, The Dynamic HomeFinder (see [1]).

[8] Christopher Ahlberg and Ben Shneiderman. The Alphaslider: A Compact and Rapid Selector. Conference Proceedings on Human Factors in Computing Systems (CHI '94), 1994, pp. 365-371. Available in the ACM Digital Library.

[9] Mark Derthick, James Harrison, Andrew Moore, and Steven F. Roth. Efficient Multi-Object Dynamic Query Histograms. Proceedings of the IEEE Symposium on Information Visualization (InfoVis '99), San Francisco, CA, October 1999, pp. 84-91.

Other References

Spence, Robert. Information Visualization. Harlow, England: Addison Wesley for ACM Press, 2000.

Shneiderman, Ben. Advanced Graphic User Interfaces: Elastic and Tightly Coupled Windows. ACM Computing Surveys.

Tweedie, Lisa; Spence, Bob; Williams, David; Bhogal, Ravinder. The Attribute Explorer. CHI '94 proceedings. ACM Digital Library.

Jain, Vinit; Shneiderman, Ben. Data Structures for Dynamic Queries: An Analytical and Experimental Evaluation. ACM Digital Library.

Shneiderman, Ben; Ahlberg, Christopher; Williamson, Christopher. Dynamic Queries: Database Searching by Direct Manipulation. ACM Digital Library.

Ahlberg, Christopher; Williamson, Christopher; Shneiderman, Ben. Dynamic Queries for Information Exploration: An Implementation and Evaluation. ACM Digital Library.

Shneiderman, Ben. Improving the Human Factors Aspect of Database Interactions. ACM Digital Library.

Shneiderman, Ben. Universal Usability. ACM Digital Library.

Shneiderman, Ben. User Interfaces for Creativity Support Tools. ACM Digital Library.

Ahlberg, Christopher; Shneiderman, Ben. Visual Information Seeking: Tight Coupling of Dynamic Query Filters with Starfield Displays. ACM Digital Library.

Henderson, Austin; Ehrlick, Kate. Design: Design for What? Six Dimensions of Activity (Part 1 of 2; Part 2 appears in ACM interactions, vol. 6). ACM Digital Library.

Shneiderman, Ben. A Framework for Search Interfaces.

Shneiderman, Ben. Dynamic Queries for Visual Information Seeking. IEEE Software.

Appendix

Appendix A: Problems with Visual Attribute Explorer

We used the Visual Attribute Explorer (VAE) as a test interface to see what effects are associated with using a visual interface, rather than a standard forms-based interface, to present data that is described by many attributes and retrieved through fluid queries. The VAE is currently alpha-version software and as such has limited functionality and documentation; getting the software to handle our data was a process of trial and error. In this section we first discuss the problems we encountered in implementing the software and then briefly discuss problems users had in interacting with the interface (the latter are discussed in more detail in the preceding sections).

Formatting Data for the Visual Attribute Explorer

According to the documentation, data for the Attribute Explorer is supposed to be formatted as tab-delimited (.txt) or comma-separated (.csv) files. In importing the data, however, we found that the Attribute Explorer parsed on more than commas; specifically, it split restaurant names into two or more fields according to how many words appeared in the name. In the end, we determined that if the data being imported contains any text, it is best to import it as a .txt file in which field contents are enclosed in double quotes and the fields themselves are separated by commas.
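As a concrete illustration of the import format that worked for us, the snippet below writes a small restaurant table as comma-separated text with every field enclosed in double quotes, so that multi-word restaurant names stay in a single field. The field names and rows here are hypothetical:

    import csv

    rows = [
        {"Name": "Chez Example", "Neighborhood": "North Berkeley", "Serves Dinner": "Yes"},
        {"Name": "Corner Cafe", "Neighborhood": "Southside", "Serves Dinner": "No"},
    ]

    with open("restaurants.txt", "w", newline="") as f:
        writer = csv.DictWriter(f,
                                fieldnames=["Name", "Neighborhood", "Serves Dinner"],
                                quoting=csv.QUOTE_ALL)   # wrap every field in double quotes
        writer.writeheader()
        writer.writerows(rows)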

Many to Many Relationships

One major weakness of the Attribute Explorer (and of other visualization tools we have seen) is that it does not handle the many-to-many relationships in the database we were using; it needs one row of data per record (a flat file). Derthick, Harrison, Moore and Roth discuss this problem at length [9]. As a result, the dataset we used for the experiment lacked some of the richness we are able to represent with a relational database tool. We saw four ways to flatten the data, none of them ideal. One can add an attribute (a new column) for each value that could appear in the original attribute, which is what we did for the Serves Breakfast, Serves Lunch, and Serves Dinner attributes (previously represented as one attribute, represented in the VAE as three yes/no values). Or one can limit each data point to a single value, so that a restaurant which actually serves French and Indian food is represented as serving one or the other, but not both. The remaining two options are even more problematic. One is to have a row for each value of an attribute, which can multiply rapidly when one considers all the combinations of attributes present for individual restaurants, dramatically over-representing a particular restaurant in the visualization. The other is to create a category for every possible combination of attribute values - for instance, one category for Chinese/American restaurants, one for Chinese/Lao, another for American/Lao - which we thought would be extremely confusing for the user. Because none of the options for dealing with this problem is ideal, we consider this a major flaw in using visualization techniques.
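The flattening we chose (one yes/no column per value of a formerly multi-valued attribute) can be expressed as a simple transformation. A minimal sketch, assuming each restaurant record carries a list of meals served; the record structure and names are ours, not the actual database schema:

    MEALS = ["Breakfast", "Lunch", "Dinner"]

    def flatten_meals(record: dict) -> dict:
        """Replace a multi-valued 'Meals' attribute with one Yes/No column per meal."""
        flat = {key: value for key, value in record.items() if key != "Meals"}
        for meal in MEALS:
            flat["Serves " + meal] = "Yes" if meal in record.get("Meals", []) else "No"
        return flat

    flatten_meals({"Name": "Cafe Example", "Meals": ["Lunch", "Dinner"]})
    # -> {'Name': 'Cafe Example', 'Serves Breakfast': 'No', 'Serves Lunch': 'Yes', 'Serves Dinner': 'Yes'}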

Data Types

One particularly troublesome feature of the Visual Attribute Explorer is that it examines the imported data and decides what type it is; there is no way for the user to correct the type should the Attribute Explorer interpret it incorrectly. Most of our data was categorical, which the Attribute Explorer should be able to handle, but it was interpreting the data as continuous and then assigning null values to our data points. The fix was to organize the incoming table with the categorical columns first, after which all of the categorical columns - and all of the continuous columns - were read correctly.
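The column-ordering workaround is easy to automate. A sketch, assuming the exported table is loaded into a pandas DataFrame before being written back out; the split between categorical and continuous columns is inferred from the column types, and the file names are illustrative:

    import csv
    import pandas as pd

    df = pd.read_csv("restaurants.txt")   # the quoted, comma-separated export described above

    # Put the text-valued (categorical) columns first so the VAE reads them correctly.
    categorical_cols = [col for col in df.columns if df[col].dtype == object]
    continuous_cols = [col for col in df.columns if col not in categorical_cols]
    df = df[categorical_cols + continuous_cols]

    df.to_csv("restaurants_reordered.txt", index=False, quoting=csv.QUOTE_ALL)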

Data Ordering

A last major criticism of the VAE is its apparently arbitrary ordering of incoming categorical data. For data which the VAE determined to be continuous, the categories were read into the interface in numerical order. So, for instance, if a star rating was represented with the number "1," "2," etc. the categories would appear in the interface in that order.

If, however, the VAE determined that the data was categorical, it read the data into the interface in a fashion that was inscrutable to us. For instance, it read in neighborhoods in a seemingly arbitrary order, though one might assume restaurant categories would be displayed alphabetically. This became more of a problem for attributes which looked categorical but were actually ordinal (such as Price Category, with the levels Budget, Moderate, High, and Expensive). We assumed that relabeling each level so that its order was indicated by a leading digit would solve the problem; so, for instance, Budget was recoded as 1 - Budget, Moderate as 2 - Moderate, and so on. Because the values were textual, however, the VAE interpreted them as categorical rather than ordinal and proceeded to order them in a completely random fashion. We were ultimately unable to determine how to assign a variable an ordinal data type in the Visual Attribute Explorer.

On inspection, we determined that the categories were NOT ordered alphabetically, numerically, by number of data points, or in the order in which they appeared in the incoming data set. The only remaining explanation we could conceive of was that the ordering was based on some relationship between the attributes which we could not determine simply by looking at the ordering the VAE generated.

In the end, we solved the problem in a less than ideal way by forcing certain categorical attributes to be continuous and changing the data labels to include a "key" to the ordinal labels.
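The final workaround (forcing an ordinal attribute to be read as continuous, with a key embedded in the attribute label) can be reproduced with a small mapping. A sketch based on the Price Category example above; the exact label text is illustrative:

    # Map ordinal labels to numeric codes so the VAE orders them correctly,
    # and carry a key in the attribute label so users can decode the numbers.
    PRICE_CODES = {"Budget": 1, "Moderate": 2, "High": 3, "Expensive": 4}
    PRICE_LABEL = "Price (1=Budget, 2=Moderate, 3=High, 4=Expensive)"

    def recode_price(record: dict) -> dict:
        recoded = dict(record)
        recoded[PRICE_LABEL] = PRICE_CODES[recoded.pop("Price Category")]
        return recoded

    recode_price({"Name": "Cafe Example", "Price Category": "Moderate"})
    # -> {'Name': 'Cafe Example', 'Price (1=Budget, 2=Moderate, 3=High, 4=Expensive)': 2}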

[pic]

Appendix B: Instructions for Using the Attribute Explorer (see attached)

Appendix C: Tasks (see attached)

Appendix D: Test Key (see attached)

Appendix E: Sample Metrics Sheet (see attached)

Appendix F: Script and setup instructions (see attached)

Appendix G: Sample Pre Test Questionnaire (see attached)

Appendix H: Sample Dynamic Query Tasks and Questions (see attached)

Appendix I: Sample Attribute Explorer Tasks and Questions (see attached)

Appendix J: Sample Forms Tasks and Questions (see attached)

Appendix K: Sample Made Up Tasks and Questions (see attached)

Appendix L: Sample Post Test Questionnaire (see attached)

Appendix M: Old Instruments (see attached)
