Empirical Comparison of Four Accelerators for Direct Annotation of Photos

John Jung

Summer 2000 Independent Study Project

ABSTRACT

The annotation of graphics presents a problem: while the annotated information offers great utility, the actual task of annotation can be tedious. In order to populate a database with information efficiently, the method of annotation must maximize the amount of information captured in the shortest amount of time while retaining appeal. Therefore the goals for any given method of annotation are speed, accuracy, and appeal.

Direct manipulation has proven to be an efficient method in many areas. Direct annotation, an application of direct manipulation, was tested. First, an experiment with 48 subjects was conducted using three methods: direct annotation, a hybrid of traditional and direct annotation, and the traditional caption method. Then an experiment with 24 subjects was conducted using four versions of direct annotation, measuring annotation time and subjective preference. The second experiment showed that it took approximately three minutes to annotate five photos totaling twenty annotations using direct annotation. The different versions produced strikingly similar results in both preference and time.

Keywords: direct manipulation, direct annotation, rapid annotation, user interface, photo libraries, annotation interface

INTRODUCTION

The work of annotating graphics can be a cumbersome and error-prone task even for many professionals in the field. In the past, it was a task that was very tedious to perform and to organize. With many tasks becoming computerized in the late 1990s, the task of annotating graphics was sure to follow. Through the development of software such as PhotoFinder [1], FotoFile [2], NSCS Pro [3], ACDSee [4], and MGI PhotoSuite [5], annotation is possible through computer software. However, an important question remains: what is the best method of annotating? What kind of user interface design for annotation is the most efficient and rewarding to the user?

The software that was used for the experiment is the PhotoFinder [1] prototype in development at the HCIL at the University of Maryland. PhotoFinder is being developed with important user interface design principles, such as direct manipulation and user satisfaction, in mind.

Other software of note that deals with image annotation includes FotoFile, NSCS Pro, ACDSee, and MGI PhotoSuite. FotoFile is an experimental system for multimedia organization and retrieval, based upon the design goal of making multimedia content accessible to non-expert users. Searches and retrieval are done in terms that are natural to the task. The system blends human and automatic annotation methods, and it extends textual search, browsing, and retrieval technologies to support multimedia data types.

In NSCS Pro, users can tag picture records with unlimited numbers of Descriptor keywords. It is also possible to create two kinds of information labels: ID labels and Caption labels. ID labels contain the primary information, such as what the picture is and where the picture was taken. ID label information is saved to a database, and this information is used as a lookup for speeding up new data entry and for searching and filing. NSCS Pro can also create Caption labels that can contain a large amount of free-form information to further amplify or expand on what is in the ID label. Other photo library programs, such as picture sharing software, exist on the internet. At [7], one could "create password-protected picture albums and invite people to see your creations". This site offers the user methods of annotating photos by clicking and typing into specific fill boxes, which aim to help the user organize, arrange, and edit his or her photos. Another relevant feature is the picture e-mail option, which allows the user to specify one to six pictures to send someone, with three possible delivery styles: "Carousel, Photo Cube, and Slide Show". The selection process asks the user to click the checkboxes of the picture(s) he or she wishes to send in the e-mail.

ACDSee [4] has become one of the most popular photo browsers. It allows annotation by letting a segment of text be typed into each photo. This type of annotation is more of an afterthought than a feature, as the text is not searchable. In addition, the text occupies a very small part of the screen (approximately 5 millimeters high and 15 centimeters long on a 19-inch monitor at 1024x768 resolution) and is not adjustable to allow for more text.

MGI PhotoSuite [5] is one of the leading applications for general photo editing. It supports annotation by letting the user organize an album. Each picture in the album then has categories of annotation that can be filled in, including Event, People in Photo, Location, Rating, and Title. All of the fields are searchable. Each category is annotated by filling in a text box and then clicking a button labeled "Add/Modify Category."

Direct Annotation

The concept of direct annotation [1] integrates the concept of direct manipulation [6] with the annotation of graphics. The drag and drop method [1] has been implemented for use in PhotoFinder (Fig. 1).

Fig. 1. The PhotoFinder interface.

The user can select a name from the list on the left and drag and drop it onto the photo. The first experiment, which compared direct annotation with two other ways of annotating [8], showed that direct annotation is worthy of exploration due to its speed and its high subjective satisfaction ratings.
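
To make the mechanics concrete, the following is a minimal sketch (in Python) of the data model that direct annotation implies: a name plus a label position, stored per photo and kept searchable. The class and field names are illustrative assumptions, not PhotoFinder's actual implementation.

    # A hypothetical data model for direct annotation: dropping a name onto
    # a photo records both who is in the photo and where the label sits.
    from dataclasses import dataclass, field

    @dataclass
    class Annotation:
        name: str   # person's name, taken from the shared name list
        x: int      # label position on the photo, in pixels
        y: int

    @dataclass
    class Photo:
        path: str
        annotations: list = field(default_factory=list)

        def drop_name(self, name, x, y):
            """Handler for a name dropped onto the photo at (x, y)."""
            self.annotations.append(Annotation(name, x, y))

        def people(self):
            """The searchable set of everyone annotated in this photo."""
            return {a.name for a in self.annotations}

    photo = Photo("picnic.jpg")
    photo.drop_name("Anne Rose", x=120, y=80)
    assert "Anne Rose" in photo.people()

Because each label carries its own coordinates, the same record supports both search (who appears in a photo) and display (where the label is drawn).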

There were three methods of annotation in the first experiment: the drag and drop; a click and type method (clicking at the location of annotation and typing in the annotation, considered a hybrid of direct manipulation and conventional annotation methods); and the traditional text-based caption method, in which a caption at the bottom of the photo lists names from left to right.

The results showed that the textbox and direct annotation were the fastest, but direct annotation was overwhelmingly the subjectively preferred interface.

So this experiment takes the past experiment a step further by exploring what direct annotation methods may be best for rapid annotation. The four designs that were used include the simple drag and drop, a split menu featuring the most frequently used names (number of names displayed is determined by the user), a method that uses the function keys as hotkeys, and a right-click annotation method.

FIRST EXPERIMENT

Interfaces

Drag and Drop

This is the basic method of direct annotation. The interface displayed every person's name in the database in an alphabetically ordered list, with eight names visible at one time.

The other notable feature was that the user could click in the list of names and then type the first letter of a person's last name to move the highlighted selection to names beginning with that letter. This is a common feature throughout much of the available Windows software.
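
A minimal sketch of that type-ahead behavior, assuming the list is sorted by last name (the names below are illustrative):

    # Typing a letter jumps the highlight to the first name whose last
    # name begins with that letter; otherwise the highlight stays put.
    def jump_to(names, letter, current):
        for i, name in enumerate(names):
            if name.split()[-1].lower().startswith(letter.lower()):
                return i
        return current

    names = ["Hyunmo Kang", "Catherine Plaisant", "Anne Rose", "Ben Shneiderman"]
    print(jump_to(names, "r", 0))   # 2, i.e. "Anne Rose"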

Click and Type

This was the hybrid of direct annotation and the traditional method of typing. It allowed the user to place a label just like direct annotation, but the label was made by typing the name into a textbox and then pressing the Enter key.

Textbox

In the Textbox method, the subjects simply typed in the names of the people in a textbox located below the photo, from left to right.

Hypothesis

- Direct annotation would have the best overall result: better time and better subjective satisfaction compared to the click & type and textbox methods.

- The click & type method would have the second highest subjective satisfaction but the slowest time, due to its combined use of keyboard and mouse.

- Lastly, the textbox should have the lowest subjective satisfaction due to its dull nature; however, since the textbox requires only the keyboard, its completion time should be comparable to direct annotation.

Experiment Variables

The independent variable was the type of interface with three treatments:

(i) Drag and Drop, (ii) Click and Type, and (iii) Textbox.

The dependent variables were:

(i) Time to annotate all nine names in three pictures and (ii) subjective satisfaction.

Participants

Forty-eight volunteers participated. There were no requirements for participation in the study. About eighty percent were male. The participants were recruited from a computer science class at the University of Maryland, and were accepted in the order of response. The participants were students who were 18 to 21 years old.

Procedures

A within-subjects experimental design was used. Three pilot-study sessions were conducted. Each session lasted in the range of fifteen to twenty minutes, depending on how fast the user completed the tasks.

All instructions were given through an instruction sheet and the experiment software [8].

For each interface, the subject was required to annotate nine names in three photos. The number of characters in a person's name was controlled: the total number of characters that the subject had to type (if typing was required) was 116 characters per interface. The subjects knew the names of the people in the picture through the instruction sheets, which included a copy of each person's head shot from the photo along with the corresponding name.

Each subject was assigned one of the six permutations of "123" as a task ordering. The ordering was controlled so that the order in which the subjects completed the tasks would not be a factor in determining the dependent variables. The number 1 represented the Drag and Drop, 2 the Click and Type, and 3 the Textbox. Subjects were instructed to complete the tasks in that order (e.g., 312's order is Textbox, then Drag and Drop, then Click and Type).
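
The counterbalancing can be expressed compactly; the sketch below simply enumerates the orderings, assuming subjects are assigned to them in rotation.

    # Generate the six counterbalanced task orders for the three treatments.
    from itertools import permutations

    treatments = {"1": "Drag and Drop", "2": "Click and Type", "3": "Textbox"}

    orders = ["".join(p) for p in permutations("123")]
    for code in orders:
        print(code, "->", ", ".join(treatments[c] for c in code))
    # e.g. "312 -> Textbox, Drag and Drop, Click and Type"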

RESULTS

Analysis of the timed tasks was done using a one-way ANOVA. The results showed that the treatment did not have a significant effect on time, F(2, 142) = 2.1 < 3.02 (p < 0.05); however, the treatment had a significant effect on subjective satisfaction, F(2, 142) = 6.7 > 3.02 (p < 0.05).

The mean times for each treatment, in order, were 107.1 (24.0), 130.2 (46.6), and 100.5 (35.5) seconds (standard deviations in parentheses).

Fig. 2. Bar chart of time taken to complete the tasks for each interface.

The means for subjective satisfaction were 6.8 (1.8), 5.2 (2.3), and 4.2 (2.2).

Fig. 3. Bar chart of subjective preference for each interface.

SECOND EXPERIMENT

Interfaces

Drag and Drop

By providing the basic method of annotation, the user was allowed to experience the simplest, most basic form of direct annotation, which some subjects preferred over the fancier ways to annotate.

Fig. 4. The Drag and Drop interface from the second experiment.

The drag and drop interface was almost exactly the same as in the first experiment, except that the list displayed eighteen names rather than eight.

Split Menu

The split menu method of annotation included a list of names at the top that would automatically display the most frequently annotated people, while the bottom list was much like that of the drag and drop method. The split menu has some established benefits, offering up to a 58% increase in speed [9].

The split menu featured a resizable splitter bar so that the number of most frequent names displayed was adjustable by the user. A design decision was made to remove the scrollbar from the top window, while the bottom half retained its scrollbar. PhotoFinder is able to list the most frequent names by collection or by library; for the purposes of the experiment, it was decided that the most frequent names by collection would be the most appropriate.

Fig. 5. The split menu interface.

The split menu raises interesting questions about what sort of automatic algorithms may facilitate rapid annotation. For instance, in operating systems, recency is known to be a reliable indicator for predicting future access. Would that be a more efficient algorithm than frequency of use for graphical annotations? And even if one of the two were to establish itself as the more effective method, does the type of graphical annotation have an effect (a personal photo library vs. annotating maps)? Further research may explore these ideas.
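
As a concrete illustration, frequency-based ordering for a split menu reduces to a few lines; the sketch below assumes per-collection annotation counts are tracked in a Counter, with a user-adjustable splitter position (top_n). A recency-based variant would simply order the top list by most recent annotation time instead.

    # The top pane shows the most frequently annotated names; the bottom
    # pane keeps the full alphabetical list, as in the split-menu interface.
    from collections import Counter

    def split_menu(names, counts, top_n):
        top = [name for name, _ in counts.most_common(top_n)]
        bottom = sorted(names)
        return top, bottom

    counts = Counter({"Anne Rose": 7, "Hyunmo Kang": 5, "John Jung": 2})
    top, bottom = split_menu(
        ["John Jung", "Hyunmo Kang", "Catherine Plaisant", "Anne Rose"],
        counts, top_n=2)
    # top == ["Anne Rose", "Hyunmo Kang"]; bottom stays alphabetical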

Function Keys

One of the most frequently used features in many interfaces, especially by expert users, is the "hotkey." One of the most important design issues to consider is allowing the expert user to perform tasks rapidly [6]. In Microsoft Windows, Alt-F4 almost always closes the active window; Ctrl-C copies, Ctrl-X cuts, Ctrl-V pastes, and Ctrl-A selects all of the available text in a section; F1 is the key for requesting help. These shortcuts allow expert users to perform routine tasks quickly, aiding their efficiency and productivity.

The design used in the experiment made eight function keys available for the user to use as hotkeys. The user can drag and drop a name into a box labeled F1-F8, or can highlight a name in any way they wish and then press any of the function keys to assign that key to the name. After a key is assigned to a particular name, the first four letters of the person's first name and the first four letters of the last name are displayed in the box. Thereafter, when the mouse is over the picture and one of the assigned keys is pressed, an annotation is made at the location of the mouse cursor.

Fig. 6. The function keys interface.

When the expert user has a good idea about who or what is to be annotated, then the hotkeys can be put to work very efficiently. Instead of having to find the name desired and drag the name over to the position, the user can simply position the mouse and press the hotkey. This feature is especially useful for things such as personal photo libraries, in which there is a high volume of frequently appearing names.
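
A sketch of the hotkey logic described above, using a plain dictionary from function key to name; the names and coordinates are illustrative.

    # Assigning a key stores the binding and returns the abbreviated box
    # label (first four letters of the first and last names); pressing an
    # assigned key drops that name at the current cursor position.
    fkey_names = {}
    annotations = []   # (name, x, y) records

    def assign(fkey, name):
        fkey_names[fkey] = name
        first, last = name.split()[0], name.split()[-1]
        return first[:4] + " " + last[:4]

    def on_keypress(fkey, x, y):
        if fkey in fkey_names:
            annotations.append((fkey_names[fkey], x, y))

    print(assign("F1", "Hyunmo Kang"))   # "Hyun Kang"
    on_keypress("F1", x=210, y=95)       # annotates Hyunmo Kang at (210, 95)

The up-front cost of the assign step, against the one-keystroke payoff of each subsequent press, is exactly the trade-off discussed in the results below.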

Right-click Pop-up Annotation

The right-click pop-up annotation (RPA) also aims at reducing mouse movement, but in a different way. The RPA offers a menu when the right mouse button is pressed. The menu consists of the options "Next", "Previous", and "Annotate X to Photo", where X is the highlighted person's name in the name list.

Fig. 7. The RPA interface.

The focus is sent back to the name list each time an annotation is made, so the user can type the first few letters of the next name to annotate, then simply move the mouse to the destination, click the right mouse button, and select the annotate option. This saves mouse movement and can save a lot of time, just as the function keys do.
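
The flow can be sketched as follows; the menu is rebuilt around whichever name is currently highlighted, and focus returns to the list after each annotation (names illustrative).

    # Build the pop-up around the highlighted name, annotate at the cursor,
    # and hand focus back to the name list for the next type-ahead.
    def popup_items(selected):
        return ["Next", "Previous", "Annotate " + selected + " to Photo"]

    def handle_choice(choice, selected, x, y, annotations):
        if choice.startswith("Annotate"):
            annotations.append((selected, x, y))
        return "name_list"   # the widget that should regain focus

    print(popup_items("Anne Rose"))
    # ['Next', 'Previous', 'Annotate Anne Rose to Photo']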

Hypothesis

- The Function Keys and Drag and Drop will have the two slowest times.

- Subjective satisfaction will be slightly lower for Drag and Drop, but the ratings will not show a statistically significant difference.

Drag and Drop offers direct annotation without any other helper features, so it will not be fast. The Function Keys' main advantage is that they get faster with continued use, but they have a relatively steep learning curve, so subjects may need extra time before becoming advanced users. Since the number of photos to be annotated in this experiment is relatively small, the Function Keys will do poorly.

Drag and Drop, since it is included in the other three interfaces, should seem the least novel, perhaps leading it to be less liked. However, many people like things simple, so Drag and Drop may be rated slightly lower overall, but not significantly lower. Also, since the interfaces are similar, the ratings for each interface will not greatly differ.

Experiment Variables

The independent variable was the type of interface with four treatments:

(i) Drag and Drop, (ii) Split Menu, (iii) Function Keys, and (iv) RPA.

The dependent variables were:

(i) Time to annotate all twenty names in five pictures and (ii) subjective satisfaction.

Participants

Twenty-four subjects participated and were paid $10 each for their time. There were no requirements for participation in the study. Twenty-one were male, and three were female. The participants were recruited by placement of flyers on the University of Maryland campus, and were accepted in the order of response. The participants were students who were 17 to 25 years old.

Procedures

A within-subjects experimental design was used. Three pilot-study sessions were conducted. Each session lasted in the range of twenty-five to forty minutes, depending on how fast the user completed the tasks. All of the instructions were given by the experimenter. For each task, the subjects were asked to use the distinguishing feature of the interface (since all interfaces did have drag and drop available) to the extent that they were comfortable with it. They were not told to work as fast as possible, but were told to work “reasonably fast, with no pressure.”

For each interface, the subject was required to annotate twenty names in five photos. The correct name for each person appeared near that person in the photo, which told each subject who to annotate and where. The number of appearances of each person was controlled: for each interface, two people appeared in four of the five photos, three people appeared in two of the five photos, and six people appeared in only one of the five photos.

Each subject was assigned a unique number, one of the twenty-four permutations of "1234". The ordering was controlled so that the order in which the subjects completed the tasks would not be a factor in determining the dependent variables. The number 1 represented the Drag and Drop, 2 the Split Menu, 3 the Function Keys, and 4 the Right-click Annotation. Subjects were instructed to complete the tasks in that order (e.g., 1423's order is Drag and Drop, then RPA, then Split Menu, then Function Keys).

First, the subject was given instructions on how to do basic annotation via drag and drop. Then before beginning work on each interface, the subject was given a practice session in which he/she was allowed to explore the particular interface and get sufficient practice. Each practice session included two photos and ranged from three to six annotations to be completed.

The timer was activated when the subject pressed the start button and stopped when he/she pressed the finish button.

RESULTS

Analysis of the timed tasks was done using a one-way ANOVA. The interface had a significant effect on time, F(3, 92) = 2.77 > 2.70 (p < 0.05); however, its effect on subjective satisfaction was not significant, F(3, 92) = 2.11 < 2.70 (p < 0.05).

Fig. 8. Bar chart of annotation times.

The mean times for each interface, in order from (1) to (4), were 148.0 (43.5), 151.9 (43.7), 183.1 (53.4), and 163.9 (44.2) seconds. The hypothesis that Drag and Drop would be one of the slower interfaces was not supported, while the Function Keys were indeed slow. Subjective satisfaction means were 6.1 (1.6), 6.9 (1.7), 6.0 (1.9), and 6.9 (1.4).

Fig. 9. Bar chart of subjective preference.

DISCUSSION OF EXPERIMENTS

Interface Discussions

Drag and Drop, Exp. 1

The Drag and Drop had significantly higher satisfaction to its advantage, while performing with a high level of speed. Most Windows users are already familiar with the concept of dragging and dropping, so the learning curve was likely small. The keyboard was not a vital part of annotation, and the reduced amount of switching between the two devices could have contributed significantly to the lower time.

Click and Type

The Click and Type method proved to be the slowest of the three, likely because of the act of switching between the two devices. The subject had to click on the location using the mouse, and then had to type in the name.

Textbox

The Textbox was slightly faster than the Drag and Drop. However, an experimental condition may have contributed to this. Because the names were printed on paper and spelled correctly, the subject did not have to recall the spelling of anyone's name; the subjects could just look at the sheet and type in the names from left to right, resulting in the possibly fast time. When annotating real photos, it is not realistic to always have such a list of names handy.

It can be noted, however, that the textbox method would not do an efficient job of keeping the data searchable.

Drag and Drop, Exp. 2

In the second experiment, Drag and Drop had the fastest mean time. One possible reason is that the learning curve for the other three interfaces may be greater than initially imagined. In some observations the subjects seemed to have difficulty learning the different types of interfaces, and some even noted, "I like the drag and drop because it's just really simple and not confusing."

Split Menu

The Split Menu design was well liked by many subjects. However, some noted that it was confusing that the names would switch places automatically, since the algorithm adjusts to the frequency of annotations. It would likely be the case that with more photos to annotate (perhaps in the range of 30 photos with 120 annotations), the switching of names would become less and less frequent, and thereby less confusing.

Also, the majority of the subjects did not like that there was no scrollbar; they generally preferred a scrollbar to resizing the window. Many commented, "I would like it if it had a scrollbar."

Function Keys

While the Function Keys had the slowest mean time, there are several possible explanations for this result.

First, like the Split Menu, this method gains an advantage as more annotations are required. The setting up of the Function Keys (dragging the name into the boxes) can take a long time, and as the task time gets shorter, the set-up time becomes a higher percentage of the overall time.

Second, the Function Keys are useful when the user knows what the most frequent annotations will be. That is often the case when annotating personal photos, but in this experiment the subjects had no idea who would appear how many times in the five photos.

Third, it was observed that some people simply dislike hotkeys and do not perform well when assigned rapid tasks. The quicker the person was overall (relatively lower times than other users), the less significant the gap became between the Function Keys and the rest of the interfaces. Some of the slower users commented that they just don't like hotkeys at all, and some gave it the poorest rating of 1. The variance numbers support this theory, as variance was highest for the Function Keys in both satisfaction and time.

Lastly, because of the time constraints of this experiment, it was not realized that the setup of the function keys should not have been included in the timing of the methods. The setup may take anywhere between 8 and 30 seconds, and if the setup time were filtered out, the Function Keys would likely see a decrease in overall time.

Other user comments included "It's too much information to memorize" and "the names are too hard to see if it's only the first four letters." So the boxes should perhaps be expanded, but that comes at the cost of screen space.

Right-click annotation

The right-click annotation was another method that was generally well-liked. It had the least variance in user satisfaction.

Perhaps the right-click interface appealed to users because it looked just as simple as the drag and drop but provided added functionality through the nearly universal concept of right-clicking in the Windows interface. Comments regarding the usefulness of the "Next" and "Previous" buttons included "I like not having to go up there and click to get to the next picture" and "It saves a lot of mouse movement." Others described it as "simpler and efficient."

Other comments and observations

For the purposes of establishing times for an expert user, the experimenter performed the tasks in the experiment five times.

The means were 85.8 (6.8), 72.2 (3.7), 70.6 (2.5), and 77.6 (1.1) seconds. The results were quite different from what was shown by the experiment: F(3, 16) = 13.82 > 5.29 (p < 0.01) indicates that the treatments do have a significant effect. Perhaps a group of expert users could be subjects in a future experiment to see whether the treatments have an effect. An ideal experiment would have 48 expert users complete a thorough annotation task lasting about an hour to an hour and a half. The tasks would be the same for all subjects, and would involve people that all of the subjects recognize.

FURTHER RESEARCH

Direct manipulation is a concept that research indicates provides high efficiency and satisfaction. This study suggests that varied approaches to direct manipulation may not significantly increase efficiency and satisfaction.

Further research can perhaps be directed toward automatic recognition of people. As long as the user retains internal locus of control [6], automatic tasks performed by the computer are seen as beneficial. Some subjects even suggested that automatic recognition of some sort would be helpful. One noted: “if I could press tab and I could annotate each person after pressing tab to switch between them, it’d be cool.”

Automatic recognition could bring about annotation methods that are even more effective. Face recognition is still in the development stages, but even now it is possible to recognize ovals and the basic shapes that make up a person's face, arms, body, and legs. Currently in the PhotoFinder prototype, one of the annotatable fields is "Number of People in Photo." If the number of people in a photo could be recognized accurately, that is one entire field that would not have to be annotated by the user.

CONCLUSIONS

The hypothesis for the first experiment was supported by the significant preference for the Drag and Drop method. The mean times were also as predicted, although the differences were not statistically significant. Since there was significant preference for direct annotation, along with other supportive comments (many subjects thought that the concept of direct annotation was a "good way to do it" and a "creative idea"), it was decided that direct annotation methods were worthy of a follow-up study.

The results suggest that while there are some slight differences among the direct annotation methods, for the most part there is not a significant enough distinguishing factor to proclaim one the most efficient or the most rewarding. However, a study of expert users is strongly recommended in order to verify whether the differences among methods remain nonsignificant.

If the methods do not differ significantly, then the best option may be to include as many methods as possible while allowing a variety of options to let users customize the methods to their optimal use.

However, this raises the question of how much the user can learn at first; if presented with too many options in the beginning, the user may become confused and/or frustrated. Therefore it may be optimal to use the level-structured approach [6] when designing the initial interface. It seems likely that if the user is presented with simple drag and drop options, with the rest of the features included under "Advanced Options," not only will users not be confused, but they may discover additional pleasure in finding "neat features" available to them once they master the simpler tasks.

ACKNOWLEDGEMENTS

Endless thanks go to Mr. Hyunmo Kang, Dr. Ben Shneiderman, Dr. Catherine Plaisant, and Dr. Ben Bederson for making this project possible. Their support, suggestions, ideas, and technical help contributed greatly. Thanks also to our lab manager, Anne Rose; no one else could be more helpful and friendly when things go awry in the lab. We would also like to thank the team members who contributed to the first direct annotation experiment: Yoshimitsu Goto, Allan Ma, and Orion McCaslin. And to Dave Moore, for being an "Annotating Fiend."

REFERENCES

1. Shneiderman, B., Kang, H. "Direct Annotation: A Drag-and-Drop Strategy for Labeling Photos," 2000.

2. Kuchinsky, A., Pering, C., Creech, M. L., Freeze, D. L., Serra, B., Gwizdka, J. "FotoFile: A Consumer Multimedia Organization and Retrieval System," Proceedings of ACM CHI 99 Conference on Human Factors in Computing Systems, v. 1 (1999), 496-503.

3. Norton, B. NSCS Pro, 2000.

4. ACD Systems. ACDSee.

5. MGI Software Corp. MGI PhotoSuite.

6. Shneiderman, B. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Addison-Wesley, Reading, MA, 1998.

7. Intel Corp., 2000.

8. Jung, J., Ma, A., McCaslin, O., Goto, Y. "The Effect of Direct Annotation on Speed and Satisfaction."

9. Sears, A., Shneiderman, B. "Split Menus: Effectively Using Selection Frequency to Organize Menus," ACM Transactions on Computer-Human Interaction 1, 1 (1994), 27-51.

Statistics

Expert trial times (seconds):

            DragDrop   SplMenu    Fkeys   RPAnnot
Trial 1        95         75        72       78
Trial 2        78         68        67       78
Trial 3        81         70        69       79
Trial 4        85         71        72       77
Trial 5        90         77        73       76
Sum           429        361       353      388
Mean         85.8       72.2      70.6     77.6
St. Dev.     6.83       3.70      2.51     1.14
SS Within   186.8       54.8      25.2      5.2
T^2/n     36808.2    26064.2   24921.8  30108.8

G^2/N = 117198.05        SS Total = 976.95
SS Between = Sum(T^2/n) - G^2/N = 704.95
MS Within = 17           MS Between = 234.98
F Ratio = 13.82
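
The F ratio above can be reproduced directly from the trial data; the snippet below assumes SciPy is available and uses its standard one-way ANOVA.

    # Recompute F(3, 16) for the expert trials from the raw times.
    from scipy.stats import f_oneway

    dragdrop = [95, 78, 81, 85, 90]
    splmenu  = [75, 68, 70, 71, 77]
    fkeys    = [72, 67, 69, 72, 73]
    rpannot  = [78, 78, 79, 77, 76]

    f, p = f_oneway(dragdrop, splmenu, fkeys, rpannot)
    print("F(3, 16) = %.2f, p = %.5f" % (f, p))   # F(3, 16) = 13.82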
