English: The Lightest Weight Programming Language of them all
presented at Lightweight Languages 2004 (LL4) –
Hugo Liu
MIT Media Laboratory
20 Ames St., Cambridge, MA, USA
hugo@media.mit.edu
Henry Lieberman
MIT Media Laboratory
20 Ames St., Cambridge, MA, USA
lieber@media.mit.edu
ABSTRACT
Every program tells a story. Programming, then, is the art of constructing a story about the objects in the program and what they do in various situations. Traditionally, these stories are expressed in so-called programming languages. These languages are easy for the computer to accurately convert into executable code, but are, unfortunately, difficult for people to write and understand.
In this paper, we explore the idea of using descriptions in a natural language like English as a representation for programs. While we cannot yet convert arbitrary English descriptions to fully specified code, this paper shows how we can use a reasonably expressive subset of English as a visualization tool. Simple descriptions of program objects and their behavior are converted to scaffolding (underspecified) code fragments that can be used as feedback for the designer and elaborated later. Roughly speaking, noun phrases can be interpreted as program objects, verbs as functions, and adjectives as properties. A surprising amount of information about program structure can be inferred by our parser from relations implicit in the linguistic structure. We refer to this phenomenon as programmatic semantics. We present a program editor, Metafor, that dynamically converts a user's stories into program code; in a user study, participants found it useful as a brainstorming tool.
Categories and Subject Descriptors
H.5.2 [User Interfaces]: interaction styles, natural language;
General Terms
Design, Human Factors, Languages, Theory.
Keywords
Natural language programming, CASE tools, storytelling.
INTRODUCTION
Copyright is held by the author/owner(s).
Workshop: Lightweight Languages 2004
LL4, December 4, 2004, MIT, Cambridge, MA
Computer programming is usually a harrowing experience for the uninitiated. It is difficult enough to achieve minimum proficiency in it, but to truly master programming – to attain an intuitive and almost philosophical understanding of its flow, and to reach the point of being able to easily articulate arbitrary thinkable ideas within that framework – well, few ever reach this point. Yet there is a sense that those who have absorbed programming into personal intuition have gained new tools for thinking, discovering the ability to articulate any procedural idea with algorithmic rigor.
We have developed an intelligent user interface which we hope will inspire changes to the way that computer programming is learned and practiced. Metafor is an interface for visualizing a person’s interactively typed stories as code. As a person types a story into Metafor, the system continuously interprets the narrative programmatically, using a theory of the programmatic semantics of natural language, and updates a side-by-side “visualization” of the narrative as scaffolding code. The visualized scaffolding code may not be directly executable; rather, it is meant to help a person reify her thoughts in terms of programmatic elements. We believe that Metafor is a novel system which can accomplish at least two main goals: 1) assisting novice programmers in developing intuitions about programming; and 2) helping intermediate programmers with system planning by serving as a brainstorming and “outlining” tool (just as writers outline ahead of writing).
[pic]
Figure 1. Metafor’s user interface. Clockwise from the lower left corner, the four windows display the narrative being entered, the dialog history of the person-to-system interaction; an under-the-hood dump of Metafor’s current memory (for demo and debugging purposes only: not shown to beginning users); and the code visualization of the story, currently being rendered as Python code (although rendering engines can exist for any language).
1 Cultivating Intuition and Facilitating Brainstorming
This tool may help novice programmers more rapidly develop intuitions about programming, because the immediate feedback provided by the system allows the novice to focus on the mappings between their ideas, naturally expressed as story, and the code which is a direct consequence of those ideas. Rather than “book-learning” programming concepts, Metafor affords novices the opportunity to learn by doing, as is often advocated by progressive education researchers. This dynamic, explorative, generate-and-test approach is often called “experiential learning” in the educational psychology literature, and many, such as Kolb, have touted it as necessary to developing real intuition about a subject (Kolb, 1985).
For intermediate programmers, Metafor provides a way to create an early brainstorm or outline of a project at a very high level of description. Just as writers are accustomed to creating brainstorms and outlines before they set out on their first draft, programmers may also benefit from this practice. The goal of outlining is to help an author focus on fleshing out ideas about the task without being distracted prematurely by the rules of form which the finished work must obey. In programming, as in writing, it is advantageous to flesh out the details of the task before actual programming begins, because once a person is bogged down in the demands of programmatic syntax, bugs, and the commitments of representational choice, it can be very difficult to switch back to thinking about the task, or to undo representational choices already made.
The problem of getting a person to fully engage with a task without extraneous demands of form is often referred to as the engagement problem, and a well-known theory which addresses it is Csikszentmihalyi’s flow state theory (1997), which describes flow as a desirable state of deep engagement with a task, putting aside all other concerns. In thinking about flow with respect to complex activities like programming, Pearce and Howard are concerned with attentional thrashing between a task and the artifacts of the tool used to accomplish that task (2004). We believe that Metafor addresses the flow concern of programming: a person is naturally engaged when he expresses the task as a story, and Metafor’s automatic creation of scaffolding code from a person’s narrative leaves the person free to focus on the high-level task without the disruptions of programmatic concerns.
2 Context of this Work
The larger context for this work is our overarching goal of enabling programming by natural language. Previously we performed some feasibility studies for programming by natural language (Lieberman & Liu, 2004a) by examining how fifth graders naturally expressed the task of programming Pacman via storytelling. We have also been exploring how natural language might inherently be interpretable under a programmatic semantics framework (Liu & Lieberman, 2004b). The current system represents our progress toward the larger goal, but we believe that Metafor’s goal of producing scaffolding code as immediate feedback to a story is compelling in and of itself for its potential applications to education and improving programming praxis.
3 Scope and Limitations
We should emphasize that Metafor cannot convert completely arbitrary English into fully specified code. Our parser cannot understand every grammatically legal English construction. And, although our parser does use a large knowledge base of common sense knowledge, discussed below, it doesn't know everything a programmer might think of saying. It's not difficult to "fool" it, and our goal is not to get 100% coverage. However, we do believe that the scope of its understanding will be sufficiently large as to be usable in practice. We are encouraged by experience with MUDs and other text-based interaction games, that achieve usable interaction even with very simple template-based parsers; with natural language interfaces to databases and search engines; and with conversational "chatbots". As we will see, we are bringing far more sophisticated analysis to the table than these systems typically use.
We will, of course, try to set users' expectations with admonitions to "keep it simple". We are assuming that the user has at least a passable reading knowledge of the programming language. Since the goal is for the user to watch how each statement affects the generated program, it is easy to spot mistakes in translation and undo them. We also provide an introspection facility, discussed in Section 4.1, that allows a user to see, for a given piece of code, what natural language expressions can be generated. This helps the user to get a feel for what the translator is capable of. We are actively exploring other ways of making the translation process fail-soft. We think that computer science has been so reluctant to use modern natural-language technology in interfaces, for fear of making mistakes, that the field has ignored important opportunities to make interfaces significantly easier for people, especially beginners, to use.
4 Paper’s Organization
The rest of this paper is structured as follows. First, we dive into an extended interaction with Metafor to give the reader a better sense of the system’s capabilities. Second, we expound on a theory of programmatic semantics for natural language that is at the core of Metafor’s interpretive abilities. Third, we briefly survey the implemented system. Fourth, we share the results of a user study we performed with non-programmers and intermediate programmers on the subject of brainstorming. Fifth, we present a discussion of related work. We conclude by recapitulating the contributions of this work.
AN EXTENDED INTERACTION
This section presents an extended interaction with Metafor on an example taken from the world of MUDs (multi-user dungeons) and MOOs (MUDs, object-oriented), which are text-based virtual realities popular on the Internet. We chose to illustrate Metafor in this domain because MOOs are themselves interactive stories, in which the characters, and even inanimate objects, are programmable. A typical MOO consists of text descriptions of "rooms". When a (human) player enters a room, she sees a description such as,
You are in a tiny room with a desk in the center of the room. On the desk there is a pen and a stuffed teddy bear. In the corner is a mouse hole. A mouse sticks its head out.
Users can say text or "emote", leading to dialogs like the following:
Miranda gives you a hug
Mouse says, "I'm here to hug you!"
Mouse hugs Miranda
Mouse says, “I made a mistake”
Characters in MOOs can be programmed with simple scripts, expressed in an "English-like", though formal, programming language. The following example is from (Bruckman, 1997).
Stacy is a frendly killer whale. She has Brown eyes and her tail has a rash.
1 script on Stacy:
on flap this number "times"
set flapped to number
if flapped > 5 times
emote " blinks her eyes happily"
endif
end
MOOs boast many avid non-programmer and beginning programmer gamers amongst their ranks, and we imagine that something like Metafor might one day (soon) be used to enable them to augment and program the virtual realities themselves, just by telling the story of how things should work.
For reasons of space, we cannot include the system agent’s dialog responses, nor can we display the contents of the under-the-hood debug window. The visualized code seen here is rendered in the syntax of the Python programming language, but in principle, renderers can be written for any other language. A few peculiarities of Python benefit from explanation: “def” is the function declaration keyword, and “pass” is a placeholder statement for a function or class with an empty body. Code which is changed or added since the previous sentence is highlighted in red.
(1) There is a bar with a bartender who makes drinks.
class bar:
    the_bartender = bartender()
class bartender:
    def make(drink): pass
This one sentence actually unfolds into four declarations: “there is a bar,” “bar has a part called bartender,” “there is a bartender” (implied), and “bartender can make drinks.” Note that “who” is resolved as “the bartender” and not “the bar” because “bartender” is animate (some limited semantic knowledge is used for this).
(2) The bar has a menu containing some drinks, which include: a sour apple martini, a margarita, and rum and coke.
class bar:
    the_bartender = bartender()
    the_menu = menu()
class bartender:
    def make(drink): pass
class menu:
    drinks = [sour_apple_martini, margarita, rum_and_coke]
class drink: pass
class apple_martini(drink):
    properties = ["sour"]
class margarita(drink): pass
class rum_and_coke(drink): pass
“Drinks” is recognized as a list because it is determined by “some,” and its declared contents are in list structure. Each of its member elements automatically inherits from the base class “drink.” “Sour” was plucked from the object name “sour apple martini” and set instead as a property because it is the interpreter’s default policy that object names are noun phrases, and adjectival modifiers are properties.
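The default naming policy just described can be illustrated with a minimal sketch. It is not Metafor's implementation: the function name split_noun_phrase and the tiny KNOWN_ADJECTIVES lexicon are assumptions made here for illustration, standing in for a real part-of-speech tagger.

```python
# Hypothetical stand-in for a part-of-speech tagger: a tiny adjective lexicon.
KNOWN_ADJECTIVES = {"sour", "sweet", "bitter"}

def split_noun_phrase(phrase):
    """Return (object_name, properties) for a simple noun phrase.

    Adjectival modifiers become properties; the remaining words are
    joined with underscores to form the object name.
    """
    words = phrase.lower().split()
    properties = [w for w in words if w in KNOWN_ADJECTIVES]
    name_words = [w for w in words if w not in KNOWN_ADJECTIVES]
    return "_".join(name_words), properties

name, properties = split_noun_phrase("sour apple martini")
print(name, properties)
```

Under these assumptions, “sour apple martini” yields an object named apple_martini carrying the property “sour”, mirroring the class shown in the listing for sentence (2).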
(3) The sour apple martinis and margaritas are sweet, while rum and coke is bitter.
class bar:
    the_bartender = bartender()
    the_menu = menu()
class bartender:
    def make(drink): pass
class menu:
    drinks = [sour_apple_martini, margarita, rum_and_coke]
class drink: pass
class apple_martini(drink):
    properties = ["sour", "sweet"]
class margarita(drink):
    properties = ["sweet"]
class rum_and_coke(drink):
    properties = ["bitter"]
Note here how the property “sweet” distributes over “sour apple martinis” and “margaritas,” instead of forming a single object called “apple martini and margarita;” this is because the interpreter pre-recognizes existing objects before parsing.
(4) When a customer orders a drink, the bartender tries to make it. When the bartender is asked to make a drink, he makes it and gives it to the customer only if the drink is in the menu's drinks; otherwise, the bartender says "sorry i don't know how to make that drink" to the customer.
class bar:
    the_bartender = bartender()
    the_menu = menu()
class bartender:
    def make(drink):
        if (drink in menu.drinks):
            bartender.make(drink)
            bartender.give(drink, customer)
        else:
            bartender.say( \
                "sorry i don't know how to make that drink", customer)
    def give(drink, to_customer): pass
    def say(quote, to_customer): pass
class menu:
    drinks = [sour_apple_martini, margarita, rum_and_coke]
class drink: pass
class apple_martini(drink):
    properties = ["sour", "sweet"]
class margarita(drink):
    properties = ["sweet"]
class rum_and_coke(drink):
    properties = ["bitter"]
class customer:
    def order(drink):
        bartender.make(drink)
Expressions begun with “when” usually specify the scope of the following utterance. Possessives like “menu’s drinks” act just like dot notation for scope in object-oriented programming. Pronouns like “it” (inanimate, singular) and “he” (animate, male, singular) are resolved in the system’s preprocessing of discourse deixis. Although there might be other representational choices for “say,” “make,” and “give,” the interpreter’s default behavior is to treat the main verb as the predicate. Note that with additional semantic knowledge such as commonsense, there are more choices.
(5) When the customer asks the bartender to choose, the bartender makes a random sweet drink from the menu if the customer's age is under 30; or else the bartender makes a sidecar.
class bar:
    the_bartender = bartender()
    the_menu = menu()
class bartender:
    def make(drink):
        if (drink in menu.drinks):
            bartender.make(drink)
            bartender.give(drink, customer)
        else:
            bartender.say( \
                "sorry i don't know how to make that drink", customer)
    def give(drink, to_customer): pass
    def say(quote, to_customer): pass
class menu:
    drinks = [sour_apple_martini, margarita, rum_and_coke]
class drink: pass
class apple_martini(drink):
    properties = ["sour", "sweet"]
class margarita(drink):
    properties = ["sweet"]
class rum_and_coke(drink):
    properties = ["bitter"]
class customer:
    age = None
    def order(drink):
        bartender.make(drink)
    def ask_bartender_to_choose():
        if customer.age < 30:
            bartender.make(random.choice(
                filter(lambda drink:
                    'sweet' in drink.properties,
                    menu.drinks)))
        else:
            bartender.make(sidecar)
Sentence (5) demonstrates the difference between the declarative and procedural contexts. In the declarative mode of sentences (1), (2), and (3), an utterance of the form “ADJ NN” would have created “NN” as an object annotated with the “ADJ” property. Under a procedural context (cued by a “within-function” scope and a reference to an existing object, i.e. “drink”), the handling is different: the procedural interpretation of “ADJ NN” is “select the NN objects with the property ADJ.” “Random” is an implemented primitive in the Metafor interpreter, and in Python.
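The two readings of “ADJ NN” can be contrasted in a small sketch. This is an illustration only, not the Metafor interpreter: the function interpret_adj_nn, the procedural flag, and the dictionary layout of world are all assumptions introduced here.

```python
def interpret_adj_nn(adjective, noun, world, procedural):
    """Interpret an "ADJ NN" phrase against a toy object store.

    world maps object names to {"kind": ..., "properties": [...]}.
    Returns the names of the objects the phrase refers to.
    """
    if procedural:
        # Procedural reading: select existing NN objects with the property.
        return [name for name, obj in world.items()
                if obj["kind"] == noun and adjective in obj["properties"]]
    # Declarative reading: create (or update) an object annotated with ADJ.
    obj = world.setdefault(noun, {"kind": noun, "properties": []})
    if adjective not in obj["properties"]:
        obj["properties"].append(adjective)
    return [noun]

world = {
    "margarita": {"kind": "drink", "properties": ["sweet"]},
    "rum_and_coke": {"kind": "drink", "properties": ["bitter"]},
}
selected = interpret_adj_nn("sweet", "drink", world, procedural=True)
declared = interpret_adj_nn("sour", "martini", world, procedural=False)
```

In this sketch the same surface form either filters the existing drinks or declares a new object, depending only on the context flag.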
We hope this interaction has demonstrated that 1) natural language is particularly economical in the amount of information it conveys; 2) natural language is elegant in reusing the same constructions, e.g. “ADJ NN,” under different intentional contexts (e.g. procedural versus declarative) to accomplish different goals (i.e. declaration versus relational selection); and 3) the ambiguity of natural language’s representational choices (e.g. “make(drink)” or “make_drink()”) is actually a virtue, affording a flexibility which most popular programming languages today do not yet enjoy.
We also hope that this interaction begins to illustrate the systematicity and some of the regularities of the Metafor interpreter. More will be said on the interpreter in the presentation of the system implementation in Section 4. However, before arriving there, Section 3 describes a theory of the programmatic semantics for natural language which is driving the interpreter; this theory will synthesize together much of what was seen in the sample interaction.
A PROGRAMMATIC SEMANTICS FOR NATURAL LANGUAGE
Natural languages, be they English, Chinese, or Russian, share basic structure and basic protocols for communication (in this paper, we consider only English). The subdivision of Artificial Intelligence which tries to computationalize the understanding of natural language, called Narrative Comprehension or Story Understanding, usually represents stories using thematic role frames, Schankian scripts, Jackendoff trajectory space, or other formalisms (a thorough review of the field is given in (Mueller, 1999)). Nevertheless, there is fundamentally no reason why natural language cannot be interpreted as if it were a programming language.
1 Basic Features
In fact, there are many reasons to believe that natural language already implies a natural programmatic interpretation. The way in which natural language tends to reify concepts as objects with properties, or personify concepts as having capabilities, begins to resemble a style of agent programming. The natural role of nouns and noun phrases as objects (e.g. “the martini”), adjectives as properties (e.g. “sweet drinks”), non-copular verbs as functions (e.g. “make a drink”), and verb arguments as function arguments (e.g. “give the drink to the customer”) resembles the organization of object-oriented programming. Natural language also has a system of inheritance (e.g. “a martini is a drink …”), as well as conventions for reference which closely resemble dot notation (e.g. “the customer’s age” → customer.age). A more protracted discussion of these basic programmatic features of natural language can be found in (Liu & Lieberman, 2004b); the rest of this section is dedicated to a more advanced discussion of natural language’s programmatic semantics, including the dispelling of some misconceptions about language, and a review of some of the elegant programmatic features of natural language which go beyond the basic features of most programming languages. We cluster these discussions around a few hot topics.
2 “Ambiguity”
Mapping from natural language into programming language introduces “ambiguity” (enclosed here in scare quotes because the word is often used in a derogatory manner) which formal programming languages are often not accustomed to. While some see the inherent ambiguity of natural language as a problem, we see it as an important advantage.
Conventional programming is hard in no small part because programming languages force a programmer to make inessential decisions about representation details far too early in the design and programming process. When those early decisions later prove ill-advised, the messy and error-prone process of refactoring and other program modification becomes necessary. By using natural language understanding to construct the mapping between natural language specifications and concrete programming language details on a dynamic basis, we retain representational flexibility for as long as it is needed. For example, consider the utterance “sour apple martini;” there is some ambiguity in how this object should be represented and what should be parameterized. For representational simplicity, we might first reify it as “class sour_apple_martini.” However, upon later encountering a “sweet apple martini” and a “sour grape martini,” and applying some background world knowledge that “sweet” and “sour” are flavors and “grape” and “apple” are kinds of fruit, we might revise the representation of “sour apple martini” to be better parameterized:
class martini:
    def __init__(self, flavor='sour', fruit='apple'):
        self.flavor, self.fruit = flavor, fruit
An affordance of relating programs as stories is that we can continually reinterpret the story text as evidence crops up for better representational choices.
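As a brief illustration of what the parameterized representation buys, the sketch below instantiates the three martinis mentioned in the example from a single class. The capitalized class name Martini and the helper method name() are assumptions added here for demonstration; they do not appear in Metafor's output.

```python
class Martini:
    """Parameterized revision of the earlier one-off sour_apple_martini class."""

    def __init__(self, flavor="sour", fruit="apple"):
        self.flavor, self.fruit = flavor, fruit

    def name(self):
        # Hypothetical helper: rebuild the noun phrase the object came from.
        return f"{self.flavor}_{self.fruit}_martini"

sour_apple = Martini()                  # the original utterance
sweet_apple = Martini(flavor="sweet")   # encountered later in the story
sour_grape = Martini(fruit="grape")     # encountered later in the story
print([m.name() for m in (sour_apple, sweet_apple, sour_grape)])
```

The defaults preserve the first object the story introduced, so earlier references to the sour apple martini remain valid after the revision.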
3 Representational Equivalence
The fact that the representation of an object can change so fluidly, yet stay consistent with the goals of the task is quite remarkable, and the sort of representational dynamism found in natural language is quite unparalleled by any formal programming language. Consider the following statements and the revision of representation which ensues.
a) There is a bar. (atom)
b) The bar contains two customers. (unimorphic list)
c) It also contains a waiter. (unimorphic wrt. persons)
d) It also contains some stools. (polymorphic list)
e) The bar opens and closes. (class / agent)
f) The bar is a kind of store. (inheritance class)
g) Some bars close at 6pm. (subclass or instantiatable)
In formal programming languages, representational revisions b) through g) are potentially quite costly, whereas in natural language the revisions are quite natural. To create a flexible representation of natural language which stops short of the representational commitments demanded by rigid programming languages, Metafor uses a representationally neutral structure for an object. It is a tuple of the form:
(full_name, arguments, body)
A type inspector examines the contents of the arguments and body, and dynamically assigns the object a type. If the constitution of the arguments or body changes, so will the type. If for example, an object’s body contains only two similarly typed elements, then it is a list. If the body also contains functions, it is a class, and so forth. The effect of this on interpreting story as code is that as new utterances reveal new details, the representation of the code will be able to adapt.
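A type inspector of this kind might be sketched as follows. The tuple fields follow the form given above, but everything else is an assumption made for illustration: the name MetaObject, the function inspect_type, the convention that each body element is a (name, kind) pair, and the specific rules, which cover only statements a) through e).

```python
from collections import namedtuple

# Representationally neutral object: (full_name, arguments, body).
MetaObject = namedtuple("MetaObject", ["full_name", "arguments", "body"])

def inspect_type(obj):
    """Assign a type from the current contents of the body (assumed rules)."""
    kinds = [kind for _name, kind in obj.body]
    if not kinds:
        return "atom"
    if "function" in kinds:
        return "class"
    if len(set(kinds)) == 1:
        return "unimorphic list"
    return "polymorphic list"

bar = MetaObject("bar", [], [])
history = [inspect_type(bar)]                       # a) There is a bar.
for element in [("customer_1", "person"),           # b) two customers
                ("customer_2", "person"),
                ("waiter", "person"),               # c) a waiter
                ("stool", "stool"),                 # d) some stools
                ("open", "function")]:              # e) the bar opens
    bar = bar._replace(body=bar.body + [element])
    history.append(inspect_type(bar))
print(history)
```

Because the type is recomputed from the contents rather than stored, no earlier statement has to be rewritten when a later utterance changes what the object is.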
A slightly more tricky equivalence phenomenon is nominalization (turning any adjective into a noun). For example “The drink is sweet” and “The drink has sweetness” are equivalent, although in the latter, the property is talked about as an object. The way to interpret this is to assume simplicity, and only add complexity where necessary. So at first “sweetness” is just a property of “drink,” but if “sweetness” is elaborated as an object or agent (e.g. “Sweetness can hurt the stomach”) then “sweetness” becomes a part of “drink” (as do all other flavors, for symmetry).
4 Relational and Set-Theoretic Features
Many utterances in natural language perform actions akin to relational database operations, and use set-theoretic operators. For example, consider the following utterance (taken from sentence (5) of the interaction) and an interpretation for it.
The bartender makes a random sweet drink from the menu.
bartender.make(random.choice(filter(lambda drink:
    'sweet' in drink.properties, menu.drinks)))
The phrase “a random sweet drink” is really a dynamic reference, because it points not to a static object, but rather, gives a relational specification for it (i.e. drink, random, sweet), which in turn implies a procedure to pick from a “database” of objects (i.e. find the menu, then find the drinks, then filter out only the sweet drinks, and pick one at random).
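The procedure implied by the dynamic reference can be made concrete in a small runnable sketch. The Drink class, the menu_drinks list, and the function random_sweet_drink are assumptions introduced here. Note that in Python 3 the built-in filter returns an iterator, which random.choice cannot index, so the sketch builds a list first.

```python
import random

class Drink:
    def __init__(self, name, properties):
        self.name, self.properties = name, properties

# Toy "database" of objects, mirroring the menu in the interaction.
menu_drinks = [
    Drink("apple_martini", ["sour", "sweet"]),
    Drink("margarita", ["sweet"]),
    Drink("rum_and_coke", ["bitter"]),
]

def random_sweet_drink(drinks):
    """Resolve "a random sweet drink": filter by property, then pick one."""
    sweet = [d for d in drinks if "sweet" in d.properties]
    return random.choice(sweet)

chosen = random_sweet_drink(menu_drinks)
print(chosen.name)
```

The reference is resolved each time the function is called, so the result tracks whatever the menu contains at that moment rather than a fixed object.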
In addition, English is imbued with set-theoretic features such as the comparative (e.g. “longer,” “better,” “more”) and the superlative (e.g. “longest,” “best,” “most”) adjectives and adverbs, as well as set-facilitation determiners (e.g. “all drinks have,” “each drink has,” “some drinks … while other drinks”). A comparative allows a choice within a set of size two (e.g. “the cheaper drink”), and a superlative among a set of any size. The criteria for comparison can either be contained in the semantics of the word itself (e.g. “cheapest”) or can refer to some procedure, usually contained in a complementizer phrase (e.g. “the drink which Bill would like best”). Set-facilitation determiners facilitate LISP-style processing of elements (e.g. “order all the sweet drinks,” “each drink has a price”); however, there are also ways to cut up sets (e.g. “most of the drinks are sweet”) which remain ambiguous, or may require additional background knowledge to disambiguate.
It is far more common to find set-manipulation procedures (e.g. map, filter, reduce, max, min) implied in natural language than it is to find explicit looping language, and in fact such procedures are composed quite elegantly in natural language, e.g.:
The customer buys all the sweet drinks under $2.
map(customer.buy,
    filter(lambda sweet_drink: sweet_drink.price < 2,
           filter(lambda drink: 'sweet' in drink.properties,
                  menu.drinks)))
In a study of non-programmers’ solutions to programming problems, Pane, Ratanamahatana & Myers (2001) report that people tend not to speak of iteration or looping in explicit terms. One possible explanation lies in the observation that most set-manipulation procedures (e.g. “the cheapest drink on the menu”) do not demand to be evaluated immediately; rather, they are subject to lazy evaluation at a future time, and only if needed. In contrast, explicit looping language (e.g. “look at each drink and price on the menu, if the price is lower than any seen thus far, remember that drink,” etc.) would force the procedure to be attended to immediately, which would occupy valuable space in human short-term memory and the human discourse stack (which some have reported has a practical depth of only 2).
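The contrast between a deferred description and an explicit loop can be sketched in Python as follows. This is an illustration under assumed names (`describe_cheapest`, `cheapest_by_loop`, and the sample menu are hypothetical); the `inspected` list exists only to show when drinks are actually examined.

```python
from collections import namedtuple

Drink = namedtuple("Drink", ["name", "price"])
menu = [Drink("sidecar", 4.0), Drink("soda", 1.5), Drink("mudslide", 5.0)]

inspected = []  # records which drinks have actually been looked at


def price_of(drink):
    inspected.append(drink.name)
    return drink.price


def describe_cheapest(drinks):
    """Return a deferred description; no drink is inspected yet."""
    return lambda: min(drinks, key=price_of)


# "the cheapest drink on the menu" is stated, but not yet evaluated.
cheapest_drink = describe_cheapest(menu)
inspected_before_use = list(inspected)  # still empty at this point

# Evaluation happens only when the description is finally needed.
chosen = cheapest_drink()


def cheapest_by_loop(drinks):
    """The explicit-looping phrasing: every step must be spelled out."""
    best = None
    for drink in drinks:
        if best is None or drink.price < best.price:
            best = drink
    return best
```

Both forms yield the same drink; the difference is that the first can be carried around as a single unit of description until it is used.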
5 Narrative Stances and Styles
One of the affordances of stories is that they can be told through a choice of narrative stances, such as first-person, third-person, and mixed-person playwright. While it can be argued that varied styles are also present in programming languages, such as object-oriented, functional, and imperative, the difference is that narrative stances often can map to the same programmatic representation (making some basic inferences), while each style of programming has essentially married itself to a set of difficult-to-reverse representational choices.
Because different narrative stances can be used somewhat interchangeably, a story with many different narrators using different stances (or the same narrator switching between stances when convenient) could still plausibly be coherent; the same probably cannot be said of different programmers augmenting the same code with different paradigms for programming.
What grants the first-person, third-person, and mixed-person playwright stances equivalence is the deixis (meaning, contextualized pointer) of words like “I,” “him,” “here,” and “that.” Consider the following utterances:
a) I want to make a bar with a customer. (1st p. programmer)
b) There is a customer in the bar. (3rd p. narrator)
c) I am a customer sitting on a stool. (1st p. customer)
d) The bartender said, “Here is a customer” (mixed person playwright)
During interpretation of a story, a deictic stack is constantly updated, maintaining the current stance or point-of-view of the speaker, and dynamically mapping words like “I,” “him,” and “here” to their appropriate referents. In a), “I” is the programmer, so the action “make” is the same as declaring “bar” as an object with the part “customer.” In b), “there” is an existential declaration. In c), someone is speaking who declares that he is a customer; by virtue of his current location on a stool (indicated by the progressive form, “am sitting”), and given that the stool is inside the bar, we can infer by spatial transitivity that the customer is inside the bar. Finally in d), the narrator’s utterance of “the bartender said” allows us to set the speaker “I” to “bartender”; consequently, because the bartender is “inside the bar,” the other deictic mappings are updated too. Thus, in the utterance “Here is a customer,” the word “here” is resolved to the bar.
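A deictic stack of this kind might be sketched as below. The `DeicticStack` class, its method names, and the `locations` table are hypothetical and only illustrate the push, resolve, and pop behavior described for utterance d); the actual data structure used by the interpreter is not specified here.

```python
class DeicticStack:
    """Hypothetical sketch: a stack of speaker contexts for resolving deixis."""

    def __init__(self, narrator, location=None):
        # Keys are stored in lower case so that "I" and "Here" resolve too.
        self._frames = [{"i": narrator, "here": location}]

    def push(self, speaker, location):
        """Enter quoted speech, e.g. after 'the bartender said'."""
        self._frames.append({"i": speaker, "here": location})

    def pop(self):
        """Leave quoted speech and return to the enclosing stance."""
        if len(self._frames) > 1:
            self._frames.pop()

    def resolve(self, word):
        """Map a deictic word to its referent; other words pass through."""
        return self._frames[-1].get(word.lower(), word)


# Assumed world knowledge: the bartender is inside the bar.
locations = {"bartender": "bar"}

stack = DeicticStack(narrator="programmer")
outer_speaker = stack.resolve("I")

# d) The bartender said, "Here is a customer"
stack.push("bartender", locations["bartender"])
quoted_speaker = stack.resolve("I")
quoted_place = stack.resolve("Here")
stack.pop()
restored_speaker = stack.resolve("I")
```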
6 Prototypes and Background Semantics
Natural language relies heavily on background knowledge and the manipulation or augmentation of known prototypes. Rarely is something constructed de novo. Lakoff and Johnson (1980) have long suggested that language is inherently metaphorical, always building upon the scaffolding of existing knowledge. They assert that mathematics and physics (other than naïve physics) are unintuitive precisely when they do not map metaphorically into our existing understandings.
If a programmatic interpreter of natural language hopes to be successful, it should find a way to establish some background knowledge. The most basic prototype object in a story is none other than a person; ordinary people have a great deal of knowledge regarding sociality and naïve psychology, and much of this is meant to assist us in social situations, but it is also crucial in understanding characters in stories. To take the Section 2 extended interaction as an example, we should be able to recognize that “bartender” and “customer” subtype “person,” and that “bar” subtypes “object” or “furniture,” etc. Knowing some of the typical abilities of all people (such as “eating” and “drinking”) can help the interpreter infer abductively from the sentence “Foo drinks something” that Foo is likely a “person” (unless we are in the realm of animals!). And it is not only objects which have prototypes defined by background semantics; functions, i.e. verbs, also have prototypes, and in fact, it is possible to organize verbs along hierarchies and semantic alternation classes. Levin’s Verb Classes (1993) suggests one possible organization.
Up until recently, there was a paucity of publicly-available resources which could supply the type of background semantics needed for a venture in programmatizing natural language. But ConceptNet (Liu & Singh, 2004b), a large semantic network of common sense knowledge derived from MIT’s Open Mind Common Sense project, is beginning to fill this role. In particular, ConceptNet’s relational ontology suggests it could map easily into object-oriented structures. A sampling of possible mappings is given below:
CapableOf(x,y) → x.y()
LocationOf(x,y) → y.x
PropertyOf(x,y) → x.y
PartOf(x,y) → x.y
IsA(x,y) → class x(y)
EffectOf(w.x,y.z) → w.x(): y.z
There are many practical ways in which this background knowledge could support programmatic interpretation. For example, knowing that CapableOf(“bin”, “hold things”), we can more confidently interpret “bin” as a list or a container object. Knowing some examples of fruits, if we encounter an “apple martini” and a “pear martini,” we can generalize this to “fruit martini” where the fruit is a parameter. Just as stories are written under the assumption that the reader possesses certain basic knowledge, so should a computer reader possess a sufficient background library of prototypes.
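The relation-to-code mappings listed above can be sketched as a small template table. This is an illustration only: `render_relation` and `to_identifier` are hypothetical helpers, the assertions are passed in as plain strings, and no ConceptNet API is used or implied.

```python
def to_identifier(phrase):
    """Turn a short phrase such as 'hold things' into an identifier."""
    return phrase.strip().replace(" ", "_")


# One template per relation, following the mappings given in the text.
TEMPLATES = {
    "CapableOf": "{x}.{y}()",
    "LocationOf": "{y}.{x}",
    "PropertyOf": "{x}.{y}",
    "PartOf": "{x}.{y}",
    "IsA": "class {x}({y})",
    "EffectOf": "{x}(): {y}",
}


def render_relation(relation, x, y):
    """Render one (relation, x, y) assertion as an object-oriented stub."""
    template = TEMPLATES[relation]
    return template.format(x=to_identifier(x), y=to_identifier(y))


# Hypothetical assertions, written by hand for this example.
assertions = [
    ("IsA", "bartender", "person"),
    ("CapableOf", "bin", "hold things"),
    ("LocationOf", "bartender", "bar"),
    ("PropertyOf", "drink", "sweet"),
]
stubs = [render_relation(*assertion) for assertion in assertions]
for stub in stubs:
    print(stub)
```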
Having at least touched upon many of the interesting programmatic semantics of natural language, the next section describes how some of this theoretical discussion is computationalized in the Metafor system interpreter.
METAFOR’S IMPLEMENTATION
In this section we present a brief overview of the Metafor implementation and then we discuss some simplifying assumptions which were made to the theory presented in Section 3.
1 Important Components
The Metafor system can be broken down into the following components:
Parser – Uses the MontyLingua natural language understanding system (Liu, 2004a) to first perform a surface parse of each sentence into VSOO (verb-subject-object-object) form.
Programmatic Interpreter – First a small society of semantic recognizers mulls over the VSOO syntactic parse to identify existing objects in the code, special structures (like scoping statements, lists, quotes, if-then structure), and objects for which there exists some commonsense type information (e.g. common agents, color names, flavors, etc); second, a set of understanding demons, each capable of mapping a VSOO structure to some action or change in the code model, is run over the parsed sentences; third, the interpreter has a state tracker which maintains a deictic discourse stack, the current scope, and the current interpretive context (i.e. declarative versus procedural, as explained in Section 2) which are used by the understanding demons.
MetaforLingua – This is the underlying knowledge representation of the code model; it is worth mentioning as a component in its own right because it is self-maintaining, in that it is responsible for updating its own representation. As introduced in Section 3, all objects outside of if-then structures have the form:
(full_name, arguments, body)
As the contents of the arguments and the body change, the dynamic type inspector demon will assign the object a different type. There is also a symmetry inspector which can propagate the implications of any change. For example, suppose that a “drink” had a part called “possible_flavors = ['sour', 'sweet'].” Upon arriving at the statement “A drink’s sweetness hurts the stomach,” the dynamic type inspector promotes “sweet” to an object called “sweetness” with the constituent function “hurt(stomach).” Then the symmetry inspector propagates the change to promote the sister atom “sour” to an object called “sourness,” etc.
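The promotion example above can be sketched as follows. Everything here is an assumption made for illustration: the dictionary layout for the (full_name, arguments, body) triple, the typing rule inside `type_inspector`, and the crude `nominalize` helper are not taken from the Metafor implementation.

```python
def type_inspector(node):
    """Assumed rule: arguments -> function, non-empty body -> object, else atom."""
    if node["arguments"]:
        node["type"] = "function"
    elif node["body"]:
        node["type"] = "object"
    else:
        node["type"] = "atom"
    return node


def make_node(full_name, arguments=None, body=None):
    """Build a (full_name, arguments, body) triple and type it."""
    node = {"full_name": full_name,
            "arguments": list(arguments or []),
            "body": list(body or [])}
    return type_inspector(node)


def nominalize(name):
    """Crude assumption for the example: sweet -> sweetness."""
    return name + "ness"


def symmetry_inspector(siblings, promoted):
    """Promote sister atoms so that siblings keep the same type."""
    for node in siblings:
        if node is not promoted and node["type"] == "atom":
            node["full_name"] = nominalize(node["full_name"])
            node["type"] = promoted["type"]


# drink.possible_flavors = ['sour', 'sweet']
sour = make_node("drink.sour")
sweet = make_node("drink.sweet")
possible_flavors = [sour, sweet]

# "A drink's sweetness hurts the stomach."
sweet["full_name"] = nominalize(sweet["full_name"])
sweet["body"].append(make_node("drink.sweetness.hurt", arguments=["stomach"]))
type_inspector(sweet)                         # atom becomes object
symmetry_inspector(possible_flavors, sweet)   # sour becomes sourness
```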
Code Renderer – Currently, only a code renderer for Python exists, but because MetaforLingua is rather generic, renderers for other languages like LISP and Java could easily be written.
Introspection – An introspection feature allows any code object or function to explain itself using generated story language when that code object is moused over. This is more or less the inverse of mapping stories to code, but can be particularly useful to a novice who has difficulty reading code.
Dialog – The system agent generates natural language dialog to relate to the user how it has interpreted her utterances, in order to offer the user transparent access to the system interpreter. The goal of dialog is also to communicate any system confusion about an ambiguous utterance, although admittedly the current implementation does not actively dialog with the user for any substantive decision making. See Figure 1 for a sample dialog.
User Interface – The user interface given in Figure 1 is still very much a work in progress. The upper-right window, which currently offers an under-the-hood view into the system’s internal state, is not as accessible as it could be to novices. Ideally we would like to integrate a graphical visualization of the code model, or better yet, for a domain such as a MUD or MOO, the window could contain a real-time simulation of the MUD/MOO as it is updated. We would also like to add a history feature and vital features currently missing from the interface, such as undo, or allowing the user to modify the visualized code.
2 Simplifying Assumptions
Metafor implements a good deal of the theoretical suggestions given in Section 3, but it also makes some simplifying assumptions in light of the realities of natural language understanding tools. However, we suggest that although a person using Metafor for brainstorming might initially have to get accustomed to the interpretive capabilities of the system (for example, choosing a meaningful verb is quite important), in further interactions a person should not feel that his or her storytelling abilities and freedoms are significantly impaired by these restrictions.
First, we assume that functions will always correspond to verbs. While this is generally true of English, there are exceptions. For example, in the sentence “When the drink is available,” the important predicate is not “be” but “be available.” In Metafor, there is a heuristic solution to this: if a verb itself is semantically too generic (e.g. “be”, “get”) then the whole verb phrase is made the function name, so “drink is available” would parse to “be_available(drink).” Also, many verbs actually span more than one word; these are idioms called phrasal verbs (e.g. “wake up,” “put away”) and they are handled to a limited extent by the natural language parser.
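The generic-verb heuristic can be sketched as below. The set `GENERIC_VERBS` contains only the two examples named in the text, and `function_name` and `render_call` are hypothetical helpers; how Metafor actually decides that a verb is too generic is not specified here.

```python
# Only the two examples given in the text; any fuller list is an assumption.
GENERIC_VERBS = {"be", "get"}


def function_name(verb_lemma, complement=None):
    """Use the whole verb phrase as the name when the verb is too generic."""
    if verb_lemma in GENERIC_VERBS and complement:
        return verb_lemma + "_" + complement.strip().replace(" ", "_")
    return verb_lemma


def render_call(verb_lemma, argument, complement=None):
    """Render the predicate applied to its argument, e.g. be_available(drink)."""
    return f"{function_name(verb_lemma, complement)}({argument})"


# "When the drink is available" -> be_available(drink)
generic_case = render_call("be", "drink", complement="available")

# A semantically specific verb keeps its own name.
specific_case = function_name("make")
```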
Second, we have to make assumptions about scoping expressions. Consider the scoping expression below (taken from Section 2, sentence (5)):
When the customer asks the bartender to choose, the bartender makes a random sweet drink from the menu if the customer's age is under 30; or else the bartender makes a sidecar.
Here, an if-then-else expression is nested within the function ask_bartender_to_choose(), because the utterance “when the customer asks the bartender to choose” is a scoping expression. However, there is ambiguity as to whether the next sentence still falls inside that function’s scope, or if we have returned to the global scope. To address this ambiguity, we make the assumption that the scope will always escape back to global following a sentence break, unless a connector phrase (e.g. “then,” “next,” “afterwards”) or sequencing phrase (e.g. “first .. second ..”) is used. There is a sense that this corresponds accurately to the role of connector and sequencing phrases in actual human communication, but admittedly, humans also have an abundance of commonsense knowledge to support each decision about scope.
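A minimal sketch of this scoping assumption follows. The `CONNECTORS` set is drawn from the examples in the text, while `continues_scope`, `assign_scopes`, and the three-sentence story (beyond its first clause) are hypothetical; each sentence is paired by hand with the scope it opens, if any, since detecting scoping expressions is the parser's job.

```python
# Connector and sequencing words taken from the examples in the text.
CONNECTORS = {"then", "next", "afterwards", "first", "second"}


def continues_scope(sentence):
    """True if the sentence opens with a connector or sequencing word."""
    words = sentence.split()
    return bool(words) and words[0].strip(",.;:").lower() in CONNECTORS


def assign_scopes(story):
    """Assign a scope to each (sentence, opened_scope) pair in order."""
    scope = "global"
    assigned = []
    for text, opened_scope in story:
        if not continues_scope(text):
            scope = "global"          # escape to global at a sentence break
        if opened_scope is not None:
            scope = opened_scope      # a scoping expression opens a new scope
        assigned.append(scope)
    return assigned


# Hypothetical story; the second field names the scope a sentence opens.
story = [
    ("When the customer asks the bartender to choose, the bartender makes a drink.",
     "ask_bartender_to_choose"),
    ("Then, the bartender gives the drink to the customer.", None),
    ("The customer pays the bartender.", None),
]
scopes = assign_scopes(story)
```

Under this rule the second sentence stays inside ask_bartender_to_choose because it opens with "Then," while the third falls back to the global scope.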
A third simplifying assumption is made about function argument structure. In natural language, each verb implies a very sophisticated semantics for the arguments which follow it; however, when we map verbs and their arguments in a literal fashion to function-argument structure, much is lost. Consider this example from Section 2 sentence (4): The utterance “give the drink to the customer” maps literally into the function-argument: “give(drink,to_customer=customer);” however, what is lost in this simplification are the additional semantics of the verb give. The event “give” should have three major thematic roles for “giver,” “recipient,” and “object_given,” and giving something to someone also implies that they receive it. In our example, this would mean that “give(drink,to_customer=customer)” should actually trigger “to_customer.receive( drink, from_person = bartender).” However, the incorporation of sophisticated background knowledge into Metafor is yet to be implemented. NB, the study of frame semantics is precisely concerned with this task of cataloging verb-argument structure, as represented by Berkeley’s FrameNet project (Baker, Fillmore & Lowe, 1998), and there is hope of incorporating such a resource in the near future.
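The kind of inference that is currently lost can be sketched as a lookup from a verb to the event it implies. The `IMPLIED_EVENTS` table is a single hand-written entry made up for this example; it is not FrameNet data, and `expand_event` is a hypothetical helper rather than a Metafor function.

```python
# Hypothetical hand-written entry; not drawn from FrameNet.
IMPLIED_EVENTS = {"give": "receive"}


def expand_event(verb, giver, object_given, recipient):
    """Render the literal call plus any event the verb implies."""
    calls = [f"{giver}.{verb}({object_given}, to_{recipient}={recipient})"]
    implied = IMPLIED_EVENTS.get(verb)
    if implied is not None:
        # Giving something to someone implies that they receive it.
        calls.append(f"{recipient}.{implied}({object_given}, from_person={giver})")
    return calls


# "The bartender gives the drink to the customer."
calls = expand_event("give", "bartender", "drink", "customer")
for call in calls:
    print(call)
```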
USER STUDY
We conducted a 13-person user study to gain a better sense of the potential utility of Metafor’s story-as-code visualization approach to both non-programmers and intermediate programmers. Given the complexity of performing a judicious evaluation, and the relatively small size of the study’s sample, we prefer to view this as an indicative study, one which may be a prelude to a fuller study not completed at press time.
Volunteers were all MIT undergraduates, 7 of whom identified themselves as intermediate programmers and 6 as non-programmers (no other types were sought for this study). Each volunteer was taken through an interview-style assessment. The question being studied was: “How does brainstorming code with Metafor affect a volunteer’s self-assessment of the difficulty of a programming task?” This is closer to testing the claim made in Section 1 about brainstorming benefiting intermediate programmers. We felt that measuring improvements in non-programmers’ intuition would have required a longer-term study and a more robust implementation. Perhaps strangely, we asked even non-programmers how long it might take them to complete a programming task, even though they may not have the capability to assess this accurately. In light of this, we present the results as ratios of time rather than absolute times.
Each volunteer was given the same description of a programming task: to program the basic high-level behaviors (excluding GUI, timing, etc.) of the Pacman game. An assessment of how long the task would take them was elicited from each volunteer (baseline #1). Then, each volunteer was asked to spend two minutes writing a short story about Pacman, and then once again asked for a time-to-complete-task estimate (baseline #2). Finally, the examiner spent 5 minutes with each volunteer in front of Metafor, typing the volunteer’s story into Metafor. The examiner, who is familiar with the grammatical limitations of the Metafor interpreter, normalized some phrasing of the volunteer’s sentences, but the volunteer was asked to object if the examiner ever introduced new information into the description. At the conclusion of the Metafor interaction, each volunteer was asked a final time for a time-to-complete-task estimate. Volunteers were also asked how likely they would be to adopt brainstorming-on-paper and brainstorming-with-Metafor on a 5-point Likert scale (5=very likely, 1=very unlikely). Results are given in Figures 2 and 3.
[pic]
Figure 2: The effect of brainstorming on each volunteer’s self-assessment of time-to-complete Pacman task. Times were normalized to 100 to allow comparison.
With regard to self-assessed time-to-complete for the Pacman task, both non-programmers and intermediate programmers reported that prior brainstorming with Metafor had a greater positive impact than brainstorming by hand. In general, non-programmers felt that brainstorming by hand didn’t bring them much closer to completing the task, and as one volunteer said, “I still wouldn’t know how to program it.” Metafor made a more substantial positive difference for non-programmers than for intermediate programmers, but both programmers and non-programmers remarked that the system was “cool,” and were eager to work with it. With regard to the results for the “adoption” question shown in Figure 3, both groups seemed unlikely to adopt brainstorming by hand. One intermediate programmer felt that writing something down did not make progress in actualizing the code. Both groups liked Metafor better, particularly because “it seems like it would be cool to play with,” and as another respondent said, “I think it would be a fun way to draft a project.” Three respondents were surprised that their stories translated so directly to programs, and one said he would write the story differently knowing now how the computer processes the text.
[pic]
Figure 3: The likelihood of adopting Metafor as a brainstorming step, compared against brainstorming by hand.
We are encouraged by these initial findings. What was particularly striking was the non-programmers’ enthusiasm for something which could tutor them to code, and to which they could have unlimited access, unlike a real tutor. It was also interesting that, after working with Metafor, several respondents expressed regret that they did not express their story differently; to us this suggests that after having viewed a programmatic interpretation of natural language, a user gains valuable reflection about her own storytelling tendencies, perhaps helping her to make her story articulation more precise or more rigorous. The important thing is that the process of learning is entertaining.
RELATED WORK
Since Metafor is meant as an aid to programming, it falls under the genre of CASE tools. In the literature of CASE tools, two interfaces are worth mentioning. First, Tam, Maulsby, and Puerta developed a system called U-Tel (1998) which elicits a story about a task from a person, and allows the person to manually highlight and annotate words in the text with their possible roles in the code. Second, Hars & Marchewka developed a natural language CASE tool (1996) which maps expert-system rules, stated in English, into a yes/no decision flowchart whose nodes are large unparsed natural language utterances. Our approach differs from Tam et al.’s approach in that Metafor tries to automatically interpret the user’s story. Non-programmers who do not understand the specifications of natural language may not be able to manually annotate code; also, the fact that almost every utterance in a story has some actionable consequence in Metafor may bring some fundamental assumptions to the foreground which in a system like U-Tel might simply be passed over, or it may have the effect of encouraging more precise articulations to be made.
Metafor and Hars & Marchewka’s system share a common goal of helping to automatically visualize a natural language story; however, Hars & Marchewka’s system seems mainly capable of understanding if-then-else structure, whereas Metafor is capable of visualizing further structure.
CONCLUSION
Metafor is an intelligent user interface which encourages a person to tell a story about a made-up world, or to describe a task like Pacman in plain English. All the while, Metafor constantly tries to understand each utterance through a programmatic interpreter, whose implementation derives from a theory we are developing of the programmatic semantics of natural language. Metafor visualizes each story utterance as changes and additions to some Python code representation of the story.
To be sure, this code visualization is not complete; it is merely scaffolding code which reifies many of the user’s high-level descriptions of behavior. Also to be sure, Metafor’s natural language parser and programmatic interpreter will not be able to correctly interpret any arbitrarily complex English utterance (that would imply that we have solved the deep story understanding problem, which we have not). However, because English has a structural regularity (it is an SVO, subject-verb-object, language), Metafor will usually produce some interpretation. Metafor is very transparent about what it understands best and worst, as a system agent explains in plain English what it thought the user asked. From the comments of those who have played with Metafor, we believe that a person could quickly adapt to producing utterances which would be well understood by Metafor’s interpreter; yet this adaptation does not inherently limit what is thinkable, because it does not limit what can be said. It is only picky about phrasing.
We are encouraged by the results of a user study for Metafor, which showed that both non-programmers and intermediate programmers found the system to be an effective brainstorming and project planning tool, more so than simply writing a story down. They also found it fun and engaging, and would consider using it. One of the more striking reactions from some of the non-programmer participants was that Metafor was like a programming tutor to which they might have unlimited access. In the end, participants found that Metafor is fun, and fun cannot be emphasized enough; it provides programming novices a fun way to gain intuition for programming by allowing them to tinker with something which is responsive to anything they type. If we could make programming more like a toy that even younger kids could play with, we can only imagine the educational implications.
Our approach to modeling attitudes is based on the analysis of personal texts using natural language parsing and the commonsense-based textual affect sensing work described in (Liu et al., 2003). Personal texts are broken down into units of affective memory, consisting of concepts, situations, and “episodes”, coupled with their emotional value in the text. The whole attitudes model can be seen as an affective memory system that valuates the affect of newly presented concepts, situations, and episodes by the affective memories they trigger.
In this section, we first present a bipartite model of the affective memory system. Second, we describe how such a model is acquired automatically from personal texts. Third, we discuss methods for applying the model to predict a user’s affective reaction to new texts. Fourth, we describe how some advanced features enrich our basic person modeling approach.
1 A Bipartite Affective Memory System
A person’s affective reaction to a concept, topic, or situation can be thought of as either instinctive, due to attitudes and opinions conditioned over time, or reasoned, due to the effect of a particularly vivid recalled memory. Borrowing from cognitive models of human memory function, attitudes that are conditioned over time can be best seen as a reflexive memory, while attitudes resulting from the recall of a past event can be represented as a long-term episodic memory (LTEM). Memory psychologist Endel Tulving equates LTEM with “remembering” and reflexive memory with “knowing” and describes their functions as complementary (Tulving, 1983). We combine the strengths of these two types of memories to form a bipartite, episode-reflex model of the affective memory system.
1 Affective long-term episodic memory
Long-term episodic memory (LTEM) is a relatively stable memory capturing significant experiences and events. The basic unit of memory captures a coherent series of sequential events, and is known as an episode. Episodes are content-addressable, meaning that they can be retrieved through a variety of cues encoded in the episode, such as a person, location, or action. LTEM can be powerful because even events that happen only once can become salient memories and serve to recurrently influence a person’s future thinking. In modeling attitudes, we must account for the influence of these particularly powerful one-time events.
In our affective memory system, we compute an affective LTEM as an episode frame, coupled with an affect valence score that best characterizes that episode. In Fig. 2, we show an episode frame for the following example episode: “John and I were at the park. John was eating an ice cream. I asked him for a taste but he refused. I thought he was selfish for doing that.”
Figure 2. An episode frame in affective LTEM.
As illustrated in Fig. 2, an episode frame decomposes the text of an identified episode into simple verb-subject-argument propositions like (eat John “ice cream”). Together, these constitute the subevents of the episode. The moral of an episode is important because the episode-affect can be most directly attributed to it. Extraction of the moral, or root cause, is done through heuristics which are discussed elsewhere (Liu, 2003b). Tulving’s encoding specificity hypothesis (1983) suggests that contexts such as date, location, and topic are useful to record because an episode is more likely to be triggered when current conditions match the encoding conditions. The affect valence score is a numeric triple representing (pleasure, arousal, dominance). This will be covered in more detail later in the paper.
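An episode frame of this shape might be represented as in the sketch below. The `EpisodeFrame` class and its `matches` method are hypothetical, and apart from the proposition (eat John "ice cream"), the subevents, the moral, the context values, and the affect numbers are placeholders invented for illustration; they are not the contents of Fig. 2.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Proposition = Tuple[str, ...]      # (verb, subject, argument, ...)
PAD = Tuple[float, float, float]   # (pleasure, arousal, dominance)


@dataclass
class EpisodeFrame:
    subevents: List[Proposition]
    moral: Proposition
    contexts: Dict[str, str]
    affect: PAD

    def matches(self, cue):
        """Content-addressable retrieval: does the cue occur in the frame?"""
        cue = cue.lower()
        propositions = self.subevents + [self.moral]
        in_events = any(cue in (term.lower() for term in p) for p in propositions)
        in_context = any(cue == value.lower() for value in self.contexts.values())
        return in_events or in_context


# Placeholder frame for the park episode; values are illustrative only.
episode = EpisodeFrame(
    subevents=[
        ("eat", "John", "ice cream"),
        ("ask", "I", "John", "taste"),
        ("refuse", "John", "I"),
    ],
    moral=("selfish", "John"),
    contexts={"location": "park", "topic": "ice cream"},
    affect=(-0.4, 0.3, 0.1),
)

triggered_by_person = episode.matches("John")
triggered_by_place = episode.matches("park")
triggered_by_other = episode.matches("telemarketer")
```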
2 Affective reflexive memory
While long-term episodic memory deals in salient, one-time events and must generally be consciously recalled, reflexive memory is full of automatic, instant, almost instinctive associations. Whereas LTEM is content-addressable and requires pattern-matching the current situation with that of the episode, reflexive memory is like a simple lookup-table that directly associates a cue with a reaction, thereby abstracting away the content. In humans, reflexive memories are generally formed through repeated exposures rather than one-time events, though subsequent exposures may simply be recalls of a particularly strong primary exposure (Locke, 1689). In addition to frequency of exposures, the strength of an experience is also considered. Complementing the event-specific affective LTEM with an event-independent affective reflexive memory makes sense because there may not always be an appropriate distinct episode which shapes our appraisal of a situation; often, we react reflexively – our present attitudes deriving from an amalgamation of our past experiences now collapsed into something instinctive.
Because humans undergo forgetting, belief revision, and theory change, update policies for human reflexive memory may actually be quite complex. In our computational model, we adopt a more simplistic representation and update policy that is not cognitively motivated, but instead exploits the ability of a computer system to compute an affect valence at runtime.
The affective reflexive memory is represented by a lookup-table. The lookup-keys are simple concepts which can be semantically recognized as a person, action, object, activity, or named event. These keys act as the simple linguistic cues that can trigger the recall of some affect. Associated with each key is a list of exposures, where each exposure represents a distinct instance of that concept appearing in the personal texts. An exposure, E, is represented by the triple: (date, affect valence score V, saliency S). At runtime, the affect valence score associated with a given conceptual cue can be computed using the formula given in Eq. (1).
[pic] (1)
where n = the number of exposures of the concept
This formula returns the valence of a conceptual cue averaged over a particular time period. The term, [pic], rewards frequency of exposures, while the term, [pic], rewards the saliency of an exposure. In this simple model of an affective reflexive memory, we do not consider phenomena such as belief revision, reflexes conditioned over contexts, or forgetting.
To give an example of how affective reflexive memories are acquired from personal texts, consider Fig. 3, which shows two excerpts of text from a weblog and a snapshot sketch of a portion of the resulting reflexive memory.
Figure 3. How reflexive memories get recorded from excerpts.
In the above example, two text excerpts are processed with textual affect sensing, and concepts, both simple (e.g. telemarketer, dinner, phone) and compound (e.g. telemarketer::call, interrupt::dinner, phone::ring), are extracted. The saliency of each exposure is determined by heuristics such as the degree to which a particular concept is topicalized in a paragraph. The resulting reflexive memory can be queried using Eq. (1). Note that while a query on 3 Oct 01 for “telemarketer” returns an affect valence score of (-.15, .25, .1), a query on 5 Oct 01 for the same concept returns a score of (-.24, .29, .11). Recalling that the valence scores correspond to (pleasure, arousal, dominance), we can interpret the second annoying intrusion of a telemarketer’s call as having conditioned a further displeasure and a further arousal to the word “telemarketer”.
Of course, concepts like “phone” and “dinner” also unintentionally inherit some negative affect, though with dinner, that negative affect is not as substantial because the saliency of the exposure is lower than with “telemarketer.” (“dinner” is not so much the topic of that episode as “telemarketer”). Also, if successive exposures of “phone” are affectively ambiguous (sometimes used positively, other times negatively), Eq. (1) tends to cancel out inconsistent affect valence scores, resulting in a more neutral valence.
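The lookup-table structure and the cancelling effect described above can be sketched as follows. Note the loud assumption: the `query` method uses a plain saliency-weighted mean as a stand-in, which does not reproduce Eq. (1) and in particular does not model its reward for frequency of exposure. The `ReflexiveMemory` class and all exposure values are hypothetical.

```python
from collections import defaultdict
from datetime import date


class ReflexiveMemory:
    """Sketch of a lookup table: concept -> list of exposures."""

    def __init__(self):
        # Each exposure is (date, (pleasure, arousal, dominance), saliency).
        self._table = defaultdict(list)

    def expose(self, concept, when, valence, saliency):
        self._table[concept].append((when, valence, saliency))

    def query(self, concept, as_of):
        """ASSUMED stand-in for Eq. (1): saliency-weighted mean valence."""
        exposures = [e for e in self._table.get(concept, []) if e[0] <= as_of]
        total = sum(saliency for _, _, saliency in exposures)
        if total == 0:
            return (0.0, 0.0, 0.0)
        return tuple(
            sum(valence[i] * saliency for _, valence, saliency in exposures) / total
            for i in range(3)
        )


memory = ReflexiveMemory()
# Invented exposures: "phone" is used negatively once and positively once.
memory.expose("phone", date(2001, 10, 3), (-0.5, 0.25, 0.0), 1.0)
memory.expose("phone", date(2001, 10, 5), (0.5, 0.25, 0.0), 1.0)

after_first = memory.query("phone", date(2001, 10, 3))
after_second = memory.query("phone", date(2001, 10, 5))
unknown = memory.query("dinner", date(2001, 10, 5))
```

With these invented values, the pleasure component moves from negative after the first exposure to neutral after the second, which mirrors how inconsistent valences tend to cancel.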
In summary, we have motivated and characterized the two components of the affective memory system: an episodic component emphasizing the affect of one-time salient memories, and a reflexive component, emphasizing instinctive reactions to conceptual cues that are conditioned over time. In the following subsection, we propose how this bipartite affective memory system can be acquired automatically from personal texts.
2 Model Acquisition from Personal Texts
The bipartite model of the affective memory system presented above can be acquired automatically from an analysis of a corpus of personal texts. Fig. 4 illustrates the model acquisition architecture. [pic]
Figure 4. An architecture for acquiring the affective memory system from personal texts.
Though there are some challenging tasks in the natural language extraction of episodes and concepts, such as the heuristic extraction of episode frames, these details are discussed elsewhere (Liu, 2003b). In this subsection, we focus on three aspects of model acquisition, namely, establishing the suitability criteria for personal texts, choosing an affective representation of attitudes, and assessing the affective valence of episodes and concepts.
1 What Personal Texts are Suitable?
In deciding the suitability of personal texts, it’s important to keep in mind that we want a text that is both a rich source of opinion, and also amenable to natural language processing by the computer. First, texts should be first-person, opinion narratives. It is still rather difficult to extract a person’s attitudes given a non-autobiographical text because the natural language processing system would have to robustly decide which opinions belong to which persons (we save this for future work). It is also important that the text be of a personal nature, relating personal experiences or opinions. Attitudes and opinions are not easily accessible in third-person texts or objective writing, especially for a rather naïve computer reading program. Second, texts should explore a sufficient breadth of topics to be interesting. An insufficiently broad model gives a poor and disproportional sampling of a person and would hardly justify the embodiment of such a model into a digital persona. It should be noted however, that there is plausible reason to intentionally partition a person’s text corpus into two or more digital personas. Perhaps it would be interesting to contrast an old Marvin Minsky versus a young one, or a Marvin who is passionate about music versus a Marvin who is passionate about A.I. Third, texts should cover everyday events, situations, and topics, because that is the optimal discourse domain of recognition of the mechanism with which we will judge the affect of text. Fourth, texts should ideally be organized into episodes, occurring over a substantial period of time relative to the length of a person’s life. This is a softer requirement because it is still possible to build a reflexive memory without episode partitioning. Weblogs are an ideal input source because of their episodic organization, although instant messages, newsgroups, and interview transcripts are also good input sources because they are so often rich in opinion.
2 Representing Affect using the PAD Model
The affect valence pervading the proposed models can take one of two potential representations. It can take an atomistic view in which emotions exist as part of some finite repertoire, as exemplified by Manfred Clynes’s “sentics” schema (1977). Or it can take the form of a dimensional model, represented prominently by Albert Mehrabian’s Pleasure-Arousal-Dominance (PAD) model (1995). In this model, the three nearly independent dimensions are Pleasure-Displeasure (i.e., feeling happy or unhappy), Arousal-Nonarousal (i.e., arousing one’s attention), and Dominance-Submissiveness (i.e., the amount of confidence/lack-of-confidence felt). Each dimension can assume values from –100% to +100%, and a PAD valence score is a 3-tuple of these values (e.g. [-.51, .59, .25] might represent anger).
We chose a dimensional model, namely, Mehrabian’s PAD model, over the discrete canonical emotion model because PAD represents a sub-symbolic, continuous account of affect, where different symbolic affects can be unified along one of the three dimensions. This model has robustness implications for the affective classification of text. For example, in the affective reflexive memory, a conceptual cue may be variously associated with anger, fear, and surprise, which can be unified along the Arousal dimension of the PAD model, thus enabling the affect association to be coherent and focused.
3 Affective Appraisal of Personal Text
Judging the affect of a personal text has three chief considerations. First, the mechanism for judging the affect should be robust and comprehensive enough to correctly appraise the affect of a breadth of concepts. Second, to aid in the determination of saliency, the mechanism must be able to appraise the affect of very little text, such as on the sentence-level. Third, the mechanism should recognize specific emotions rather than convolving affect onto any single dimension.
Several common approaches fail to meet these criteria. The naïve keyword spotting approach looks for surface language features like keywords. However, this approach is not acceptably robust on its own because affect is often conveyed without mood keywords. Statistical affect classification using statistical learning models such as latent semantic analysis (Deerwester et al., 1990) generally requires large inputs for acceptable accuracy because it is a semantically weak method. Hand-crafted models and rules are not broad enough to analyze the desired breadth of phenomena.
To analyze personal text with the desired robustness, granularity, and specificity, we employ a model of textual affect sensing using real-world knowledge, proposed by Liu et al. (2003). In this model, defeasible knowledge of everyday people, things, places, events, and situations is leveraged to sense the affect of a text by evaluating the affective implications of each event or situation. For example, to evaluate the affect of “I got fired today,” this model evaluates the consequences of this situation and characterizes it using negative emotions such as fear, sadness, and anger. This model, coupled with a naïve keyword spotting approach, provides rather comprehensive and robust affective classification. Since the model uses knowledge rather than word statistics, it is semantically strong enough to evaluate text on the sentence level, classifying each sentence into a six-tuple of valences (ranging from a value of 0.0 to 1.0) for each of the six basic Ekman emotions of happy, sad, angry, surprised, fearful, and disgusted (an atomistic view of emotions) (Ekman, 1993). These emotions are then mapped to the PAD model.
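The mapping from the six Ekman valences to a PAD score is not spelled out here. As a rough sketch only, one simple possibility is a valence-weighted average of per-emotion PAD anchor points. The function name `ekman_to_pad` and all anchor coordinates below are hypothetical; only the "angry" anchor echoes the earlier example of a PAD score for anger, and the other five are illustrative placeholders rather than published values.

```python
from typing import Dict, Tuple

PAD = Tuple[float, float, float]  # (pleasure, arousal, dominance), each in [-1, 1]

# Hypothetical PAD anchor points for the six Ekman emotions.
# Only "angry" follows the example given in the text; the rest are
# illustrative placeholders, not published values.
EKMAN_TO_PAD: Dict[str, PAD] = {
    "happy":     (0.8, 0.5, 0.4),
    "sad":       (-0.6, -0.3, -0.3),
    "angry":     (-0.51, 0.59, 0.25),
    "surprised": (0.2, 0.7, -0.1),
    "fearful":   (-0.6, 0.6, -0.4),
    "disgusted": (-0.6, 0.3, 0.1),
}


def ekman_to_pad(valences: Dict[str, float]) -> PAD:
    """Blend the PAD anchors, weighting each emotion by its valence in [0, 1]."""
    total = sum(valences.get(emotion, 0.0) for emotion in EKMAN_TO_PAD)
    if total == 0.0:
        return (0.0, 0.0, 0.0)  # no emotion sensed: neutral reaction
    blended = [0.0, 0.0, 0.0]
    for emotion, anchor in EKMAN_TO_PAD.items():
        weight = valences.get(emotion, 0.0) / total
        for i in range(3):
            blended[i] += weight * anchor[i]
    return (blended[0], blended[1], blended[2])


# "I got fired today": fear, sadness, and anger dominate the six-tuple.
fired = ekman_to_pad({"fearful": 0.6, "sad": 0.7, "angry": 0.4})
```

Because the weights are normalized, this sketch keeps the direction of the blended affect but discards its overall intensity; a fuller treatment would probably scale the result by the strength of the strongest emotion.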
One point of potential paradox should be addressed. The real-world knowledge-based model of affect sensing is based on defeasible commonsense knowledge from the Open Mind Commonsense corpus (Singh et al., 2002), which is, in turn, gathered from a web community of some 11,000 teachers. Therefore, the affective assessment of text made by such a model represents the judgment of a typical person. However, sometimes a personal judgment of affect is contradicted by the typical judgment. Thus, it would seem paradoxical to attempt to learn that a situation has a personally negative affect when the typical person judges the situation as positive. To overcome this difficulty, we implement, in parallel, a mood keyword-spotting affect sensing mechanism to confirm or contradict the assessment of the primary model. In addition, we make the assumption that although a personal affect judgment may deviate from that of a typical person on small particulars, it will not deviate on average when examining a large text. The implication of this is that at a slightly larger granularity than a sentence, the affective appraisal is more likely to be accurate. In fact, accuracy should increase in proportion to the size of the textual context being considered. The evaluation of Liu et al.’s affective navigation system (2003b) yields some indirect support for the idea that accuracy increases with the size of the textual context. In that user study, users found affective categorizations of textual units on the order of chapters to be more accurate and more useful for information navigation than affective categorizations of small textual units such as paragraphs.
To assess the affect of a sentence, we factor in the affective assessment of not only the sentence itself, but also of the paragraph, section, and whole journal entry or episode. Because so much context is factored into the affect judgment, only a modest amount of affective information can be learned from any given sentence. Thus we rely on the confirming effects of being able to encounter an attitude multiple times. In exchange for only being able to learn a modest amount from a sentence, we also minimize the impact of erroneous judgments.
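The blending of context levels described above can be sketched as a weighted average of the PAD appraisals at each granularity. This is a minimal illustration, not the implemented method: the function name `contextual_affect` and the weights in `CONTEXT_WEIGHTS` are assumptions chosen only to show how surrounding context dampens a single outlying sentence.

```python
from typing import Dict, Tuple

PAD = Tuple[float, float, float]  # (pleasure, arousal, dominance)

# Hypothetical weights for each level of textual context.
CONTEXT_WEIGHTS: Dict[str, float] = {
    "sentence": 0.4,
    "paragraph": 0.3,
    "section": 0.2,
    "episode": 0.1,
}


def contextual_affect(appraisals: Dict[str, PAD]) -> PAD:
    """Weighted average of PAD appraisals over whichever levels are available."""
    used = {lvl: w for lvl, w in CONTEXT_WEIGHTS.items() if lvl in appraisals}
    norm = sum(used.values())
    if norm == 0.0:
        return (0.0, 0.0, 0.0)
    p, a, d = (
        sum(w * appraisals[lvl][i] for lvl, w in used.items()) / norm
        for i in range(3)
    )
    return (p, a, d)


# A strongly negative sentence inside a mildly positive entry.
blended = contextual_affect({
    "sentence": (-1.0, 0.0, 0.0),
    "paragraph": (0.5, 0.0, 0.0),
    "section": (0.5, 0.0, 0.0),
    "episode": (0.5, 0.0, 0.0),
})
```

With these placeholder weights, the sentence's pleasure of -1.0 is softened to roughly -0.1, which illustrates why only a modest amount is learned from any single sentence.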
In summary, digital personas can be automatically acquired from personal texts. These texts should feature the explicit expression of the opinions of the person to be modeled, and should be of a certain form required by the natural language processing. Natural language processed texts are analyzed for their affective content at varying textual granularities (e.g. sentence-, paragraph-, and section-level) so as to minimize the possibility of error. This is necessary because our textual affect sensing tool evaluates a typical person’s affective reaction to a text, and not any particular person’s. Affect valence is represented using the PAD dimensional model of affect, whose continuity allows affect valences to be more easily summed together. The resulting affect valence is recorded with a concept in the reflexive memory, and an episode in the episodic memory.
3 Predicting Attitudes using the Model
Having acquired the model, the digital persona attempts to predict the attitudes of the person being modeled by offering some affective reaction when it is fed some new text. This reaction is based on how the new text triggers the reflex concepts and the recall of episodes in the affective memory system. When a reflex memory or episode is triggered, the affective valence score associated with that memory gets attached to the affective context of the new text. The gestalt reaction to the new text is a weighted summation of the affect valence scores of the triggered memories.
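As a hedged sketch of the weighted summation, the fragment below combines the valences of triggered memories into one gestalt reaction. The names `TriggeredMemory`, `KIND_WEIGHTS`, and `gestalt_reaction` are hypothetical, as are the weight values. Dividing by the total weight is also an assumption, made here so that the result stays within the PAD range.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

PAD = Tuple[float, float, float]  # (pleasure, arousal, dominance)


@dataclass
class TriggeredMemory:
    kind: str     # "reflex" or "episode"
    valence: PAD  # affect valence stored with the memory


# Hypothetical weights: episodic memories count slightly more than reflexes.
KIND_WEIGHTS: Dict[str, float] = {"reflex": 1.0, "episode": 1.2}


def gestalt_reaction(memories: List[TriggeredMemory]) -> PAD:
    """Weighted sum of triggered valences, normalized by the total weight."""
    total = sum(KIND_WEIGHTS[m.kind] for m in memories)
    if total == 0.0:
        return (0.0, 0.0, 0.0)  # nothing triggered: no reaction
    p, a, d = (
        sum(KIND_WEIGHTS[m.kind] * m.valence[i] for m in memories) / total
        for i in range(3)
    )
    return (p, a, d)


reaction = gestalt_reaction([
    TriggeredMemory("reflex", (1.0, 0.2, 0.0)),
    TriggeredMemory("episode", (-1.0, 0.6, 0.0)),
])
```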
The triggering process is somewhat complex. The triggering of episodes requires the detection of an episode in the new text, and heuristically pattern matching this new episode frame to the library of episode frames. The range of concepts that can trigger a reflex memory is increased by the addition of conceptual analogy using OMCSNet, a semantic network of commonsense knowledge. The details of the triggering process are omitted here, but are discussed elsewhere (Liu, 2003b).
This process of valuating some new text by triggering memories out of the context in which they were encoded, and inheriting their affect valences, is error prone. We rely on the observation that if many memories are triggered, their contextual intersection is more likely to be accurate. Ultimately, the performance of the digital persona in reproducing the attitudes of the person being modeled is determined by the breadth and quality of the corpus of personal texts gathered on the person. The digital persona cannot predict attitudes that are not explicitly exhibited in the personal texts.
4 Enriching the Basic Model
The basic model of a person’s attitudes focuses on applying a person’s self-described memories to valuate new textual episodes. While this basic model is sufficient to produce reactions to text for which there exist some relevant personal memories, the generated digital personas are often quite “sparse” in what they can react to. We have proposed and evaluated some advancements to the basic model. In particular, we have looked at how a person’s attitude model can be enriched by the attitude models of people after whom the modeled person fashions himself/herself, perhaps a good friend or mentor. More technically, we mean an imprimer.
Marvin Minsky describes an imprimer as someone to whom one becomes attached (Minsky, forthcoming). He introduces the concept in the context of attachment-learning of goals, and suggests that imprimers help to shape a child’s values. Imprimers can be a parent, mentor, cartoon character, a cult, or a person-type. The two most important criteria for an imprimer are that 1) the imprimer embodies some image, filled with goals, ideas, or intentions, and that 2) one feels attachment to the imprimer.
We extend this idea into the affect realm and make the further claim that internal imprimers can do more than critique our goals; our attachment to them leads us to the willful emulation of a portion of their values and attitudes. We keep a collection of these internal imprimers, and they help to support our identity. From the supposition that we conform to many of the attitudes of our internal imprimers, we hypothesize that affective memory models of these imprimers, if known, can complement the person’s own affective memory model in helping to predict a person’s attitudes. This hypothesis is supported by much of the work in psychoanalysis. Sigmund Freud (1991) wrote of a process he called introjection, in which children unconsciously emulate aspects of their parents, such as the assumption of their parents’ personalities and values. Other psychologists have referred to introjection by terms like identification, internalization, and incorporation.
We propose the following model of internal imprimers to support attitude prediction. First, it is necessary to identify people, groups, and images that may possibly be a person’s imprimer. We can do so by analyzing the affective memory. From a list of all conceptual cues from both the episodic and reflexive memories, we use semantic recognizers to identify all people, groups (e.g. “my company”) and images (e.g. “dog”=> “dog-person”) that, on average, elicit high Arousal and high Submissiveness, show high frequency of exposure in the reflexive memory, and collocate in past episodes with self-conscious emotion keywords like “proud”, “embarrassed”, “ashamed”.
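The selection criteria above can be sketched as a simple filter over per-cue statistics. This assumes the semantic recognizers have already narrowed the cues to people, groups, and images. The `CueStats` structure, the function name, and every threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Self-conscious emotion keywords named in the text.
SELF_CONSCIOUS = {"proud", "embarrassed", "ashamed"}


@dataclass
class CueStats:
    cue: str
    mean_arousal: float              # average Arousal, in [-1, 1]
    mean_dominance: float            # average Dominance; negative means submissive
    exposures: int                   # occurrences in the reflexive memory
    cooccurring_keywords: List[str]  # emotion keywords collocated in past episodes


def is_imprimer_candidate(
    stats: CueStats,
    min_arousal: float = 0.3,     # hypothetical threshold
    max_dominance: float = -0.3,  # hypothetical threshold
    min_exposures: int = 10,      # hypothetical threshold
) -> bool:
    """True when a cue meets all four imprimer criteria."""
    return (
        stats.mean_arousal >= min_arousal
        and stats.mean_dominance <= max_dominance
        and stats.exposures >= min_exposures
        and any(k in SELF_CONSCIOUS for k in stats.cooccurring_keywords)
    )


advisor = CueStats("my advisor", 0.6, -0.5, 25, ["proud", "nervous"])
pizza = CueStats("pizza", 0.6, 0.4, 30, [])
```

Here `advisor` passes all four tests, while `pizza` fails on dominance and on the absence of self-conscious emotion keywords.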
[pic]
Figure 5. Affective models of internal imprimers, organized into personas, complement one’s own affective model
Once imprimers are identified, we also wish to identify the context under which an imprimer’s attitudes show influence. As shown in Fig. 5, we propose organizing the internal imprimer space into personas representing different contextual realms. There is good reason to believe that humans organize imprimers by persona because we are different people for different reasons. One might like Warren Buffett’s ideas about business but probably not about cooking. Personas can also prevent internal conflicts by allowing a person to maintain separate systems of attitudes in different contexts. To identify an imprimer’s context, we must first agree on an ontology of personas, which can be person-general (as the personas in Fig. 5 are) or person-specific. Once imprimers are associated with personas, we gather as much “personal” text from each imprimer as desired and acquire only the reflexive memory model, thus relaxing the constraint that texts have episodic organization. In this augmented attitude prediction strategy (depicted in Fig. 6), when conceptual cues are unfamiliar to the self, we identify internal imprimers whose persona matches the genre of the new episode, and give them an opportunity to react to the cue. These affective reactions are multiplied by a coefficient representing the ability of this self to be influenced, and the valence score is added on to the episode. Rather than maintaining all attitudes in the self, internal imprimers enable judgments about certain things to be mentally outsourced to the persona-appropriate imprimers.
We have implemented and evaluated the automated identification and modeling acquisition of imprimer personas in cases where the imprimers are people. Our implemented system is not yet able to use abstract non-person imprimers, e.g. “dog-person”.
[pic]
Figure 6. The imprimer-augmented attitude prediction strategy. Edges represent memory triggers.
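The imprimer-augmented strategy can be sketched roughly as follows: when a cue is unfamiliar to the self, imprimers whose persona matches the episode's genre react instead, and their reaction is scaled by an influence coefficient. The function name `react_to_cue`, the default coefficient, and the averaging across several imprimers are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

PAD = Tuple[float, float, float]  # (pleasure, arousal, dominance)
ReflexMemory = Dict[str, PAD]     # conceptual cue -> stored affect valence


def react_to_cue(
    cue: str,
    genre: str,
    self_memory: ReflexMemory,
    imprimers_by_persona: Dict[str, List[ReflexMemory]],
    influence: float = 0.5,  # hypothetical coefficient of influenceability
) -> Optional[PAD]:
    """React from the self's memory, else defer to persona-matched imprimers."""
    if cue in self_memory:
        return self_memory[cue]
    reactions = [m[cue] for m in imprimers_by_persona.get(genre, []) if cue in m]
    if not reactions:
        return None  # neither the self nor any matching imprimer knows the cue
    n = len(reactions)
    p, a, d = (influence * sum(r[i] for r in reactions) / n for i in range(3))
    return (p, a, d)


self_memory = {"weekend": (0.7, 0.3, 0.4)}
imprimers = {"business": [{"stock options": (0.8, 0.4, 0.6)}]}
outsourced = react_to_cue("stock options", "business", self_memory, imprimers)
```

In this example the self has no memory of "stock options", so the business imprimer's valence is used at half strength.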
In summary, we have presented a reflex-episode model of affective memory as a memory-based representation of a person’s attitudes. The model can be acquired automatically from personal text using natural language processing and textual affect analysis. The model can be applied over new textual episodes to produce affective reactions that aim to emulate the actual reactions of the person being modeled (Fig. 6). We have also discussed how the basic attitudes model can be enriched with added information about the attitudes of the mentors of the person being modeled.
In the following section, we abstract away the details of the attitudes model presented in this section to examine how digital personas can be portrayed graphically and how a collection of digital personas can portray the personalities of a community.
WHAT WOULD THEY THINK?
While modeling a person’s attitudes is fun in the abstract, it lacks the motivation and the verifiability of a real application of the theory and technology. What Would They Think? (Fig. 1) is a graphical realization of the modeling theory discussed in the previous section. What Would They Think? has been implemented and is currently being evaluated through user studies, though the underlying attitude models have already been evaluated in a separate study. In this section, we discuss the design of our interface, present some scenarios for its use, and report how this work has been evaluated.
1 Interface Design
Digital personas acquired from an automatic analysis of personal text are represented visually with pictures of faces, which occupy a matrix. Given some new text typed or spoken into the “fodder” box, each persona expresses an affective reaction through modulations in the graphical elements of the face icon. Each digital persona is also capable of some introspection. When clicked, a face can explain what motivated its reaction by displaying a salient quote from its personal text.
Why a static face? Visualizing a digital persona’s attitudes and reactions with the face of the person being represented is better than with something textual or abstract. There are several reasons why a face is a superior representation. People are already wired with a cognitive faculty for quickly recognizing and remembering faces, and a face acts as a unique cognitive container for a person’s individual identity and personality. In the user task of understanding a person’s personality, it is easier to attribute personality traits and attitudes to a face than to text or an abstract graphic. For example, people-watching is a pastime in which we imagine the personality and identity behind a stranger’s face (Whyte, 1988). A community of faces is more socially evocative than either a community of textual labels or abstract representations, for those representations are not designed as convenient containers of identity and personality.
Having decided on a face representation, should the face be abstract or real, static or animated? While verisimilitude is the goal for many facial interfaces, we must be careful to not portray more detail in the face than our attitude model is capable of elucidating, for the face is fraught with social cues, and unjustified cues could do more harm than good. By conveying attitudes through modulations in the graphical elements of a static face image, rather than through modulations of expression and gaze in an animated face, we are emphasizing the representational aspect of the face, over the real. Scott McCloud has explored extensively the representational-vs.-real tradeoff of face drawing in comics (1993).
Modulating the Face. In the expression of an affective reaction, it is nice to be able to preserve the detail of the continuous, dimensional output of the digital persona. The information should also be conveyed as intuitively as possible. Thus an intuitive mapping may be best achieved through the use of visual metaphors to represent affective states of the person (Lakoff & Johnson, 1980). We often describe a happy person as being “colorful”, while “face turns colorless” usually represents negative emotions like fear and melancholy. A person whose attention or passion is aroused has a face that “lights up”. And someone who isn’t sure or confident about a topic feels “fuzzy” toward it. Taking these metaphors into consideration, a rather straightforward scheme is used to map the three affect dimensions of pleasure, arousal, and dominance onto the three graphical dimensions of color saturation, brightness, and focus, respectively. A pleasurable reaction is manifested by a face with high color saturation, while a displeasurable reaction maps to an unsaturated, colorless face. This mapping creates an implicit constraint that the face icon be in color. An aroused reaction results in a brightly lit icon, while a non-aroused reaction results in a dimly lit icon. A dominant (confident) reaction maps to a sharp, crisp image, while a submissive (unconfident) reaction maps to a blurry, unfocused image. While better mapping schemes may exist, our experience with users who have worked with this interface tells us that the current scheme conveys the affect reaction quite intuitively. This makes the assumption that the original face icons are all of good quality – in color, bright enough, and in focus.
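The mapping from the three PAD dimensions onto saturation, brightness, and focus can be sketched as below. This is one possible parameterization, not the implemented one: the `FaceStyle` fields, the brightness floor, and the maximum blur radius are assumptions. The original icon is treated as the fully saturated, fully bright, fully sharp extreme, in line with the assumption that source icons are of good quality.

```python
from dataclasses import dataclass


@dataclass
class FaceStyle:
    saturation: float   # 0.0 = grayscale, 1.0 = original color
    brightness: float   # scale factor; 1.0 = original brightness
    blur_radius: float  # blur in pixels; 0.0 = original sharpness


def _unit(x: float) -> float:
    """Clamp a valence to [-1, 1] and rescale it to [0, 1]."""
    return (max(-1.0, min(1.0, x)) + 1.0) / 2.0


def face_style(
    pleasure: float,
    arousal: float,
    dominance: float,
    min_brightness: float = 0.3,  # hypothetical floor so faces never go black
    max_blur: float = 6.0,        # hypothetical blur at full submissiveness
) -> FaceStyle:
    """Map a PAD reaction onto the three graphical dimensions of a face icon."""
    return FaceStyle(
        saturation=_unit(pleasure),
        brightness=min_brightness + (1.0 - min_brightness) * _unit(arousal),
        blur_radius=max_blur * (1.0 - _unit(dominance)),
    )


# The anger example from the PAD discussion: displeased, aroused, fairly confident.
angry_face = face_style(-0.51, 0.59, 0.25)
```

The resulting values could then be handed to any image library's saturation, brightness, and blur operations; that rendering step is left out here.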
Populating a Community. An n x n matrix can hold a small collection of digital personas. The matrix can be configured either automatically or manually. Each matrix cell can be manually configured to house a digital persona by specifying a persona .MIND file and a face icon. A user can build and later augment a digital persona by specifying a weblog URL, homepage URL, or some personal text pasted into the window. The matrix can also be configured automatically to represent a community. Plug-in scripts have been created to automatically populate the matrix with certain types of communities, including a niche community of weblogs known as a “blog ring,” a circle of friends in an online networking community (Friendster), a group of potential mates on an online dating website, and a usenet community.
Currently, only a blog ring community can generate fully specified digital personas. The personal text corpora of the Friendster and dating-website communities are rather small profile texts. As a result, only a fairly shallow reflexive memory can be built. The episodic memory is not meaningful for these texts. The personal texts for usenet communities are rather inconsistent in quality. For example, a usenet community based on questions and answers will not be as good a source of explicit opinions as a community based on discussion of issues. Also, usenet communities pose the problem of not providing a face icon for each user. In this case, the text of each person’s name labels each matrix cell, accompanied by a default face icon in the background, which is necessary to convey the affective reaction.
Introspection. A digital persona is capable of some limited introspection. To inquire what motivated a persona to express a certain reaction to some text, the face icon can be clicked. An explanation will be offered taking the form of a quote or a series of quotes from the personal text. These quotes are generated by backpointers to the text associated with each affective memory. For episodic memory, a particularly salient episode can justify a reaction, while there may need to be many quotes to justify a triggered reflex memory. With the capability for some introspection and explanation, a user can verify whether or not an affective reaction is indeed justified. This lends the interface some fail-softness, as a user will not be completely misled when a person’s attitude is erroneously represented by the system.
2 Use Cases
How can a person use the What Would They Think? interface to understand the personalities and attitudes of people in a community? The system supports several use cases.
In the basic use case, the user, a new entrant to a community, is presented with an automatically generated matrix of some people in the community. The user can employ a hypothesis-testing approach to understanding personalities. The user types some very opinionated statements into the “fodder” box as a litmus test for understanding the attitudes of the different people toward that statement. Faces lighting up in color versus black and white provide an illustrative contrast of the strong disagreements in the community. A user can inquire as to the source of strong opinions by clicking on a face and viewing a motivating quote. A user can reorganize the matrix so as to cluster personalities perceived to be similar. Assuming that the personal texts for each persona in the community are of comparable length, depth, and quality, the user may notice over a series of interactions that certain personas are negative more often than not, or that certain other personas are aroused more intensely and more often than other personas. These observations may lead a user to conclude that certain personalities are more cynical, and others more easily excitable.
Another use case is gauging the interests and expertise of people in a community. Because people generally talk more about things that interest them and have more to say on topics they are more familiar with, a digital persona modeled on such texts will necessarily exhibit more reaction to texts that are interesting to the person being modeled or that fall within their area of expertise. In this use case, a user can, for example, copy-and-paste a news article into the fodder box and assess which personas are interested in or have expertise in a particular topic.
A third use case involves community-assisted reading. The matrix fodder box can be linked to a cursor position in a text file browser. As a user reads through a webpage, story, or news article, he/she can get a sense of how the community might read and react to the text currently being read.
3 Evaluation
The quality of the attitude prediction in What Would They Think? has been formally evaluated through user studies. We are also currently conducting user studies to evaluate the effectiveness of the matrix interface in assisting a person to learn about and understand a community. These results will be available by press time.
The quality of attitude prediction was evaluated experimentally, working with four subjects. Subjects were between the ages of 18 and 28, and had kept diary-style weblogs for at least 2 years, with an average entry interval of three to four days. Subjects submitted their weblog URLs for the generation of affective memory models. An imprimer identification routine was run, and the examiner hand-picked the top imprimer for each of the three persona domains implemented: social, business, and domestic. A personal text corpus was built, and imprimer reflexive memory models were generated. The subjects were engaged in an interview-style experiment with the examiner.
In the interview, subjects and their corresponding persona models were asked to evaluate 12 short texts representative of three genres: social, business, and domestic (corresponding to the ontology of personas in the tested implementation). The same set of texts was presented to each participant, and the examiner chose texts that were generally evocative. Subjects were asked to summarize their reaction by rating three factors on Likert-5 scales.
• Feel negative about it (1) … Feel positive about it (5)
• Feel indifferent about it (1) … Feel intensely about it (5)
• Don’t feel control over it (1) … Feel control over it (5)
These factors are mapped onto the PAD valence format, assuming the following correspondence: 1 → –1.0, 2 → –0.5, 3 → 0.0, 4 → +0.5, and 5 → +1.0. Subjects’ responses were not normalized. To assess the quality of attitude prediction, we record the spread between the human-assessed and computer-assessed valences,
$\text{spread} = \lvert v_{\text{human}} - v_{\text{computer}} \rvert$ (2)
We computed the mean spread and standard deviation across all episodes along each PAD dimension. On the –1.0 to +1.0 valence scale, the maximum spread is 2.0. Table 1 summarizes the results.
Table 1. Performance of attitude prediction, measured as the spread between human and computer judged values.
| |Pleasure mean spread |Pleasure std. dev. |Arousal mean spread |Arousal std. dev. |Dominance mean spread |Dominance std. dev. |
|SUBJECT 1 |0.39 |0.38 |0.27 |0.24 |0.44 |0.35 |
|SUBJECT 2 |0.42 |0.47 |0.21 |0.23 |0.48 |0.31 |
|SUBJECT 3 |0.22 |0.21 |0.16 |0.14 |0.38 |0.38 |
|SUBJECT 4 |0.38 |0.33 |0.22 |0.20 |0.41 |0.32 |
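The statistics in Table 1 can be reproduced in outline as follows. This is a sketch under stated assumptions: the spread is taken to be the absolute difference between the human and computer valences, which is consistent with the stated maximum of 2.0, and the population standard deviation is used because the text does not say which variant was computed. The function name `spread_stats` and the sample ratings are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple

# Likert-5 rating -> valence, following the correspondence given in the text.
LIKERT_TO_VALENCE = {1: -1.0, 2: -0.5, 3: 0.0, 4: 0.5, 5: 1.0}


def spread_stats(
    human_likert: List[int], computer_valences: List[float]
) -> Tuple[float, float]:
    """Mean and standard deviation of |human - computer| along one PAD dimension."""
    spreads = [
        abs(LIKERT_TO_VALENCE[h] - c)
        for h, c in zip(human_likert, computer_valences)
    ]
    return mean(spreads), pstdev(spreads)


# Made-up ratings for three episodes along one dimension (not study data).
mean_spread, std_dev = spread_stats([5, 3, 1], [0.5, 0.0, 0.0])
```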
Assuming that human reactions obeyed a uniform distribution over the Likert-5 scale, we give two baselines, which were simulated over 100,000 trials. In BASELINE 1, the computer-assessed valence is fixed at 0.0 (neutral reaction to all text). In BASELINE 2, the computer-assessed valence is given a random value over the interval [-1.0, 1.0] with a uniform distribution (arbitrary reaction to all text). It should be pointed out, however, that in the context of an interactive sociable computer, BASELINE 1 is not a fair comparison, because it would never produce any behavior.
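The two baselines can be approximated with a short Monte Carlo simulation. This sketch assumes the spread is the absolute difference between human and computer valences and that human ratings are drawn uniformly from the five Likert-mapped values. The function name `simulate_baselines` and the fixed seed are illustrative choices.

```python
import random
from statistics import mean
from typing import Tuple

# The five valences that a Likert-5 rating can map onto.
VALENCES = [-1.0, -0.5, 0.0, 0.5, 1.0]


def simulate_baselines(trials: int = 100_000, seed: int = 0) -> Tuple[float, float]:
    """Mean spread for BASELINE 1 (always neutral) and BASELINE 2 (uniform random)."""
    rng = random.Random(seed)
    neutral_spreads = []
    random_spreads = []
    for _ in range(trials):
        human = rng.choice(VALENCES)  # uniformly distributed human reaction
        neutral_spreads.append(abs(human - 0.0))
        random_spreads.append(abs(human - rng.uniform(-1.0, 1.0)))
    return mean(neutral_spreads), mean(random_spreads)


baseline_1, baseline_2 = simulate_baselines()
```

Under these assumptions the expected mean spread works out analytically to 0.6 for BASELINE 1 and 0.75 for BASELINE 2. These figures are properties of the sketch and should not be read as reported results.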
On average, our approach performed noticeably better than both baselines, excelling particularly in predicting arousal, and having the most difficulty predicting dominance. The standard deviations were very high, reflecting the observation that predictions were often either very close to the actual valence, or very far. This can be attributed to one of several causes. First, multiple episodes described in the same journal entries may have caused the wrong associations to be learned. Second, the reflexive memory model does not account for conflicting word senses. Third, personal texts inputted for the imprimers often generated models skewed to positive or negative because text did not always have an episodic organization. While results along the pleasure and dominance dimensions are weaker, the arousal dimension recorded a mean spread of 0.22, suggesting the possibility that it alone may have immediate applicability.
Table 2. Performance of attitude prediction that can be attributed to imprimers and episodic memory
In the experiment, we also analyzed how often the episodic memory, reflexive memory, and imprimers were triggered. Episodes were, on average, four sentences long. For each episode, reflexive memory was triggered an average of 21.5 times, episodic memory 0.8 times, and imprimer reflexive memory 4.2 times. To measure the effect of imprimers and episodic memories, we re-ran the experiment turning off imprimers only, episodic memory only, and both. Table 2 summarizes the results.
These results suggest that the positive effect of episodic memory on the results was negligible. This certainly has to do with its low rate of triggering, and the fact that episodic memories were weighted only slightly more than reflexive memories. The low trigger rate of episodic memory can also be attributed to the strict criterion that three conceptual cues in an episode frame must trigger in order for the whole episode to trigger. These results also suggest that imprimers played a measurable role in improving performance, which is a very promising result.
Overall, the evaluation demonstrates that the proposed attitude prediction approach is promising, but needs further refinement. The randomized BASELINE 2 is a good comparison when considering possible entertainment applications, whose interaction is more fail-soft. The approach does quite well against the active BASELINE 2, and is within the performance range of these applications. Taking into account possible erroneous reactions, we were careful to pose What Would They Think? as a fail-soft interface. The reacting faces are evocative, and encourage the user to click on a face for further explanation. Used in this manner, the application is fail-soft because users can decide on the basis of the explanations whether the reaction is justified or mistaken. We expect that ongoing studies of the usefulness of the What Would They Think? interface will show that its use is fail-soft: the generated reactions are evocative and encourage the user to further verify and investigate a purported attitude. We do not suggest that the approach is yet ready for fail-hard applications, such as deployment as a sociable software agent, because fallout (bad predictions) can be very costly in the realm of affective communication (Nass et al., 1994).
RELATED WORK
The community of personalities metaphor has been previously explored with Guides (Oren et al., 1990), a multi-character interface that assisted users in browsing a hypermedia database. Each guide embodied a specific character (e.g. preacher, miner, settler) with a unique “life story.” Presented with the current document that a user is browsing, each guide suggested a recommended follow-up document, motivated by the guide’s own point-of-view. Each guide’s recommendations are based on a manually constructed bag of “interests” keywords.
Our affective memory-based approach to modeling a person’s attitudes appears to be unique in the literature. Existing approaches to person modeling are of two kinds: behavior modeling and demographic profiling. The former approach models the actions that users take within the context of an application domain. For example, intelligent tutoring systems track a person’s test performance (Sison & Shimura, 1998), while online bookstores track user purchasing and browsing habits and combine this with collaborative filtering to group similar users (Shardanand & Maes, 1995). The latter approach uses gathered demographic information about a user, such as a “user profile”, to draw generalized conclusions about user preferences and behavior.
Neither of the existing approaches is appropriate to the modeling of “digital personas.” In behavior modeling, knowledge of user action sequences is generally only meaningful in the context of a particular application and does not significantly contribute to a picture of a person’s attitudes and opinions. Demographic profiling tends to overgeneralize people by the categories they fit into, is not motivated by personal experience, and often requires additional user action such as filling out a user profile.
Memory-based modeling approaches have also been tried in related work on assistive agents. Brad Rhodes’s Remembrance Agent (Rhodes & Starner, 1996) uses an associative memory to proactively suggest relevant information. Sunil Vemuri’s project, “What Was I Thinking?” (2004), is a memory prosthesis that records audio from a wearable device and intelligently segments the audio into episodes, allowing the “audio memory” to be more easily browsed.
CONCLUSION
Learning about the personalities and dynamics of online communities has been, up to now, a difficult problem with no good technological solutions. In this paper, we propose What Would They Think?, an interactive visual representation of the personalities in a community. A matrix of digital personas reacts visually to what a user types or says to the interface, based on predictions of attitudes actually held by the persons being modeled. Each digital persona’s model of attitudes is generated automatically from an analysis of some personal text (e.g. weblog), using natural language processing and textual affect sensing to populate an associative affective memory system. The whole application enables a person to understand the personalities in a community through interaction rather than by reading narratives. Patterns of reactions observed over a history of interactions can illustrate qualities of a person’s personality (e.g. negativity, excitability), interests and expertise, and also qualities of the social dynamics in a community, such as the points of consensus and disagreement within a group of individuals.
The automated, memory-based personality modeling approach introduced in this paper represents a new direction in person modeling. Whereas behavior modeling only yields information about a person within some narrow application context, and whereas demographic profiling paints an overly generalized picture of a person and often requires a profile to be filled out, our modeling of a person’s attitudes from a “memory” of personal experiences paints a richer, better-motivated picture about a person that has a wider range of potential applications than application-specific user models. User studies concerning the quality of the attitude prediction technology are promising and suggest that the currently implemented approach is strong enough to be used in fail-soft applications. In What Would They Think? the interface is designed to be fail-soft. The reactions given by the digital personas are meant to be evocative. The user is encouraged to further verify and investigate a purported attitude by clicking on a persona and viewing a textual explanation of the reaction.
In future work, we intend to further develop the modeling of attitudes by investigating how particularly strong beliefs such as “I love dogs” can help to create a model of a person’s identity. We also intend to investigate other applications for our person modeling approach, such as virtual mentors and guides, marketing, and document recommendation.
ACKNOWLEDGMENTS
The authors would like to thank Deb Roy, Barbara Barry, Push Singh, Andrea Lockerd, Marvin Minsky, Henry Lieberman, and Ted Selker for their comments on this work.
REFERENCES
1] Collin F. Baker, Charles J. Fillmore, John B. Lowe: 1998, The Berkeley FrameNet project. In Proceedings of the COLING-ACL, Montreal, Canada.
2] Mihaly Csikszentmihalyi: 1997, Finding flow: the psychology of engagement with everyday life. 1st ed. MasterMinds. 1997, New York: Basic Books.
3] A. Hars, J.T. Marchewka: 1996, Eliciting and mapping business rules to IS design: Introducing a natural language CASE tool. In: Ebert, R.J; Franz, L.: 1996 Proceedings Decision Sciences Institute, Vol.2, pp. 533-535.
4] David A. Kolb: 1985, Experiential Learning: Experience as the Source of Learning and Development. Prentice Hall.
5] George Lakoff, Mark Johnson: 1980, Metaphors We Live by. University of Chicago Press.
6] Beth Levin: 1993, English Verb Classes And Alternations: A Preliminary Investigation. The University of Chicago Press.
7] Henry Lieberman and Hugo Liu: 2004a, Feasibility Studies for Programming in Natural Language. In Lieberman, Paterno & Wulf (Eds.) End-User Development. Kluwer.
8] Hugo Liu: 2004a, MontyLingua v2.1 Free Natural Language Understanding Toolkit and API available at:
9] Hugo Liu and Henry Lieberman: 2004b, Toward a Programmatic Semantics of Natural Language. Proceedings of the 20th IEEE Symposium on Visual Languages and Human-Centric Computing. IEEE Computer Society Press.
10] Hugo Liu and Push Singh: 2004b, ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal 22(4). Kluwer.
11] Erik T. Mueller: 1999: Prospects for in-depth story understanding by computer. arXiv:cs.AI/0003003 Retrieved from:
12]
13] J.F. Pane, C.A. Ratanamahatana, & B.A. Myers: 2001, Studying the Language and Structure in Non-Programmers' Solutions to Programming Problems. International Journal of Human-Computer Studies, 54(2), 237-264.
14] Jon M. Pearce & Steve Howard: 2004, Designing for Flow in a Complex Activity. 6th Asia-Pacific Conference on Computer-Human Interaction. Springer-Verlag.
15] R.C. Tam, D. Maulsby, and A.R. Puerta: 1998, U-TEL: A Tool for Eliciting User Task Models from Domain Experts. Proceedings of IUI'98, pp. 77-80.
16] Deerwester, S. et al. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), pp. 391-407.
17] Ekman, P. (1993). Facial expression of emotion. American Psychologist, 48, 384-392.
18] Freud, S. (1991). The essentials of psycho-analysis: the definitive collection of Sigmund Freud's writing selected, with an introduction and commentaries, by Anna Freud. London: Penguin.
19] Lakoff, G. & Johnson, M. (1980). Metaphors We Live By, University of Chicago Press.
20] Liu, H. (2003b). A Computational Model of Human Affective Memory and Its Application to Mindreading. Submitted to FLAIRS 2004. Draft available at: /~hugo/publications/drafts/Affective-Mindreading-Liu.doc
21] Liu, H., Lieberman, H., Selker, T. (2003). A Model of Textual Affect Sensing using Real-World Knowledge. Proceedings of IUI 2003, pp. 125-132.
22] Liu, H., Selker, T., Lieberman, H. (2003b). Visualizing the Affective Structure of a Text Document. Proceedings of CHI 2003, pp. 740-741.
23] Locke, J. (1689). Essay Concerning Human Understanding Hypertext by ITL at Columbia University, 1995. Print version ed. P.H. Nidditch. Oxford, 1975.
24] McCloud, S. (1993). Understanding Comics, Kitchen Sink Press, Northampton, MA.
25] Mehrabian, A. (1995). for a comprehensive system of measures of emotional states: The PAD Model. (Available from Albert Mehrabian, 1130 Alta Mesa Road, Monterey, CA, USA 93940).
26] Minsky, M., (forthcoming). The Emotion Machine, Pantheon, New York. Several chapters are available at: .
27] Nass, C.I., Steuer, J.S., and Tauber, E. (1994). Computers are social actors. In Proceedings of CHI ’94, (Boston, MA), pp. 72-78, April 1994.
28] Oren, T., Salomon, G., Kreitman, K. and Don, A. (1990). Guides: characterizing the interface. In Laurel, B. (Ed.) The art of human-computer interface design. Addison-Wesley.
29] Rhodes, B. and Starner, T. (1996). The Remembrance Agent: A continuously running automated information retrieval system. Proceedings of PAAM '96, pp. 487-495.
30] Shardanand, U. and Maes, P. (1995). Social information filtering: Algorithms for automating "word of mouth", Proceedings of CHI'95, 210-217.
31] Singh, P., (2002). The public acquisition of commonsense knowledge. Proceedings of AAAI Spring Symposium. Palo Alto, CA, AAAI.
32] Sison, R. and Shimura, M. (1998). Student modeling and machine learning. International Journal of Artificial Intelligence in Education, 9:128-158.
Tulving, E. (1983). Elements of episodic memory. Oxford: New York.
V
Whyte, W. (1988). City. Doubleday, New York.
33]
[Figure: an example episode frame]
::: EPISODE FRAME :::
SUBEVENTS: (eat John “ice cream”), (ask I John “for taste”), (refuse John)
MORAL: (selfish John)
CONTEXTS: (date), (park), ()
EPISODE-IMPORTANCE: 0.8
EPISODE-AFFECT: (-0.8, 0.7, 0)
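The episode frame shown above can be read as a small record type. The following is a minimal sketch of that reading, not the authors' implementation: the class name `EpisodeFrame`, its field names, and the helper `concepts_in` are all hypothetical, and the affect triple is assumed (from the PAD model cited in the references) to hold pleasure, arousal, and dominance values.

```python
from dataclasses import dataclass


@dataclass
class EpisodeFrame:
    """Hypothetical container mirroring the slots of the episode frame figure."""
    subevents: list    # each subevent is a tuple: (verb, argument, ...)
    moral: tuple       # same shape as a subevent
    contexts: list     # the figure also lists an empty context "()", omitted here
    importance: float  # EPISODE-IMPORTANCE
    affect: tuple      # EPISODE-AFFECT; assumed to be (pleasure, arousal, dominance)


frame = EpisodeFrame(
    subevents=[
        ("eat", "John", "ice cream"),
        ("ask", "I", "John", "for taste"),
        ("refuse", "John"),
    ],
    moral=("selfish", "John"),
    contexts=["date", "park"],
    importance=0.8,
    affect=(-0.8, 0.7, 0.0),
)


def concepts_in(frame):
    """Collect, in first-seen order, every argument named in the subevents and moral."""
    seen = []
    for event in frame.subevents + [frame.moral]:
        for arg in event[1:]:  # skip the verb or predicate in position 0
            if arg not in seen:
                seen.append(arg)
    return seen
```

A helper such as `concepts_in(frame)` illustrates one way the concepts in an episode could be enumerated, for example to associate the episode's affect with each of them; whether the original system does this is not stated in this section.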
[Figure: text excerpts and the corresponding reflexive memory entries]
Text Excerpts
…2 Oct 01… Telemarketers called harassed me again today, interrupting my dinner. I’m really upset…
…4 Oct 01… The phone rang, and of course, it was a telemarketer. Damn it!
::: REFLEXIVE MEMORY :::
telemarketer = {
[2oct01, (-.3, .5, .2), .5],
[4oct01, (-.8, .8, .3), .4] } ;
dinner = {
[2oct01, (-.3, .5, .2), .2] } ;
“interrupt dinner” = {…} ;
…
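The reflexive memory shown above maps each concept to a list of dated exposures. As a rough sketch of how such a structure might be stored and queried, the code below reproduces the two complete entries from the figure and summarizes a concept by a weighted average. Several things here are assumptions rather than facts from the paper: the names `Exposure` and `recall_affect` are hypothetical, the triple is read as pleasure, arousal, and dominance, the trailing number in each entry is treated as a per-exposure weight (the figure does not label it), and weighted averaging is only one plausible way to combine exposures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exposure:
    """One dated entry under a concept (hypothetical field names)."""
    date: str
    pad: tuple      # assumed to be (pleasure, arousal, dominance)
    weight: float   # assumed meaning of the trailing number in the figure


# The two fully specified concepts from the figure; "interrupt dinner" is elided there.
reflexive_memory = {
    "telemarketer": [
        Exposure("2oct01", (-0.3, 0.5, 0.2), 0.5),
        Exposure("4oct01", (-0.8, 0.8, 0.3), 0.4),
    ],
    "dinner": [
        Exposure("2oct01", (-0.3, 0.5, 0.2), 0.2),
    ],
}


def recall_affect(memory, concept):
    """Return the weight-averaged affect triple for a concept, or None if unknown."""
    exposures = memory.get(concept, [])
    total = sum(e.weight for e in exposures)
    if total == 0:
        return None
    return tuple(
        sum(e.weight * e.pad[i] for e in exposures) / total
        for i in range(3)
    )
```

Under these assumptions, `recall_affect(reflexive_memory, "telemarketer")` yields a clearly negative first component, reflecting both exposures, while a concept that never appears in the memory returns `None`.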
[Figure labels: Personality traits; Explicit attitudes; Implicit attitudes]