


FearNot!: Final demonstrator of VICTEC

VERY MUCH A DRAFT


AUTHORS: João Dias, Marco Vala, Steve Grant, Ruth Aylett, Ana Paiva, Rui Figueiredo, Daniel Sobral, Mafalda Fernandes, Mick Lockwood, Sandy Louchart.

STATUS: Discussion

CHECKERS:

PROJECT MANAGER

Name: Ruth Aylett

Address: CVE, Business House, University of Salford, University Road, Salford, M5 4WT

Phone Number: +44 161 295 2922

Fax Number: +44 161 295 2925

E-mail: r.s.aylett@salford.ac.uk

TABLE OF CONTENTS

1 EXECUTIVE OVERVIEW

This deliverable presents the final prototype of the FearNot! demonstrator. It first gives an overview of the challenges and requirements posed by the development of FearNot!. It then describes the first, scripted version of the system and draws some conclusions from its evaluation (reported in D7.2.1). Next, it describes the runtime system and the emergent narrative version, covering the agents, the view system and how the integration was done. Finally, it presents some of the results obtained in a small-scale evaluation of the emergent version.

2 INTRODUCTION

Intelligent Virtual Environments (IVEs) bring new challenges to the way we use technology in educational contexts, promoting and creating new learning experiences in which experimentation and presence are explored. One of the big advantages of IVEs is that they offer a safe place where learners can explore and understand through experimentation, without the dangers or problems of the real situations. Moreover, when IVEs are augmented with contextual information, questions and activities, they can engage learners in entertaining and motivating experiences with material that is otherwise often considered boring and uninteresting. Like computer games, IVEs may allow learners to become immersed and to interact in synthetic worlds through a set of interaction facilities, such as movement and language, as well as specific actions with other characters. These IVEs can be inhabited by agents or intelligent characters, which are responsible for events that happen in the environment and make it not entirely predictable or completely controlled. Characters can be given the roles of teachers, helpers, companions, elements of the simulated worlds, or even friends. They become part of the environment, giving life to the interaction with the learners.

However, when IVEs are used for social learning, as in the case of bullying (and in FearNot!), these characters play a fundamental role, and their believability is perhaps one of the main goals to attain. A believable character has been defined as a character that gives the illusion of life and allows the user's suspension of disbelief [Bates94]. This quest for believability has indeed been the Holy Grail of the area of synthetic characters for years. However, given the nature of the concept, several aspects are at stake. One of them is the character's appearance. Are more realistic characters more believable? And what about cartoon-like characters? A second factor that contributes to believability is the character's autonomy. Some results suggest that more autonomous characters may seem more believable; consider, for example, the case of the Tamagotchis. However, autonomy is difficult to achieve in synthetic characters, as there are tremendous technological problems, such as believable and expressive speech generation. Often, completely scripted characters may lead to more realistic and believable situations. Finally, one other important aspect to consider is the narrative structure of the story behind the characters. A good and believable situation may lead the user to believe in the character itself.

Animators and film makers have long been producing situations and characters that are believable and have the power to provoke emotional reactions in the viewer. However, doing this in real time, with autonomous characters, is still a difficult research challenge. It requires competences from agent research, believable characters, empathy, computer graphics, education and cinematography.

This deliverable presents the final version of FearNot! which combines research in all the above topics.

This deliverable is organised as follows: first we describe the scripted version of FearNot!. Then we briefly overview the emergent version of FearNot!, describing all its components: the language system, the agents' minds and the view system. Finally, we describe a small-scale initial evaluation performed with the emergent version of FearNot!, which shows the relationships children established with the autonomous characters and draws some comparisons with the scripted version.

3 FearNot!

The overall pragmatic objective of the development of FearNot! was to build an anti-bullying demonstrator in which children aged 8-12 experience a virtual scenario where they can witness (from a third-person perspective) bullying situations.

To avoid group pressure and enable individualized interaction, the experience is for a single user. The child acts as an invisible friend to a victimized character, discussing the problems that arise and proposing coping strategies. Note that in bullying situations there are quite clearly identifiable roles: the bully, the victim, the bully-victim (a child that is sometimes the victim and sometimes the bully) and the bystander.

The scenario begins by introducing the child to the school environment and the characters, providing a starting context (see Figure 3.1). This initial presentation provides the background needed about the characters in the story (a description of who is the bully, the victim, and so on). Then the episodes start, and the whole session unfolds as one episode after another.

[pic]

Figure 3.1 Introduction to FearNot!: setting up the scene

Within an episode, the child is mostly a spectator of the unfolding events (the narrative emerges from the actions of the participant characters). After each episode, however, the victim will seek refuge in a resource room (identified as a library) where a personalized conversation with the user can occur.

[pic]

Figure 3.2: FearNot! Cycle

Then, the child takes the role of a friend of the victim, advising her on what to do. A short dialogue takes place between the two (see Figure ??), in which the victim raises the main events that occurred in the previous episode and asks for the child's (the learner's) opinion and suggestions for future behaviour. The dialogue between the child user and the victim character is carried out through a set of menus holding standard responses to bullying situations, while still allowing the children to express the reasons for, and expectations of, the advice given to the victim character. Nevertheless, note that the victim is clearly recognized as a believable self, with its own personality and behaviour, and thus may decide to reject the child's suggestions (see Figure ??).

[pic]

Figure 3.3. The interaction window

Each dialogue finishes with a decision that influences the character's behaviour in future episodes. In the initial version we developed three pre-scripted episodes, and the advice of the child simply led to a choice in the type of ending achieved. For example, if the child’s advice was to tell the parents or to seek help from a friend, the final outcome would be positive; otherwise the ending would be somewhat negative.

4 Emergent FearNot!: General description

5 The Language System

The language system allows the agents to communicate among themselves, interprets the utterances typed by the children, and generates all the utterances spoken by the agents, so that the child can understand the story. Some necessary definitions are:

• Language Engine – Each agent, including the agent that represents the user, has its own instance of the language engine class compiled into its code. The language engine is the interface for converting speech acts into utterances and user input into speech acts. Each agent has its own copy so that the conversational context can be agent-specific.

• Speech Act (SACT) – This is the XML information understood by the agent mind. Agents pass incomplete speech acts to their language engine, describing what they want to say (e.g. insult Eric). The language engine then creates an appropriate utterance, adds it to the SACT and returns it to the agent. User input is also passed to the user agent’s language engine within an incomplete speech act (this time as an utterance), and the language engine completes the SACT by adding semantic information, such as the class of speech act represented by the input text and anything else the agent wishes to know.

• Language Act (LACT) – This is XML information understood by the language engine. Each agent can have its own database of LACTs, although agents with similar needs can share a single database. The LACT database is used to convert SACTs into meaningful utterances, and user input into meaningful SACTs. Each language act in the database contains the information needed to choose and utter (or identify) an appropriate phrase for one given speech act type.

• Conversational Context – The language engine has the job of maintaining semantic coherence, freeing the agents from having to know anything at all about natural languages. It does this in several ways using context variables. Each context variable is a name/value pair, such as topic=football, swearword=moron, you=Eric. The flow of natural language is used to maintain these context variables without involving the agents at all. However, agents may alter/determine the current conversational context by sending/receiving context variables in a SACT. Context variables can be either local (each agent keeps its own copy) or global (all agents share a copy). They also come in two flavours: Lexical and Semantic. Lexical variables track the precise words being used, while semantic variables track more abstract information about the flow of the conversation.

• Synonym lists – the XML LACT database also contains a synonym dictionary, consisting of lists of words with similar meaning. These have many uses within the language system. Synonym lists have a root word (e.g. “idiot”) and a list of synonyms (moron, prat, jerk, etc), which can include common misspellings. They also have attributes that provide semantic information.

• Canonical form – an important part of the processes of extracting context and identifying user input is the conversion of natural language to a simplified and generalised form. I call this its ‘canonical’ form, and the process is called standardising the text. The canonical form of a sentence has had all of its words converted (where possible) into their root form, using synonym lists. It is also entirely lower case, has no punctuation, and each word is separated by a single space. For example the user input “Yeah - but yore such a MORON!” might be standardised to “yes but you’re such a idiot”. The process of standardisation identifies any changes of context (and hence is used even for agent utterances, although the resulting canonical string is then discarded) and it puts the phrase into a form that is easy to match against templates when identifying user input. (A sketch of the synonym lists that could drive this example appears just after this list.)

• Tokens – Phrases consist of explicit text intermingled with various tokens, delimited by square brackets, e.g. “Leave him alone, [YOU]”. These tokens are expanded when generating an utterance. Also, some tokens are used to help with identifying user input.
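To make the canonical-form example above concrete, here is a rough sketch of the kind of synonym lists that could drive it. The <synonym> element name is an assumption made for illustration; only the type=”insult” attribute on the “idiot” list is taken from the examples later in this document.

    <!-- element name assumed for illustration; words after the root are synonyms or common misspellings -->
    <synonym>yes, yeah, yep</synonym>
    <synonym>you’re, yore, youre</synonym>
    <synonym type="insult">idiot, moron, prat, jerk</synonym>

With lists like these, “Yeah - but yore such a MORON!” standardises to “yes but you’re such a idiot”, and because “moron” was found in a list typed as an insult, the word actually used is also remembered in the [INSULT] context variable.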

Program flow

• When an agent wishes to say something to another agent or the user, it first passes an incomplete speech act to the Say() method of the language engine. Say() uses the conversational context and the language act database to create an utterance, which is added to the SACT and returned. The agent then passes this to code that displays a speech bubble or whatever, as well as to the agent that needs to respond to the speech act.

• The recipient of a speech act (including its utterance) also passes it through its language engine via the Hear() method. This allows the language engine to extract personalised context information (for example the name of the person speaking to it). The Hear() method returns the SACT unaltered to the agent.

• The user is also represented by an agent, which has its own language engine. Natural language input from the user is presented to the engine via its Input() method, as a SACT containing (at a minimum) an utterance. The language engine uses the context and a (probably specialised) LACT database to find the language act that best represents the user’s intentions, the name of which is added to the SACT (along with context information such as who is speaking) and returned to the agent.

No other interaction between the agents and their language engines should be needed. It’s SACT-in, SACT-out in every case, with the language engine completing the missing elements of the SACT as appropriate.
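As a sketch of this SACT-in, SACT-out flow, the fragments below show what a speech act might look like before and after a call to Say(). The wrapper and element names (<speechact>, <speaker>, <listener>, <type>, <utterance>) are assumptions chosen to match the field descriptions given later and in Annex 1, and the character names are taken from the examples used later in this document.

    <!-- incoming SACT, as produced by the bully's mind (element names assumed) -->
    <speechact>
      <speaker>Fred</speaker>
      <listener>Bill</listener>
      <type>InsultThem</type>
    </speechact>

    <!-- the same SACT as returned by Say(), with the utterance filled in -->
    <speechact>
      <speaker>Fred</speaker>
      <listener>Bill</listener>
      <type>InsultThem</type>
      <utterance>Hey Bill, you’re such a jerk</utterance>
    </speechact>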

Creating an utterance, in more detail

Information is extracted from the SACT, such as the names of the speaker and recipient (but also any other semantic information the agent wishes to provide), and is used to set various context variables.

The specified SACT class name is then looked up in the speaking agent’s LACT database. The resultant LACT entry consists of several phrases, which are scored for appropriateness, based on the current conversational context, some random noise, and how recently they’ve been uttered before.

The highest scoring phrase is chosen, and its tokens are expanded to produce an utterance. The utterance is added to the SACT and returned to the agent.

Finally, the utterance is temporarily converted into its canonical form, in order to extract useful contextual information from it using the synonym dictionary. This allows the speaking agent, the receiving agent and later speakers to know what exact words have been used in key parts of the speech act (e.g. which particular insult was hurled), and what semantic context changes there have been (a change of topic or mood, say).
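As a minimal sketch of this lookup, suppose the SACT names the ‘InsultThem’ language act used in the examples of Annex 1; the <languageact> and <phrase> element names are assumptions for illustration.

    <!-- entry consulted in the speaker's LACT database (element names assumed) -->
    <languageact name="InsultThem">
      <phrase>Hey [YOU], you’re such a [+idiot]</phrase>
      <phrase>You kick like a fairy</phrase>
    </languageact>

If the first phrase scores highest and [+idiot] happens to expand to “jerk”, the returned utterance is “Hey Bill, you’re such a jerk”, and standardising it afterwards records insult=jerk as a lexical context variable that later speakers can echo back.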

Identifying user input, in more detail

A SACT containing an utterance (the raw user input), plus a small amount of other information, is sent to the language engine. The engine converts the user input into its canonical form. This has the effect of:

• Correcting common spelling mistakes

• Converting a wide variety of words into a smaller variety of root types for easier identification

• Extracting context information from the user input

The canonical user input is then compared against canonical versions of all the phrases stored in the user agent’s LACT database. Some phrases will be matched explicitly while others contain wildcard tokens such as [BEGINS], which allows sub-phrases and key words to be matched. The first phrase that matches the input defines the LACT name most applicable to the user’s intentions. This is then added to the speech act and returned to the agent.

By converting both the input and the LACT phrases to canonical form, it should be easy to find matches for even quite arbitrary user input (so we needn’t constrain the user into answering explicit questions as much as was feared). For example, if the user typed “I reckon yer shood have told the teecher”, this might be rendered canonically as “I think you should have tell a teacher” and matched successfully against the phrase “[ENDS] tell a teacher”.
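A sketch of the kind of entry in the user agent’s LACT database that would catch this advice; the language act name ‘AdviseTellTeacher’ and the element names are hypothetical.

    <!-- hypothetical user-input language act (names assumed) -->
    <languageact name="AdviseTellTeacher">
      <phrase>[ENDS] tell a teacher</phrase>
    </languageact>

The returned SACT would then carry the name of this language act alongside the original utterance, telling the receiving agent that the user suggested telling the teacher.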

Conversational context

The goal of the language system is to emit and consume natural language, maintaining semantic coherence despite the fact that the agents’ minds know only about intentions. This also needs to be managed in a flexible and extensible way. To make all this possible, the language engines need to keep track of the current context of the conversation. They do this by identifying various kinds of relevant information and storing it in context variables.

There are two pools of context variables:

a) A single global pool for context that is generic to all agents (for example the current topic of conversation, the particular swear word that someone has just used, the person everyone is talking about…)

b) A local pool, tracking context that’s specific to each agent (their name, the name of the person they’re currently speaking to, their sex, their role…)

One language engine can access the context variables of another. So if an agent needs to know the sex of another agent in order to decide how to phrase an utterance to it, the two language engines can access each other’s local pool directly, without bothering the agents’ minds with such details.

As well as two pools, there are two broad kinds of context: lexical and semantic. These are only distinguished internally by their use – there is no intrinsic difference between a lexical variable and a semantic one.

By lexical context I mean the explicit words the user or other agents have recently used for things. Say one agent decides to insult another, for example. The agent mind knows nothing about which precise insult he eventually uttered, so if another agent is to reply meaningfully, the latter must be able to know which insult words were actually used, not just the fact that an insult was received. A lexical context variable stores this information as a name/value pair such as “insult = moron”. The word automatically gets stored in such a context variable because that word or phrase has been found in a synonym list containing a type=“insult” attribute. The exact form of insult can later be used in an utterance by specifying the context variable name as a token in a phrase. So, for example, the user might type in “Tell him he’s a jerk” and the victim will know how to reply “I can’t say he’s a jerk, he’ll hit me”, without having to have an explicit phrase for every possible insult word the user (or another agent) might have used, and yet another phrase for every specific insult that the victim might reject.

By semantic context I mean arbitrary name/value pairs that tell us something about the general context of the conversation, for example that the current topic is football, or that the conversation has become spiteful. These are also defined using attributes in the synonym dictionary but are mostly made use of via the attributes of phrases (rather than tokens within a phrase). For example during a conversation about music, a particular music-related insult might score highly and be uttered in preference to a more generic one.

Both the above types of context are database-defined, not hard-coded. There are other context variables which lie somewhere between semantic and lexical in character and are hard-coded. These mostly deal with the names of agents and objects. For example each agent has its own [ME] variable, containing its name, its own [YOU] variable containing the name of the person it is currently speaking to, and so on. These variables are filled automatically from various sources, and can be used in phrases via tokens.[1]

The contents of most of these variables are maintained automatically by the language system, but some are initialised by the agents themselves. The name, sex and role of the agent are stored in variables on start-up, the name of the recipient of a speech act is supplied in the speech act itself, and any of the other variables may optionally be altered or read by the agent within a speech act and used as the agent (or rather Joao!) sees fit.
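For instance, an agent that wants to flag a change of topic could do so by carrying a context variable inside the speech act it sends to its language engine. A small sketch follows; the <context> element name, the other tag names and the ‘ChangeSubject’ act name are assumptions for illustration.

    <!-- speech act fragment in which the agent itself sets [TOPIC] (names assumed) -->
    <speechact>
      <speaker>Bill</speaker>
      <listener>Fred</listener>
      <type>ChangeSubject</type>
      <context name="topic">football</context>
    </speechact>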

Language Act databases

Each database is an XML file, comprising some or all of the following:

• A list of language acts, each of which contains a list of phrases.

• A list of synonyms

• A list of context variable declarations

There can be several LACT databases in the system. The user agent will almost certainly require its own. Some or all of the actor agents can share a single database. However, providing different databases for different agents is one way of tailoring them for different roles or personalities. Small role/personality differences would be better handled using context variables though.

Structure of a language act

A language act contains the information needed to either a) construct an appropriate utterance for a given class of speech act, or b) identify a speech act from some user input. In principle the same set of LACTs could be used for both tasks, since the syntax is common to each, but in practice the user will need his/her own database with a rather different style of LACT.

Syntax:

Some text

Some text

. . . etc . . .

Example:

You [+idiot]

You’re such a complete [+idiot]

You’re such a girlie!

Your hair makes you look like a boy

You kick like a fairy

Each LACT represents one type of speech act, and consists of a list of suitable phrases. When an agent wishes to utter an insult, its language engine locates the LACT whose name=”insult”, then scores the associated phrases to determine which one to utter. Further detail of how the language system functions can be found in Annex 1.
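Putting the syntax skeleton and the example above together, the insult LACT might be written out roughly as follows; the <languageact> and <phrase> element names are assumptions for illustration, while the phrases themselves and the name=”insult” attribute come from the example.

    <!-- element names assumed; phrases taken from the example above -->
    <languageact name="insult">
      <phrase>You [+idiot]</phrase>
      <phrase>You’re such a complete [+idiot]</phrase>
      <phrase>You’re such a girlie!</phrase>
      <phrase>Your hair makes you look like a boy</phrase>
      <phrase>You kick like a fairy</phrase>
    </languageact>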


Annex 1 – Detailed functioning of the language system

In this Annex we give more detail of the way in which the language system handles utterances and the XML formats developed.

Structure of a phrase

Phrases are bounded by an opening and a closing phrase tag. The opening tag may optionally have one or more attributes, which describe the contexts in which that phrase is more (or less) appropriate. The body of the phrase is a string of text, which may contain tokens delimited by square brackets. See below for more on attributes and tokens.

Phrase tokens

These are expanded to create the final utterance. Most of them obtain their information from context variables. For example, if the last speaker (Eric) said “I think Bill is a moron”, the next speaker could reply using the phrase “Don’t you call [HIM] a [INSULT], [YOU]!”, which will be rendered as “Don’t you call Bill a moron, Eric!”.

Currently available tokens (preliminary):

[ME] = the speaker’s name. If the agent represents the user, this will be the user’s name, as established at the start of a session. Example: Hello, my name is [ME]

[YOU] = the name of the recipient of the speech act, and hence the person being spoken to.

[HIM] = the person being spoken about (any recognised name that isn’t the speaker or the recipient)

[IT] = the specific object (pencil case, etc.) currently being spoken about

[INSULT] etc. Any arbitrary lexical context variable. It will be replaced with the word(s) that are currently associated with it. E.g. “I don’t know why you think [BAND] are cool” would become “I don’t know why you think Girls Aloud are cool”, if this is the band that had been mentioned previously (assuming there is a synonym list in the dictionary which identifies bands and assigns them to the [BAND] context variable).

[+rootword]. Tokens like this use the synonym list in reverse, to expand a single generic phrase into a varied one. E.g. if there is a synonym list such as idiot, moron, prat, jerk in the dictionary, the phrase “You [+idiot]” will be expanded using one of the listed synonyms (including the root), picked at random. Note: the word inside the token MUST be a root word.

[BEGINS], [ENDS], [CONTAINS]. These are only useful for identifying user input and MUST be at the start of the phrase. For example the phrase “[CONTAINS] yes” would match any input containing the word “yes” or one of its synonyms. Note: this example is very generic and thus should go near the end of the LACT database, otherwise it might mask other, more specific phrases such as “yes but I didn’t mean that”.[2]

Phrase attributes

A phrase optionally contains attributes in its tag, which qualify it in some way. These attributes are taken account of when scoring phrases to generate an utterance. [3]

MeGender=”m” or “f” or “n” – The phrase should only be uttered if the speaker is male or female (or neuter!). Example: Stop treating me like a girl!

YouGender= - Only use this phrase if the person being spoken to is of the specified sex

HimGender= - Ditto the person being spoken about

ItGender= - Ditto for the [IT] object being spoken about. Supports gender agreement in non-English languages. I’ve just put this in for completeness.[4]

SemanticAttribute=value - Any other attributes refer to arbitrary semantic context variables. For example, if there is a context variable called [topic], an insult phrase such as “You kick like a girlie” might be qualified with topic=”football” and thus will be more likely than other phrases to be uttered when the conversation is about football.

Attributes are used to weight phrases, not filter them. So if none of the phrases in a language act are suitable by virtue of them all having unfulfilled attributes, one of them will still get chosen. Unfulfilled attributes add a strong negative weighting to the score. Recency of use adds a lower (and diminishing) negative weighting to prevent the same phrase being uttered twice in succession. Some Gaussian noise is then added and the highest scorer chosen. The final utterance is therefore most likely to be the one with the highest score (or one of the phrases with equal highest score) but slightly lower-scoring phrases will get chosen sometimes.[5]

Synonym lists

Each synonym list consists of an optional set of attributes and a list of words or phrases separated by commas, in which the first word or phrase is the root (canonical) form.[6]

The words or phrases after the root are either synonyms for the root form or common misspellings of it.

Example:

dislike, hate, hayt, loathe, lothe

There are three kinds of attribute permissible in synonym list tags, which I’m afraid differ in a slightly confusing way. The gist of the idea in all cases, though, is to use these synonym list attributes as a way to establish context information (both lexical and semantic) when constructing or analysing an utterance or user input.

Form 1: generalised version of type=

Type=“varname”

This is a generalised way of tracking lexical context. It’s often important that we know which particular form of words the user or another agent used, not just the type of word, so that later phrases can echo it back again.

If a synonym contains this attribute, the precise word or phrase that was used in the utterance/input is stored in the named context variable, so that it can be used in later utterances.

Example: Suppose we have the synonym list

idiot, moron, prat, jerk

and the language acts

Hey [YOU], you’re such a [+idiot]

and

Don’t you call me a [INSULT], [YOU].

The bully (Fred) then insults the victim (Bill) by issuing an ‘InsultThem’ LACT. The [+idiot] token picks one of the synonyms for “idiot” at random. The phrase might thus be expanded into “Hey Bill, you’re such a jerk”.

Because the “idiot” synonym list has the attribute type=”insult”, the word “jerk” will get stored in the [INSULT] context variable.

When Bill then replies with a ‘RejectInsult’ LACT, this will automatically be expanded to “Don’t you call me a jerk, Fred.”

Equally, the user might have insulted Bill, and Bill’s reply will still echo back the correct insult word.

Note that “insult” is an arbitrary lexical context variable name. It could be any context variable in which we need to store the exact phrasing of something that was said, for later re-use.

Note also that the language engine (will eventually) convert “a” to “an” or vice versa, so we get “an idiot” but “a moron”.

Form 2: specialised versions of type=

Type=”person”, “object”, or other keywords yet to be decided[7]

These specify that the words in a synonym list represent a person, object or other specific type. If a word in this synonym list is found in user input or agent utterances, then the root word will be stored in a specific context variable, whose name is determined by the code.

Currently this is only used to fill in the [HIM] context variable. If someone (either agent or user) mentions an agent’s name (as revealed by a synonym list such as Bill, billy, bil, William, willy) and the person mentioned is neither the speaker nor the listener, then this person must be being spoken about, so [HIM] gets set to “Bill” (the root form, not the synonym or misspelling). Later speakers can then use phrases like “Don’t you call [HIM] a [INSULT] – he’s my friend”.

Form 3: variable=value

This is the most generalised method, and exists to support semantic context. A common use of this form of attribute would be using keywords to define the current topic of conversation.

Suppose we have the synonym list

Beckham, beckam, david beckham (a synonym list carrying the attributes type=”person” and topic=”football”)

As before, the type=person attribute tells the system that this is a person’s name. It doesn’t matter that Beckham isn’t an agent – we might still want to refer to him in conversation using the [HIM] token in phrases.

The topic=football attribute sets the [TOPIC] semantic context to “football”, which informs the system in general terms that the conversation has shifted to football. We only know this fact because of the attribute – the user may have typed “beckam”, which will have been standardised to “Beckham”, but this in itself doesn’t tell us we are talking about football.

Unlike the lexical context variables, this doesn’t have any particular relevance to phrase tokens, although we might plausibly want an agent to utter something like “I don’t really know much about [TOPIC]”. A more important use is in scoring phrases when choosing an utterance. For example, in the LACT

You’re a [+idiot]

You kick like a pansy (qualified with the attribute topic=”football”)

The latter phrase is more likely to be picked than the former if the current topic of conversation is football.

Context variable declarations

The following variables are hard-coded into the system:

Local: [ME], [YOU], [IT], [ROLE], [SEX]

Global: [HIM]

Any other variables should be declared in the LACT XML as follows:

unknown

where the name is the name of the context variable, the type specifies whether the contents of the variable apply equally to all agents or are agent-specific, and the text within the tags sets a default value for the variable.
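For example, a topic variable shared by all agents, starting out with the default value “unknown”, might be declared along the following lines; the element name and the exact attribute values are assumptions based on the description above.

    <!-- element name and attribute values assumed for illustration -->
    <context name="topic" type="global">unknown</context>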

Putting it all together

Here is a valid (if pretty meaningless) sample LACT database, to show everything in one place:

Hello

Hi [YOU]

Hey [YOU] What’s up?

You're a bounder, sir!

You are a cad and a rotter!

Watch out or I’ll [+hit] you!

[CONTAINS] fight back

[CONTAINS] hit him

Stand up for yourself

Don’t let him get away with it

bill, bil, billy, will, billie, william

hit, thump, bash, clout

hello, hi, helo

idiot, moron, prat, jerk

unknown

idiot

normal
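Grouped into its likely structure, the sample above might read roughly as follows. The element names, the three language act names and the pairing of the three default values with particular context variables are all assumptions made for illustration; the phrases, synonym lists and default values themselves are those listed above.

    <!-- element names, language act names and variable pairings assumed for illustration -->
    <languageact name="Greeting">
      <phrase>Hello</phrase>
      <phrase>Hi [YOU]</phrase>
      <phrase>Hey [YOU] What’s up?</phrase>
    </languageact>
    <languageact name="Threaten">
      <phrase>You're a bounder, sir!</phrase>
      <phrase>You are a cad and a rotter!</phrase>
      <phrase>Watch out or I’ll [+hit] you!</phrase>
    </languageact>
    <languageact name="AdviseFightBack">
      <phrase>[CONTAINS] fight back</phrase>
      <phrase>[CONTAINS] hit him</phrase>
      <phrase>Stand up for yourself</phrase>
      <phrase>Don’t let him get away with it</phrase>
    </languageact>
    <synonym type="person">bill, bil, billy, will, billie, william</synonym>
    <synonym>hit, thump, bash, clout</synonym>
    <synonym>hello, hi, helo</synonym>
    <synonym type="insult">idiot, moron, prat, jerk</synonym>
    <context name="topic" type="global">unknown</context>
    <context name="insult" type="global">idiot</context>
    <context name="mood" type="global">normal</context>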

Speech Acts

The XML above describes the structure of a language act database. The agents deal in speech acts instead. These are much smaller snippets of XML passed to the language engine methods as a String parameter. The final structure can be changed if necessary to suit the agent code, but here’s the syntax I’ve been using as an assumption:

agent_name

agent_name

language_act_name

the text to be uttered

value
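Written out with illustrative tag names (the wrapper and field names are assumptions, chosen to be consistent with the descriptions that follow), the skeleton might look like this:

    <!-- wrapper and element names assumed for illustration -->
    <speechact>
      <speaker>agent_name</speaker>
      <listener>agent_name</listener>
      <type>language_act_name</type>
      <utterance>the text to be uttered</utterance>
      <context name="varname">value</context>
    </speechact>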

Speech acts are free to include other tags as required (e.g. an ID number). These aren’t needed by the language system and so will be ignored.

Utterance speech acts

In a speech act to be uttered, the elements identifying the speaker, the listener and the class of language act are required, and context-variable elements are optional. Any context-variable elements are used to set the named context variables before processing the speech act. The agents might wish to use these to specify that a new [TOPIC] has been started, for example. However, most context information can be established without the agents’ involvement.

Note: If we need to take account of the agents’ emotional state when selecting utterances, we can do this by supplying emotion data in additional elements (either regularly or when something significant has changed). These will get translated into context variables and so can be used to weight phrases using attributes.

After processing, the speech act will have had an utterance element added to it.

User input speech acts

In a speech act representing user input, the only compulsory element is the utterance. The SACT may optionally contain some or all of the other tags.

If any context-variable elements are included by the agent, these must contain default values, representing the answer the agent would like to receive if, for some reason, the language engine can’t supply them. They must not be empty.

These elements are interpreted by the language system as requests for information. They will be filled in from the context variables before returning the SACT to the agent. The agent may use these in any way it pleases.

The language engine will also add a speaker element if not already present (although not a listener element)[8], and will add an element reflecting its judgement about which class of language act best fitted the user’s input. If no suitable language act could be found, the engine will return unknown, which the agent should interpret by asking the user to explain better.

Heard speech acts

The other circumstance in which a speech act is passed to the language engine is when this agent is the recipient of a speech act. The act should be passed as it was received, complete with its utterance element. It will be returned unaltered.[9]

6 The “View” System constructed in FearNot!

7 The Agents in FearNot!

The agents constructed in FearNot! follow the architecture proposed in Deliverable 5.3. However, some changes were made in order to adapt it to the view system and to the new language system. In this section we describe the final architecture of the agents, and provide a description of the behaviour generated by the agents.

8 Evaluation of the Emergent Narrative in FearNot!

The emergent narrative version of FearNot! was not subjected to the large evaluation carried out in the three countries and reported in Deliverable 6.3. Instead, we conducted a small evaluation in a Portuguese school, with the aim of assessing whether the children liked the system and felt empathy for the characters, and also whether the children felt more in control of the whole narrative and of the outcome of the story.

9 Conclusions

When we started the construction of FearNot! we were aware of the difficulty of the task at hand. However, some problems that we had not expected arose (difficulty in controlling the graphics elements, the need for a middle layer between the agents’ actions and a game engine, and difficulty in language generation and interpretation, among others), showing us that building an application with “autonomous” characters that perform as actors in complex situations is still a very difficult and challenging endeavour. Perhaps we did not achieve all the results we expected, as far as building a reliable, near-commercial application with autonomous synthetic characters and emergent narrative is concerned, but the results achieved can nevertheless be considered very good.

The achievements were:

- A reliable agent architecture that generates believable behaviours that users can interpret and follow as a story;

- A middle software layer that allows the agents to generate actions and allows those actions to be displayed graphically, in a cinematographic way understandable by the children;

- An application that, although less cinematographic than the scripted version, can still convey the story and the episodes to the children;

- A language system, adaptable to more than one language (versions were built for English and Portuguese), that can be used by the agents to interact with the users and, at the same time, to interact with each other.

-----------------------

[1] These variables aren’t quite lexical, because they don’t keep track of the exact word used by another agent, only the root word. The user, say, might refer to agent Bill as “William” or “bil”, but any affected context variables (such as [HIM]) will contain the root form, “Bill”. This is partly so that agents won’t echo back users’ mis-spellings.

[2] I’ll probably think up some more tokens in the coming weeks!

[3] Eventually I may use them to help when identifying user input too.

[4] I think it might be helpful to add things like MeRole and YouRole to identify phrases well suited to e.g. the victim talking to the bully. If so, there may be a more general way of specifying MeXXX, YouXXX and the above syntax may change.

[5] We’ll need to tweak the weightings and noise to get the balance right.

[6] There is currently only one synonym dictionary in the system – synonym lists are added to it wherever they are found, even if they are spread across several databases. This means that all agents share the same dictionary (which is sensible) but so does the user (which may not be – for example only the user’s synonym lists should contain misspellings). If this turns out to be an issue I can probably modify it so that each database has a separate dictionary.

[7] Eventually I’ll use this for other things, such as setting the [IT] variable when type=object. If any other possibilities occur to you, just let me know.

[8] The user agent doesn’t automatically know who the user-input speech act is being directed to, so can’t add a listener element. In practice this will always be the victim, so I can, if this is helpful, scan all the language engines to find one containing [ROLE]=”victim” and use its [ME] context variable to set up a listener element. I haven’t done this yet.

[9] So far I haven’t found any need for agents to “overhear” what other agents are saying to each other. However, if this turns out to be useful, the same Hear() method can be used (the language engine will know that the act was merely overheard because the name won’t match that of the agent whose Hear() method was called).
