Lexicons for Sentiment, Affect, and Connotation - Stanford University

Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright © 2018. All rights reserved. Draft of September 23, 2018.

CHAPTER 19: Lexicons for Sentiment, Affect, and Connotation

"[W]e write, not with the fingers, but with the whole person. The nerve which controls the pen winds itself about every fibre of our being, threads the heart, pierces the liver."

Virginia Woolf, Orlando

"She runs the gamut of emotions from A to B."

Dorothy Parker, reviewing Hepburn's performance in Little Women


In this chapter we turn to tools for interpreting affective meaning, extending our study of sentiment analysis in Chapter 4. Following the tradition in affective computing (Picard, 1995), we use the word 'affective' to mean emotion, sentiment, personality, mood, and attitudes. Affective meaning is closely related to subjectivity, the study of a speaker or writer's evaluations, opinions, emotions, and speculations (Wiebe et al., 1999).

How should affective meaning be defined? One influential typology of affective states comes from Scherer (2000), who defines each class of affective states by factors like its cognitive realization and time course:

Emotion: Relatively brief episode of response to the evaluation of an external or internal event as being of major significance. (angry, sad, joyful, fearful, ashamed, proud, elated, desperate)

Mood: Diffuse affect state, most pronounced as change in subjective feeling, of low intensity but relatively long duration, often without apparent cause. (cheerful, gloomy, irritable, listless, depressed, buoyant)

Interpersonal stance: Affective stance taken toward another person in a specific interaction, colouring the interpersonal exchange in that situation. (distant, cold, warm, supportive, contemptuous, friendly)

Attitude: Relatively enduring, affectively colored beliefs, preferences, and predispositions towards objects or persons. (liking, loving, hating, valuing, desiring)

Personality traits: Emotionally laden, stable personality dispositions and behavior tendencies, typical for a person. (nervous, anxious, reckless, morose, hostile, jealous)

Figure 19.1 The Scherer typology of affective states (Scherer, 2000).

We can design extractors for each of these kinds of affective states. Chapter 4 already introduced sentiment analysis, the task of extracting the positive or negative orientation that a writer expresses in a text. In Scherer's typology this corresponds to the extraction of attitudes: figuring out what people like or dislike from affect-rich texts like consumer reviews of books or movies, newspaper editorials, or public sentiment in blogs or tweets.

Detecting emotions and moods is useful for determining whether a student is confused, engaged, or certain when interacting with a tutorial system, whether a caller to a help line is frustrated, or whether someone's blog posts or tweets indicate depression. Detecting emotions like fear in novels, for example, could help us trace which groups or situations are feared and how that changes over time.

Detecting different interpersonal stances can be useful when extracting information from human-human conversations. The goal here is to detect stances like friendliness or awkwardness in interviews or friendly conversations, or even to detect flirtation in dating. For the task of automatically summarizing meetings, we'd like to automatically understand the social relations between people: who is friendly or antagonistic to whom. A related task is finding parts of a conversation where people are especially excited or engaged. These conversational hot spots can help a summarizer focus on the correct region.

Detecting the personality of a user (such as whether the user is an extrovert, or the extent to which they are open to experience) can help improve conversational agents, which seem to work better if they match users' personality expectations (Mairesse and Walker, 2008).

Affect is important for generation as well as recognition; synthesizing affect is important for conversational agents in various domains, including literacy tutors such as children's storybooks, or computer games.

In Chapter 4 we introduced the use of Naive Bayes classification to classify a document's sentiment. Various classifiers have been successfully applied to many of these tasks, using all the words in the training set as input to a classifier which then determines the affect status of the text.

In this chapter we focus on an alternative model, in which instead of using every word as a feature, we focus only on certain words, ones that carry particularly strong cues to affect or sentiment. We call these lists of words affective lexicons or sentiment lexicons. These lexicons presuppose a fact about semantics: that words have affective meanings or connotations. The word connotation has different meanings in different fields, but here we use it to mean the aspects of a word's meaning that are related to a writer or reader's emotions, sentiment, opinions, or evaluations. In addition to their ability to help determine the affective status of a text, connotation lexicons can be useful features for other kinds of affective tasks, and for computational social science analysis.

In the next sections we introduce basic theories of emotion, show how sentiment lexicons can be viewed as a special case of emotion lexicons, and then summarize some publicly available lexicons. We then introduce three ways for building new lexicons: human labeling, semi-supervised, and supervised.

Finally, we turn to some other kinds of affective meaning, including interpersonal stance, personality, and connotation frames.

19.1 Defining Emotion

One of the most important affective classes is emotion, which Scherer (2000) defines as a "relatively brief episode of response to the evaluation of an external or internal event as being of major significance".

Detecting emotion has the potential to improve a number of language processing tasks. Automatically detecting emotions in reviews or customer responses (anger, dissatisfaction, trust) could help businesses recognize specific problem areas or ones that are going well. Emotion recognition could help dialog systems like tutoring systems detect that a student was unhappy, bored, hesitant, confident, and so on. Emotion can play a role in medical informatics tasks like detecting depression or suicidal intent. Detecting emotions expressed toward characters in novels might play a role in understanding how different social groups were viewed by society at different times.

There are two widely held families of theories of emotion. In the first family, emotions are viewed as fixed atomic units. They are limited in number, other emotions are generated from them, and they are often called basic emotions (Tomkins 1962, Plutchik 1962). Perhaps the best known theory in this family is the set of 6 emotions proposed by Ekman (1999) as likely to be universally present in all cultures: surprise, happiness, anger, fear, disgust, and sadness. Another atomic theory is the wheel of emotion of Plutchik (1980), shown in Fig. 19.2. It consists of 8 basic emotions in four opposing pairs (joy/sadness, anger/fear, trust/disgust, and anticipation/surprise), together with the emotions derived from them.

Figure 19.2 Plutchik wheel of emotion.
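Plutchik's wheel pairs each basic emotion with an opposite. The following minimal Python sketch represents that structure as a symmetric lookup. It covers only the four pairs named above and omits the derived emotions on the wheel. The names `PLUTCHIK_PAIRS`, `OPPOSITE`, and `opposite` are illustrative and do not come from any library.

```python
# Plutchik's (1980) eight basic emotions as four opposing pairs (derived emotions omitted).
PLUTCHIK_PAIRS = [
    ("joy", "sadness"),
    ("anger", "fear"),
    ("trust", "disgust"),
    ("anticipation", "surprise"),
]

# Symmetric lookup: each basic emotion maps to its opposite on the wheel.
OPPOSITE = {}
for first, second in PLUTCHIK_PAIRS:
    OPPOSITE[first] = second
    OPPOSITE[second] = first

BASIC_EMOTIONS = sorted(OPPOSITE)


def opposite(emotion: str) -> str:
    """Return the basic emotion paired against `emotion` (raises KeyError if unknown)."""
    return OPPOSITE[emotion.lower()]


if __name__ == "__main__":
    print(opposite("trust"))    # disgust
    print(len(BASIC_EMOTIONS))  # 8
```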

The second class of emotion theories views emotion as a space in 2 or 3 dimensions (Russell, 1980). Most models include the two dimensions valence and arousal, and many add a third, dominance. These can be defined as:

valence: the pleasantness of the stimulus
arousal: the intensity of emotion provoked by the stimulus
dominance: the degree of control exerted by the stimulus

In the next sections we'll see lexicons for both kinds of theories of emotion.


Sentiment can be viewed as a special case of this second view of emotions as points in space. In particular, the valence dimension, measuring how pleasant or unpleasant a word is, is often used directly as a measure of sentiment.
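Valence is often used directly as a sentiment score. One simple, hypothetical way to label a text is therefore to average the valence ratings of its words and compare the mean with the scale's neutral point. The sketch below uses only the five valence ratings listed in Fig. 19.4 (Warriner et al., 2013). It assumes a 1 to 9 rating scale with a neutral midpoint of 5, and all function names are illustrative.

```python
# Valence ratings for the five valence examples in Fig. 19.4 (Warriner et al., 2013).
# A real system would load the full lexicon of roughly 14,000 words.
VALENCE = {
    "vacation": 8.53,
    "happy": 8.47,
    "whistle": 5.7,
    "conscious": 5.53,
    "torture": 1.4,
}

# Assumption: ratings lie on a 1-9 scale whose neutral midpoint is 5.
NEUTRAL = 5.0


def mean_valence(tokens):
    """Average valence of the tokens found in VALENCE, or None if none are found."""
    scores = [VALENCE[t] for t in tokens if t in VALENCE]
    return sum(scores) / len(scores) if scores else None


def valence_sentiment(text):
    """Label a text by comparing its mean word valence with the neutral midpoint."""
    score = mean_valence(text.lower().split())
    if score is None:
        return "unknown"  # no lexicon words in the text
    if score > NEUTRAL:
        return "positive"
    if score < NEUTRAL:
        return "negative"
    return "neutral"


print(valence_sentiment("a happy vacation"))  # positive (mean 8.5)
print(valence_sentiment("whistle torture"))   # negative (mean about 3.55)
```

Averaging means that a single strongly negative word can outweigh a mildly positive one. A real system might also weight words or handle negation.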

19.2 Available Sentiment and Affect Lexicons


A wide variety of affect lexicons have been created and released. The most basic lexicons label words along one dimension of semantic variability, generally called "sentiment" or "valence".

In the simplest lexicons this dimension is represented in a binary fashion, with one wordlist for positive words and one for negative words. The oldest is the General Inquirer (Stone et al., 1966), which drew on early work in the cognitive psychology of word meaning (Osgood et al., 1957) and on work in content analysis. The General Inquirer has a lexicon of 1915 positive words and one of 2291 negative words (and also includes other lexicons discussed below).

The MPQA Subjectivity lexicon (Wilson et al., 2005) has 2718 positive and 4912 negative words. They are drawn from prior lexicons plus a bootstrapped list of subjective words and phrases (Riloff and Wiebe, 2003). Each entry in the lexicon is hand-labeled for sentiment and also labeled for reliability (strongly subjective or weakly subjective).

The polarity lexicon of Hu and Liu (2004) gives 2006 positive and 4783 negative words, drawn from product reviews, labeled using a bootstrapping method from WordNet.

Positive: admire, amazing, assure, celebration, charm, eager, enthusiastic, excellent, fancy, fantastic, frolic, graceful, happy, joy, luck, majesty, mercy, nice, patience, perfect, proud, rejoice, relief, respect, satisfactorily, sensational, super, terrific, thank, vivid, wise, wonderful, zest

Negative: abominable, anger, anxious, bad, catastrophe, cheap, complaint, condescending, deceit, defective, disappointment, embarrass, fake, fear, filthy, fool, guilt, hate, idiot, inflict, lazy, miserable, mourn, nervous, objection, pest, plot, reject, scream, silly, terrible, unfriendly, vile, wicked

Figure 19.3 Some samples of words with consistent sentiment across three sentiment lexicons: the General Inquirer (Stone et al., 1966), the MPQA Subjectivity lexicon (Wilson et al., 2005), and the polarity lexicon of Hu and Liu (2004).
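A binary lexicon like those in Fig. 19.3 can be used directly as a classifier by counting lexicon hits. Here is a minimal Python sketch, assuming a tiny hand-picked subset of the Fig. 19.3 words in place of a full lexicon. It labels a text by the sign of (positive hits minus negative hits). The tokenizer and function names are illustrative assumptions, not part of any of the lexicons' distributions.

```python
import re

# A few words per class from Fig. 19.3. A real system would load a full list,
# such as the General Inquirer, MPQA, or Hu and Liu (2004) lexicon.
POSITIVE = {"admire", "amazing", "excellent", "happy", "nice", "perfect", "wonderful"}
NEGATIVE = {"bad", "defective", "disappointment", "hate", "miserable", "terrible", "vile"}


def tokenize(text):
    """Lowercase the text and return its runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_score(text):
    """Number of positive lexicon words minus number of negative lexicon words."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg


def lexicon_label(text):
    """Map the sign of the score to a label. Ties, including no hits, are neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_label("An amazing, wonderful film"))                          # positive
print(lexicon_score("Terrible plot and a defective disc, but nice music"))  # -1
```

Plain counting like this ignores negation, so "not happy" still counts as a positive hit. That limitation is one reason lexicon counts are often used as features for a trained classifier rather than as the final decision.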


Slightly more general than these sentiment lexicons are lexicons that assign each word a value on all three emotional dimensions. The lexicon of Warriner et al. (2013) assigns valence, arousal, and dominance scores to 14,000 words. Some examples are shown in Fig. 19.4.

The NRC Word-Emotion Association Lexicon, also called EmoLex (Mohammad and Turney, 2013), uses the 8 basic emotions of Plutchik (1980) defined above. The lexicon includes around 14,000 words, drawn from prior lexicons as well as frequent nouns, verbs, adverbs, and adjectives. Values from the lexicon for some sample words:


Valence             Arousal             Dominance
vacation   8.53     rampage    7.56     self         7.74
happy      8.47     tornado    7.45     incredible   7.74
whistle    5.7      zucchini   4.18     skillet      5.33
conscious  5.53     dressy     4.15     concur       5.29
torture    1.4      dull       1.67     earthquake   2.14

Figure 19.4 Samples of the values of selected words on the three emotional dimensions from Warriner et al. (2013).

Word        anger anticipation disgust fear joy sadness surprise trust positive negative
reward      0 1 0 0 1 0 1 1 1 0
worry       0 1 0 1 0 1 0 0 0 1
tenderness  0 0 0 0 1 0 0 0 1 0
sweetheart  0 1 0 0 1 1 0 1 1 0
suddenly    0 0 0 0 0 0 1 0 0 0
thirst      0 1 0 0 0 1 1 0 0 0
garbage     0 0 1 0 0 0 0 0 0 1
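Each EmoLex row is a binary vector over the eight Plutchik emotions plus positive and negative sentiment. A text's emotion profile can therefore be sketched by summing those vectors over its words. The sketch below assumes only the seven sample rows shown above. The function `emotion_profile` is a hypothetical helper, not part of the official EmoLex release.

```python
from collections import Counter

# Column order of the EmoLex sample rows (Mohammad and Turney, 2013).
AFFECTS = ["anger", "anticipation", "disgust", "fear", "joy",
           "sadness", "surprise", "trust", "positive", "negative"]

# The seven sample rows above: one 0/1 association flag per column.
EMOLEX_SAMPLE = {
    "reward":     [0, 1, 0, 0, 1, 0, 1, 1, 1, 0],
    "worry":      [0, 1, 0, 1, 0, 1, 0, 0, 0, 1],
    "tenderness": [0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
    "sweetheart": [0, 1, 0, 0, 1, 1, 0, 1, 1, 0],
    "suddenly":   [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "thirst":     [0, 1, 0, 0, 0, 1, 1, 0, 0, 0],
    "garbage":    [0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
}


def emotion_profile(tokens):
    """For each affect column, count how many tokens are associated with it."""
    counts = Counter()
    for tok in tokens:
        flags = EMOLEX_SAMPLE.get(tok.lower())
        if flags is None:
            continue  # word not in the (tiny) sample lexicon
        for affect, flag in zip(AFFECTS, flags):
            if flag:
                counts[affect] += 1
    return counts


profile = emotion_profile("I worry about the garbage".split())
print(profile["negative"], profile["fear"], profile["joy"])  # 2 1 0
```

Because `Counter` returns 0 for missing keys, affects with no associated words can be queried without special handling.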


There are various other hand-built affective lexicons. The General Inquirer includes additional lexicons for dimensions like strong vs. weak, active vs. passive, overstated vs. understated, as well as lexicons for categories like pleasure, pain, virtue, vice, motivation, and cognitive orientation.

Another useful feature for various tasks is the distinction between concrete words like banana or bathrobe and abstract words like belief and although. The lexicon of Brysbaert et al. (2014) used crowdsourcing to assign each of 40,000 words a concreteness rating from 1 to 5. For example, it gives banana, bathrobe, and bagel a 5, belief 1.19, although 1.07, and in-between words like brisk 2.5.
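A rating lexicon like this one can be turned into a binary concrete/abstract feature, or used to rank words. The Python sketch below uses only the six ratings quoted above. It assumes, purely for illustration, that the scale midpoint of 3 separates concrete from abstract words. Brysbaert et al. (2014) do not prescribe that threshold, and the helper names are hypothetical.

```python
# The concreteness ratings quoted above (Brysbaert et al., 2014): 1 = abstract, 5 = concrete.
CONCRETENESS = {
    "banana": 5.0, "bathrobe": 5.0, "bagel": 5.0,
    "brisk": 2.5, "belief": 1.19, "although": 1.07,
}

# Assumption (illustrative only): the scale midpoint splits concrete from abstract.
MIDPOINT = 3.0


def concreteness_class(word):
    """Return 'concrete' or 'abstract' for a rated word, or None if it is unrated."""
    rating = CONCRETENESS.get(word.lower())
    if rating is None:
        return None
    return "concrete" if rating >= MIDPOINT else "abstract"


def rank_by_concreteness(words):
    """Return the rated words, sorted from most to least concrete."""
    rated = [w for w in words if w.lower() in CONCRETENESS]
    return sorted(rated, key=lambda w: CONCRETENESS[w.lower()], reverse=True)


print(concreteness_class("bagel"))                           # concrete
print(rank_by_concreteness(["although", "belief", "brisk"]))  # ['brisk', 'belief', 'although']
```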

LIWC (Linguistic Inquiry and Word Count) is another set of 73 lexicons containing over 2300 words (Pennebaker et al., 2007), designed to capture aspects of lexical meaning relevant for social psychological tasks. In addition to sentiment-related lexicons like ones for negative emotion (bad, weird, hate, problem, tough) and positive emotion (love, nice, sweet), LIWC includes lexicons for categories like anger, sadness, cognitive mechanisms, perception, tentative, and inhibition, shown in Fig. 19.5.
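Category lexicons of this kind are typically applied by reporting, for each category, what share of a text's words fall into it. The Python sketch below computes those percentages. It is limited to the negative- and positive-emotion example words quoted above, and the category keys and function name are illustrative. The real LIWC dictionaries are far larger, and this code is not the LIWC software.

```python
import re

# Toy categories built only from the example words quoted above.
CATEGORIES = {
    "negative_emotion": {"bad", "weird", "hate", "problem", "tough"},
    "positive_emotion": {"love", "nice", "sweet"},
}


def category_percentages(text):
    """Percentage of all tokens in `text` that belong to each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100.0 * sum(t in words for t in tokens) / len(tokens)
        for name, words in CATEGORIES.items()
    }


print(category_percentages("I love this sweet, nice place but the parking is a problem"))
# 12 tokens: positive_emotion 25.0, negative_emotion about 8.33
```

Normalizing by text length makes the scores comparable across documents of different sizes. Raw counts would favor longer texts.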

19.3 Creating affect lexicons by human labeling


The earliest method used to build affect lexicons, and still in common use, is to have humans label each word. This is now most commonly done via crowdsourcing: breaking the task into small pieces and distributing them to a large number of annotators. Let's take a look at some of the methodological choices for two crowdsourced emotion lexicons.

The NRC Word-Emotion Association Lexicon (EmoLex) (Mohammad and Turney, 2013) labeled emotions in two steps. In order to ensure that the annotators were judging the correct sense of the word, they first answered a multiple-choice
