Narrative Style and the Frequencies of Very Common Words: A Corpus-Based Approach to Dickens's First Person and Third Person Narratives*

Tomoji Tabata

Abstract

The present article is devoted to statistical analysis of the language of Dickens's novels. The particular problem is to examine structural and stylistic features of the first person and third person narratives. In the following analysis, I apply Principal Component Analysis (PCA) to the examination of the frequency-patterns of very common word-types of the text-samples. What emerges from this approach is a remarkable contrast between the two narrative modes. The differentiae between Dickens's first person and third person narratives suggest a broad opposition between a more oral, subjective, verbal style and a more literate, descriptive, nominal style.

0. Introduction

The choice of a particular point of view is arguably one of the most crucial decisions in beginning a fictional discourse. Decisions may include whether to employ a first person narrator or a third person narrator, whether to narrate in the present tense or the past tense, and so forth. While a first person narrative and a third person narrative differ obviously from each other in terms of the presence, or the total absence, of the first person pronouns, I, me, and my, the two narrative modes are likely to differ in less obvious ways. They share, no doubt, many characteristics of the language of narrative distinct from other genres of English prose.

This study examines, from a quantitative viewpoint, linguistic and stylistic attributes of the two narrative modes to demonstrate how Dickens differentiates one mode of narrative from another. The approach I adopt in this study has three characteristics. First of all, it is corpus-based: the word frequency profiles that will appear a little later and other frequency counts are derived from a computerised text-corpus. Second, it focuses on very common word-types, most of which are function words, rather than rare words or so-called "key-words" that are usually focused on in studies of literary texts. Third, it uses multivariate statistics to illustrate relationships among very common word-types, relationships among text-samples, and relationships between the very common word-types and the text-samples. This method has produced justifiable results in studies of disputed authorship (Burrows: 1989, Craig: 1992); literary idiolects (Burrows: 1987a, Tabata: 1991); stylistic changes that occur over an author's career (Tabata 1993 & 1994 forthcoming); and in other areas as well.

English Corpus Studies, No. 2, 1995, pp. 91-109.

Table 1. The Set of Eleven Narratives

Label        Narrator [TEXT] & Date                 Word-tokens [Pure-Narrative]   Segments

First Person Narratives
David#1-5    David [David Copperfield] (1849-50)    20145                          5
Esther#1-4   Esther [Bleak House] (1852-3)          18399                          4
Pip#1-4      Pip [Great Expectations] (1860-1)      18359                          4
Group Total                                         56903                          13

Third Person Narratives
SB#1-3       Sketches by Boz (1836)                 12569                          3
PP#1-3       The Pickwick Papers (1836-7)           11081                          3
OT#1-4       Oliver Twist (1837-8)                  16677                          4
NN#1-3       Nicholas Nickleby (1838-9)             12863                          3
BH#1-2       Bleak House (1852-3)                   7389                           2
TTC#1-3      A Tale of Two Cities (1859)            12798                          3
OMF#1-3      Our Mutual Friend (1864-5)             13117                          3
ED#1-3       The Mystery of Edwin Drood (1870)      11973                          3
Group Total                                         98467                          24

1. Data

1.1 Corpus

The corpus draws on ten novels from Dickens's oeuvre (see Table 1).1 Each text is represented by approximately twenty thousand words from the beginning of the novel, and the language of "pure-narrative" is extracted as a basis of comparison.2 The current corpus consists of three first person narratives--David Copperfield, Esther's narrative, and Great Expectations--and eight third person narratives. Bleak House provides two narratives: one is the first person narrative by the character narrator Esther Summerson, the other is the anonymous third person narrative. The two narratives of Bleak House are also contrasted in the use of verb tenses. While Esther uses the past tense, the anonymous narrator employs the "dramatic present." The dramatic present is also used in The Mystery of Edwin Drood and in some parts of Sketches by Boz and David Copperfield.

Each text is then divided into successive 4000-word segments. Segmentation of text has two objectives. First, to give each variable (i.e., word) as appropriate a number of samples as possible in order to reduce the possibility of chance effect. Second, to help observe internal variation (or consistency) in each text. In all, the present study analyses 37 segments (or text-samples), of which 13 are first person narratives and 24 are third person narratives.3
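The segmentation step can be illustrated with a minimal Python sketch. The function name `split_into_segments` and the treatment of a short final remainder are assumptions made for illustration; the article does not specify how segment boundaries were drawn, and the toy token list is not Dickens's text.

```python
def split_into_segments(tokens, size=4000):
    """Split a token list into successive segments of `size` tokens.

    Assumption: a trailing remainder shorter than `size` is kept as its
    own segment; the article does not say how leftovers were handled.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# Toy stand-in for a text sample of twenty thousand words (not real data).
sample_tokens = ["word"] * 20000
segments = split_into_segments(sample_tokens)
segment_lengths = [len(segment) for segment in segments]
print(len(segments), segment_lengths)
```

Each segment then serves as one text-sample, so that every word-type is observed across many samples rather than once per novel.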

1.2 Some preliminary treatments of data

In the present case, the discrepancy between the first person and the third person narratives in the incidence of first person pronouns is too obvious to require a statistical analysis. It is desirable, therefore, to exclude those pronouns from the following statistical analysis so as to diminish the overshadowing effect of what is already evident. Otherwise the difference due to the incidence of first person pronouns will become so inflated through statistical treatments that other subtler differences may be submerged. This exclusion of first person pronouns deprives my data of some interesting subjects for computational stylistics, but in return it makes them sensitive to evidence of subtler stylistic differences.

Another problem is concerned with verb forms. My earlier studies have shown that the top 100 words include only a small number of verbs--mostly preterite forms of common verbs, such as was, had, and said.4 The size of the present corpus, in addition, is not large enough to process verbs of lower frequency. If words of low frequency are subjected to a statistical analysis, the dearth of numbers may cause an aberrant result. The recognised solution is lemmatisation. For example, take, takes, took, taken, and taking are lemmatised as take. Lemmatisation enables a number of verbs to rank higher than in my earlier studies and brings the number to a comparatively safe (though not all-sufficient) working-level.

Table 2. Eleven Narrators in Dickens's Novels: Standardised (text-percentage) frequencies for the 100 most common word-types in the "pure-narrative."

Rank Word-types     SB     PP     OT     NN  David Esther     BH    TTC    Pip    OMF     ED Total (raw) Total (%)
1    the         7.606  9.097  7.327  6.320  4.433  4.723  6.834  7.462  5.817  6.602  6.515        9935     6.394
2    and         3.914  4.088  3.598  4.019  3.927  4.310  3.424  4.329  4.210  3.690  3.792        6164     3.967
3    be          3.477  2.969  3.352  2.946  3.783  3.565  3.816  3.110  3.470  2.851  2.923        5163     3.323
4    of          4.225  3.592  3.190  3.281  2.636  2.462  3.424  3.469  2.511  3.255  3.566        4879     3.140
5    a           2.912  2.346  2.908  3.001  2.442  2.571  2.774  2.508  2.495  3.171  2.773        4194     2.699
6    in(p)       2.164  1.660  1.847  1.788  1.812  1.853  2.463  2.016  1.672  2.173  2.096        2983     1.920
7    his         1.090  2.265  2.027  1.998  0.521  1.005  0.947  2.188  1.117  2.295  2.038        2373     1.527
8    have        1.567  1.516  1.619  1.174  1.762  1.631  1.719  1.469  1.759  1.243  0.969        2358     1.518
9    to(i)       1.201  1.101  1.325  1.314  1.524  1.549  1.340  1.188  1.416  1.189  1.111        2055     1.323
10   he          1.034  1.354  1.961  1.454  0.789  1.223  1.177  1.453  1.073  1.479  1.178        1983     1.276
11   with        1.154  1.263  1.091  1.104  1.052  1.277  0.920  1.274  1.149  1.395  1.336        1841     1.185
12   to(p)       1.034  1.119  1.091  1.026  1.176  1.163  0.988  1.203  1.024  1.235  1.169        1736     1.117
13   say         0.151  1.724  1.133  1.508  1.142  1.614  0.650  0.445  0.980  0.991  0.793        1630     1.049
14   it          0.676  0.605  0.768  0.700  1.365  1.076  1.272  1.399  1.285  1.113  1.052        1624     1.045
15   as          0.692  0.957  0.851  1.011  1.082  0.848  0.826  0.938  1.008  1.037  1.128        1476     0.950
16   at          0.756  0.496  0.738  0.910  1.023  0.962  0.758  0.836  1.100  0.953  0.618        1337     0.861
17   that(c)     0.549  0.388  0.660  0.599  0.933  0.989  0.595  0.484  1.002  0.602  0.501        1098     0.707
18   on(p)       0.835  0.713  0.522  0.575  0.660  0.554  0.555  0.766  0.757  0.724  0.727        1040     0.669
19   by(p)       0.525  0.578  0.672  0.536  0.457  0.435  0.528  0.524  0.452  0.640  0.585         826     0.532
20   her(a)      0.271  0.108  0.168  0.288  0.988  0.598  0.839  0.656  0.376  0.793  0.685         821     0.528
21   which(r)    0.812  0.641  0.762  0.669  0.417  0.424  0.420  0.398  0.381  0.435  0.309         794     0.511
22   him         0.342  0.298  0.899  0.474  0.392  0.478  0.338  0.641  0.507  0.496  0.443         772     0.497
23   for(p)      0.732  0.415  0.570  0.459  0.491  0.484  0.568  0.328  0.479  0.343  0.543         762     0.490
24   but         0.422  0.289  0.414  0.342  0.660  0.582  0.555  0.445  0.523  0.450  0.317         729     0.469
25   she         0.127  0.009  0.126  0.155  0.963  0.902  0.568  0.445  0.616  0.267  0.292         700     0.451
26   not         0.501  0.262  0.336  0.327  0.551  0.554  0.447  0.391  0.523  0.381  0.267         664     0.427
27   from        0.485  0.478  0.402  0.443  0.308  0.326  0.406  0.484  0.376  0.267  0.326         595     0.383
28   when        0.159  0.343  0.342  0.350  0.536  0.462  0.392  0.344  0.485  0.252  0.309         585     0.377
29   this        0.294  0.307  0.546  0.498  0.382  0.217  0.298  0.313  0.338  0.442  0.451         579     0.373
30   all         0.326  0.171  0.300  0.420  0.367  0.451  0.352  0.336  0.479  0.328  0.334         561     0.361
31   an          0.493  0.433  0.348  0.233  0.308  0.342  0.392  0.336  0.289  0.450  0.443         560     0.360
32   they        0.509  0.325  0.444  0.389  0.268  0.217  0.284  0.539  0.207  0.175  0.409         518     0.333
33   look        0.127  0.244  0.216  0.334  0.432  0.321  0.298  0.453  0.468  0.358  0.292         516     0.332
34   or          0.398  0.153  0.372  0.365  0.357  0.255  0.379  0.305  0.283  0.381  0.342         505     0.325
35   out         0.080  0.199  0.222  0.194  0.506  0.364  0.487  0.391  0.414  0.282  0.342         503     0.324
36   there       0.294  0.217  0.288  0.319  0.387  0.375  0.365  0.328  0.327  0.198  0.234         480     0.309
37   into        0.382  0.208  0.408  0.365  0.268  0.217  0.392  0.352  0.283  0.160  0.393         474     0.305
38   one         0.446  0.235  0.282  0.350  0.323  0.239  0.217  0.367  0.234  0.236  0.309         457     0.294
38   who(r)      0.358  0.253  0.384  0.404  0.218  0.223  0.514  0.211  0.245  0.435  0.134         457     0.294
40   that(d)     0.239  0.280  0.222  0.334  0.472  0.239  0.203  0.273  0.272  0.274  0.259         447     0.288
41   very        0.223  0.244  0.462  0.365  0.338  0.413  0.271  0.211  0.185  0.175  0.150         445     0.286
42   if          0.207  0.208  0.186  0.272  0.377  0.288  0.352  0.211  0.468  0.198  0.284         443     0.285
43   little      0.199  0.190  0.222  0.404  0.338  0.413  0.298  0.266  0.169  0.229  0.384         442     0.284
44   up(adv)     0.151  0.280  0.288  0.327  0.333  0.228  0.257  0.289  0.376  0.282  0.200         435     0.280
45   go          0.080  0.099  0.096  0.210  0.442  0.408  0.217  0.219  0.370  0.320  0.217         408     0.263
46   so(a.d.)    0.151  0.126  0.174  0.187  0.283  0.554  0.230  0.227  0.278  0.175  0.242         394     0.254
47   do          0.095  0.162  0.198  0.233  0.501  0.261  0.176  0.242  0.289  0.168  0.184         383     0.247
48   upon(p)     0.191  0.190  0.228  0.334  0.268  0.212  0.284  0.336  0.240  0.206  0.234         382     0.246
49   take        0.183  0.208  0.246  0.264  0.253  0.174  0.135  0.250  0.338  0.252  0.284         375     0.241
50   their       0.549  0.316  0.216  0.381  0.104  0.082  0.338  0.273  0.180  0.191  0.242         372     0.239
51   make        0.088  0.171  0.228  0.194  0.357  0.250  0.284  0.219  0.332  0.183  0.192         368     0.237
52   no(a)       0.326  0.153  0.258  0.210  0.223  0.163  0.244  0.242  0.267  0.244  0.192         356     0.229
53   come        0.111  0.072  0.144  0.117  0.377  0.315  0.325  0.234  0.289  0.236  0.184         355     0.228
54   them        0.278  0.099  0.138  0.288  0.194  0.207  0.244  0.367  0.267  0.130  0.242         343     0.221
55   would       0.334  0.135  0.330  0.179  0.199  0.234  0.203  0.164  0.245  0.099  0.109         325     0.209
56   see         0.183  0.027  0.180  0.124  0.278  0.288  0.189  0.148  0.430  0.061  0.150         319     0.205
57   down        0.088  0.117  0.198  0.109  0.228  0.207  0.203  0.313  0.272  0.252  0.209         318     0.205
58   some        0.263  0.135  0.210  0.179  0.243  0.158  0.203  0.211  0.218  0.168  0.234         316     0.203
59   could       0.127  0.126  0.168  0.171  0.298  0.326  0.054  0.164  0.256  0.137  0.042         295     0.190
60   more        0.239  0.180  0.210  0.187  0.194  0.207  0.135  0.180  0.196  0.107  0.192         292     0.188
61   old         0.255  0.162  0.294  0.117  0.114  0.304  0.284  0.094  0.065  0.114  0.284         287     0.185
62   man         0.294  0.343  0.126  0.155  0.055  0.147  0.176  0.133  0.283  0.259  0.125         285     0.183
63   then        0.175  0.262  0.126  0.124  0.164  0.136  0.041  0.164  0.278  0.198  0.284         281     0.181
64   before      0.223  0.108  0.168  0.163  0.179  0.207  0.149  0.211  0.174  0.229  0.117         277     0.178
65   her(pron)   0.095  0.018  0.012  0.054  0.412  0.402  0.257  0.195  0.114  0.076  0.159         274     0.176
66   other       0.271  0.190  0.174  0.264  0.129  0.114  0.108  0.211  0.153  0.206  0.117         269     0.173
67   over        0.167  0.153  0.192  0.093  0.129  0.125  0.108  0.250  0.174  0.267  0.200         262     0.169
68   again       0.127  0.171  0.108  0.086  0.238  0.168  0.149  0.203  0.153  0.221  0.192         260     0.167
69   its         0.247  0.171  0.060  0.155  0.079  0.092  0.325  0.328  0.093  0.145  0.359         258     0.166
69   that(r)     0.215  0.072  0.090  0.124  0.194  0.130  0.217  0.344  0.153  0.160  0.167         258     0.166
71   time        0.151  0.126  0.180  0.225  0.208  0.158  0.135  0.133  0.212  0.084  0.125         255     0.164
72   two         0.239  0.153  0.120  0.272  0.114  0.125  0.108  0.250  0.120  0.145  0.184         251     0.162
73   than        0.095  0.099  0.168  0.109  0.174  0.158  0.257  0.219  0.191  0.114  0.184         248     0.160
74   about       0.167  0.090  0.114  0.179  0.169  0.201  0.149  0.148  0.207  0.091  0.175         245     0.158
74   head        0.064  0.126  0.204  0.117  0.169  0.163  0.068  0.219  0.125  0.206  0.226         245     0.158
76   himself     0.127  0.208  0.198  0.233  0.050  0.076  0.162  0.102  0.109  0.274  0.217         233     0.150
77   gentleman   0.064  0.334  0.438  0.272  0.069  0.163  0.068  0.078  0.000  0.122  0.033         232     0.149
78   know        0.159  0.063  0.090  0.086  0.233  0.239  0.284  0.039  0.174  0.084  0.109         226     0.145
78   what        0.095  0.054  0.114  0.187  0.169  0.217  0.122  0.094  0.202  0.114  0.150         226     0.145
80   reply       0.024  0.433  0.378  0.459  0.050  0.027  0.041  0.031  0.027  0.076  0.117         224     0.144
81   after       0.048  0.144  0.186  0.140  0.204  0.141  0.081  0.109  0.169  0.099  0.150         220     0.142
81   much        0.064  0.126  0.138  0.194  0.134  0.136  0.135  0.078  0.196  0.198  0.134         220     0.142
83   any         0.199  0.081  0.120  0.086  0.223  0.130  0.217  0.156  0.131  0.084  0.117         219     0.141
84   face        0.072  0.081  0.150  0.132  0.134  0.136  0.068  0.211  0.076  0.183  0.284         216     0.139
85   great       0.103  0.180  0.216  0.233  0.089  0.174  0.149  0.125  0.093  0.099  0.058         213     0.137
86   hand        0.040  0.153  0.120  0.086  0.124  0.082  0.054  0.352  0.142  0.160  0.150         207     0.133
87   like(p)     0.080  0.072  0.048  0.047  0.159  0.136  0.122  0.188  0.261  0.168  0.100         204     0.131
88   eyes        0.048  0.144  0.156  0.140  0.104  0.109  0.027  0.227  0.136  0.137  0.175         202     0.130
88   turn        0.072  0.144  0.114  0.148  0.134  0.125  0.068  0.133  0.158  0.168  0.134         202     0.130
90   mother      0.056  0.000  0.024  0.016  0.874  0.005  0.027  0.000  0.011  0.008  0.050         201     0.129
91   get         0.080  0.036  0.102  0.086  0.139  0.120  0.108  0.172  0.245  0.099  0.159         199     0.128
92   such        0.151  0.117  0.060  0.155  0.169  0.212  0.135  0.125  0.087  0.076  0.075         196     0.126
93   on(adv)     0.103  0.117  0.096  0.187  0.099  0.114  0.068  0.086  0.163  0.114  0.167         188     0.121
93   seem        0.072  0.027  0.066  0.078  0.154  0.223  0.095  0.063  0.185  0.183  0.084         188     0.121
95   back        0.088  0.063  0.114  0.086  0.134  0.168  0.054  0.117  0.142  0.122  0.159         186     0.120
95   sit         0.024  0.081  0.096  0.093  0.204  0.152  0.135  0.141  0.109  0.122  0.109         186     0.120
97   think       0.056  0.063  0.072  0.078  0.243  0.196  0.027  0.031  0.240  0.030  0.067         183     0.118
97   way         0.056  0.099  0.150  0.070  0.144  0.158  0.054  0.156  0.131  0.084  0.117         183     0.118
97   young       0.048  0.090  0.150  0.132  0.025  0.136  0.176  0.141  0.114  0.099  0.251         183     0.118
100  never       0.143  0.000  0.066  0.054  0.179  0.228  0.217  0.023  0.136  0.061  0.125         181     0.116
Sum              52.28  52.10  54.70  53.43  54.71  53.70  53.59  56.24  54.00  53.34  52.66       83613     53.82

*(a) = adjective, (adv) = adverbials, (a.d.) = adverb of degree, (c) = conjunction, (d) = demonstrative, (i) = infinitive, (r) = relative, (p) = preposition, (pron) = pronoun

On the other hand, common homographic forms are tagged to specify each usage: for example, "that" is separated into "that(c)" (conjunction), "that(r)" (relative), and "that(d)" (demonstrative); "her" is tagged to separate the possessive adjective, "her(a)", from the pronoun, "her(pron)"; and verbs like look, reply, etc. are tagged to distinguish them from nouns. For the purpose of counting, furthermore, contractions are expanded (e.g., "can't" counts as "can" and "not"); proper names like "Mr Copperfield," on the other hand, are joined with an asterisk ("Mr*Copperfield") so as to count as one word. Words that are usually hyphenated or treated as one word but do not appear as one word in my texts are also joined ("for ever" is joined as "for*ever").5
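Some of these preliminary treatments (lemmatisation, expansion of contractions, and joining of multi-word units) can be sketched in Python as follows. The lookup tables `LEMMAS`, `CONTRACTIONS`, and `JOINED_UNITS` and the function `normalise` are hypothetical and hold only the examples mentioned in the text; they are not the procedure or word lists actually used in the study. Homograph tagging such as "that(c)" versus "that(r)" is left out, since it depends on grammatical judgement that a simple lookup cannot supply.

```python
import re

# Hypothetical lookup tables, limited to examples mentioned in the text.
LEMMAS = {"takes": "take", "took": "take", "taken": "take", "taking": "take",
          "was": "be", "had": "have", "said": "say"}
CONTRACTIONS = {"can't": "can not"}
JOINED_UNITS = {"Mr Copperfield": "Mr*Copperfield", "for ever": "for*ever"}


def normalise(text):
    """Return a list of countable word-forms for a stretch of text."""
    # Join proper names and split compounds with an asterisk.
    for phrase, joined in JOINED_UNITS.items():
        text = text.replace(phrase, joined)
    # Expand contractions so that each part counts as a separate word.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # A token is a run of letters, optionally joined by asterisks.
    tokens = re.findall(r"[A-Za-z]+(?:\*[A-Za-z]+)*", text)
    # Joined units are kept as they are; other tokens are lower-cased
    # and replaced by their lemma when the table lists one.
    return [t if "*" in t else LEMMAS.get(t.lower(), t.lower()) for t in tokens]


example = normalise("Mr Copperfield took it and said he can't stay for ever")
print(example)
```

The replacements are case-sensitive and purely string-based, which is enough for a demonstration but would need refinement for a full text.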

1.3 The 100 most common word-types

Table 2 lists the 100 most common word-types whose occurrence in segments of text is frequent enough to allow multivariate analysis.6 While most of the words with higher frequency are function words, several common adjectives and nouns find themselves towards the bottom of the list. What deserves attention is that lemmatisation has enabled seventeen verbs to rank within the top 100 words: be [3], have [8], say [13], look [33], go [45], do [47], take [49], make [51], come [53], see [56], know [78], reply [80], turn [88], get [91], seem [93], sit [95], and think [97] (numbers in square brackets show respective frequency rankings).7 Since the hundred words account for 53.8% of all the word-tokens in the pure-narrative, it may not be inappropriate to assume that some major determinants of style are reflected in the data.
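As a worked check on the text-percentage figures, the following Python sketch recomputes two values from the raw counts. The token totals come from Table 1 and the raw counts from Table 2; the function name `text_percentage` is an assumption introduced for illustration.

```python
# Pure-narrative word-token totals from Table 1.
FIRST_PERSON_TOKENS = 56903
THIRD_PERSON_TOKENS = 98467
TOTAL_TOKENS = FIRST_PERSON_TOKENS + THIRD_PERSON_TOKENS


def text_percentage(raw_count, total_tokens):
    """Express a raw count as a percentage of all word-tokens."""
    return 100.0 * raw_count / total_tokens


# Raw count of "the" in Table 2.
the_pct = text_percentage(9935, TOTAL_TOKENS)
# Summed raw count of the hundred words in Table 2.
top100_pct = text_percentage(83613, TOTAL_TOKENS)
print(round(the_pct, 3), round(top100_pct, 2))
```

Rounded in this way, the two results agree with the 6.394 and 53.82 given in the Total (%) column of Table 2.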

2. Analysis and Interpretation of the results

2.1 Principal Component Analysis of the 100 words

Figure 1 shows the results of a principal component analysis (PCA) of the hundred words in the texts divided into thirty-seven segments. The first step of PCA is to measure the correlations of each of the hundred words with each of the other ninety-nine across the thirty-seven segments of text, using Pearson's Product Moment Formula.8 This procedure generates a matrix of 4950 correlation coefficients. Table 3 gives a part of the matrix.

Table 3. Pearson's Product-Moment Correlation Matrix

                  the     and      be      of       a   in(p)     his    have   to(i)      he   ...
1   the            --
2   and        -0.076      --
3   be         -0.306   0.002      --
4   of          0.702  -0.225  -0.279      --
5   a           0.035  -0.357  -0.264   0.309      --
6   in(p)       0.180  -0.352   0.201   0.552   0.324      --
7   his         0.549  -0.127  -0.640   0.357   0.210  -0.003      --
8   have       -0.235  -0.028   0.376  -0.281  -0.190  -0.124  -0.474      --
9   to(i)      -0.646  -0.073   0.309  -0.575  -0.159  -0.360  -0.441   0.284      --
10  he          0.180  -0.316  -0.274   0.032   0.225  -0.160   0.643  -0.104  -0.081      --
...               ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
45  go         -0.667   0.383   0.166  -0.717  -0.212  -0.275  -0.475   0.140   0.335  -0.264   ...
...               ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
100 never      -0.517  -0.066   0.539  -0.298  -0.280   0.003  -0.613   0.335   0.454  -0.346   ...

Correlation-coefficients range in value from +1.000, a perfect positive correlation, to -1.000, a perfect negative correlation. A coefficient of 0.000 indicates no correlation whatsoever. The matrix of coefficients reflects similarity or contrast among the hundred words in their "behaviour," or concomitant frequency variation over the thirty-seven text segments. A coefficient of 0.702 between the and of, a comparatively high score allowing for the number of text-samples examined, indicates that where the relative frequency of the runs high, that of of tends to be concomitantly high. A score of -0.717 obtained between of and go provides a contrary example: a text that has pronounced recourse to of tends to show comparatively sparing use of go.
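The correlation step can be illustrated with a short Python sketch of Pearson's product-moment formula. Note the assumption: the inputs below are the eleven whole-narrative percentages for the and of from Table 2, not the thirty-seven segment frequencies used in the study, so the result illustrates the calculation and is not expected to reproduce the coefficient of 0.702 in Table 3. The function name `pearson` is introduced here for illustration.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariation = sum(a * b for a, b in zip(dx, dy))
    return covariation / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Text-percentage frequencies per narrative, copied from Table 2
# (order: SB, PP, OT, NN, David, Esther, BH, TTC, Pip, OMF, ED).
the = [7.606, 9.097, 7.327, 6.320, 4.433, 4.723, 6.834, 7.462, 5.817, 6.602, 6.515]
of = [4.225, 3.592, 3.190, 3.281, 2.636, 2.462, 3.424, 3.469, 2.511, 3.255, 3.566]

r_the_of = pearson(the, of)

# Number of distinct word pairs among one hundred words.
n_words = 100
n_pairs = n_words * (n_words - 1) // 2
print(round(r_the_of, 3), n_pairs)
```

Because the correlation of a word with itself is always 1 and the matrix is symmetric, only the n_pairs coefficients below the diagonal need to be stored, as in the triangular layout of Table 3.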

In a small matrix like Table 3, it may be easy to see certain patterns like those I sketched above. However, the entire matrix, in which 4950 coefficients are given, makes it practically impossible to grasp complex interrelationships among the hundred words. The next step, therefore, is to subject the correlation matrix to eigen-analysis. By eigen-analysis, the complex patterns of correlation coefficients are reduced to a succession of eigen-vectors, and an appropriate weighting, or "eigen-value," is assigned to each eigen-vector. The first principal vector, or the first principal component, arrays the coefficients in a sequence
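The eigen-analysis of a correlation matrix can be sketched with NumPy as below. This is a minimal illustration under stated assumptions: the function `principal_components` is a hypothetical helper, the input is random stand-in data with 37 rows (segments) and 5 columns (words) in place of the study's 37 by 100 frequency table, and the software actually used in the study is not known from the text.

```python
import numpy as np


def principal_components(freqs):
    """PCA on the correlation matrix of `freqs` (rows = segments, columns = words).

    Returns the eigenvalues in descending order, the matching eigenvectors
    (one per column), and the scores of each segment on each component.
    """
    freqs = np.asarray(freqs, dtype=float)
    # Standardise each word; the covariance of z is then the correlation matrix.
    z = (freqs - freqs.mean(axis=0)) / freqs.std(axis=0, ddof=1)
    corr = np.corrcoef(freqs, rowvar=False)
    # eigh suits symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    scores = z @ eigenvectors
    return eigenvalues, eigenvectors, scores


# Random stand-in data: 37 segments x 5 words (NOT the article's frequencies).
rng = np.random.default_rng(0)
toy = rng.random((37, 5))
eigenvalues, eigenvectors, scores = principal_components(toy)

# Share of the total variation carried by each component.
explained = eigenvalues / eigenvalues.sum()
print(explained)
```

The first column of `eigenvectors` gives the weight of each word on the first principal component, and the first column of `scores` places each segment along that component.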
