A Corpus Study of Strong and Powerful


Dominic Castello

Master of Arts in Applied Linguistics Module 4 Assignment July 2014

ELAL College of Arts & Law University of Birmingham Edgbaston Birmingham B15 2TT United Kingdom

CL/14/01 Take a small number of words or phrases (between 2 and 5) and carry out a corpus-based study to show how they are used in similar or different ways. Choose words/phrases which are interesting in some way - e.g. your students often confuse them; they cause problems for translators working with a specific language; you yourself have difficulty deciding when to use one or the other. Examples of words / phrases which have been studied in the past include: between and through; immense, enormous and massive; reason to and reason for; on the other hand and on the contrary. Be sure to reflect critically on your methodology. (Note that you should not repeat these studies, which are mentioned as examples, but should choose different sets of words.)


TABLE OF CONTENTS

1.0 Introduction
2.0 Literature Review
2.1 An introduction to corpus linguistics
2.2 Types of corpora
2.3 Corpus studies and intuition
2.4 Capability and limitations of corpus data
2.5 Collocation and semantic preference
3.0 Analysis
3.1 Methodology
3.2 Definitions of strong and powerful
3.3 Token frequency
3.4 Collocation and patterns
3.5 Lexical patterning
4.0 Conclusion
5.0 References
6.0 Appendix


1.0 Introduction

This paper presents an exploratory corpus study of the terms strong and powerful, comparing their usage and distribution to identify patterns, similarities and differences in the ways that they are used. Adopting the view that language is, first and foremost, a communication tool, polysemous words - lexical items whose senses are identical in respect of 'central' semantic traits but not in 'peripheral' traits (Cruse 1986, cited in Chung 2011) - can present challenges to the language learner in terms of knowing the appropriate contexts in which to use them.

The decision to analyse these particular words came from personal experience working as a language instructor at secondary level. Presently, students in my school are taught English vocabulary mainly as a discrete activity where, in some cases, similar words are presented to them as being perfectly synonymous.

At certain stages of learning, the advantages of this approach are self-evident: it is seldom beneficial to present every sense of a word from the outset. However, as students become more confident with the vocabulary through writing and conversation, naturally they apply the words in a variety of contexts with the understanding that they can be universally substituted.

Returning to my experience, as the extent of the lexical overlap (or indeed the lack thereof) between the words became more apparent to students, questions arose concerning how to differentiate which of them were appropriate in a given context. Which one would best describe a compelling argument, an economically successful or influential nation, or athletes such as bodybuilders? Are certain words or types of words more likely to appear with strong than powerful and vice versa?

The current study aims to use empirical evidence to investigate the patterns of language relating to these words. In doing so it is hoped that the similarities and differences between the terms and the ways in which they are used (that is to say, in context or as part of a semantic group) will be revealed.


2.0 Literature review

2.1 An introduction to corpus linguistics

Corpus linguistics is a methodology of linguistic analysis that treats 'naturally-occurring' language as a credible source for the investigation and classification of linguistic structures (Nesselhauf 2011). According to Hanks (2012), corpus linguistics is primarily concerned with interpreting observed language in order to arrive at statements about patterns in word meaning or syntactic composition.

Within this field, a corpus is defined as 'a large collection of authentic texts that have been selected and organised following precise linguistic criteria' (Sinclair 1991, 1996; Leech 1991:8; Williams 2003, amongst others). Corpus data is systematic in that its structure and contents are governed by a number of sampling principles, such as the mode of discourse, subject and variety of language (Nesselhauf 2011), while its authenticity stems from the fact that a corpus typically pulls together thousands, if not millions, of written and/or spoken texts sampled directly from 'maximally representative' examples of language in use (McEnery and Wilson 1996:87; Dobric 2009:360; Bowker and Pearson 2002:9).

Corpus-driven lexicographers arrive at statements about word meaning or syntactic structures by studying usage, and evaluating the constraints and preferences associated with each word `for what they really are' (Hanks 2012).

2.2 Types of corpora

The earliest electronic corpora were compiled in the 1960s, but it was the technological advances of the 1970s - most significantly the introduction of digital computers - that meant there was, for the first time, sufficient power to collate electronic language databases (McEnery and Hardie 2013). These could be used as a resource with which to better understand the characteristics of an unprecedented number of source texts. In the 1990s, further developments in corpus design allowed for the capture of discourse in much greater volume. The resulting corpora were considered a rich and varied enough resource for linguists to give the most balanced and accurate reflection of language as it is used every day (ibid.).

Aside from general corpora, which in their wide scope principally aim to investigate language as a whole (Hunston and Laviosa 2000:80), there are a number of other types of corpora, each with a particular focus and purpose. A specialised corpus samples discourse within a specific field (science papers, newspaper editorials or students' essays, for instance); different languages or varieties of language (such as localised or indigenised Englishes) can be compared and contrasted using parallel and comparable corpora respectively; and a diachronic corpus that periodically samples a type of discourse can be used to track developments in language over time.

For all types of corpus, examining the behaviour of words as they are found in corpus data allows us to perceive and interpret any structures or patterns that may be contained within the sampled language.

2.3 Corpus studies and intuition

Prior to the developments that led to the creation of such vast corpora, the analysis of prosodic and semantic behaviour was primarily conducted by native speakers who, for the most part, relied on their intuitive personal assessment of what was acceptable for a language to do, and how it was possible for combinations of words to work together (Paikeday 1985, Devitt 2006).

At the time, this 'introspective data' was considered to be an adequate, if not unequivocal, source of information on linguistic patterns (Xiao and McEnery 2006; Wasow and Arnold 2005). Kramsch (1997:359) claims that, historically, the study of languages has been predicated on the apparent vantage point afforded to the native speaker. More explicitly, Hunston and Laviosa (2000) cite Chomsky's (1968) advocacy of native-speaker intuition, which he claimed was 'sufficient to identify what is acceptable in a language, without the need for further evidence'.


However, this position of de facto authority has been called into question in a number of subsequent studies. Researchers and linguists (including Wasow and Arnold 2006:1484; Haegeman 2006; Devitt 2006) dispute the reliability of native-speaker intuition as a means of determining the patterns by which words collocate and how they should be used. Hanks (2012) goes further, asserting that although intuition may be used to judge a general sense of well-formedness, meaning or acceptability of a word, a more empirical and data-focused method is required for the formation of sound linguistic hypotheses that can characterize the nature of language.

Of course, the adoption of powerful computer-based databases and language analysis tools does not negate the merit of human judgement in corpus-based research. Native-speaker intuition may suggest that something is natural or grammatically correct, while corpus evidence can be used to illustrate how common certain possibilities actually are. As Hunston (2000) and Scott and Tribble (2006) note, in any case, intuition is needed to interpret the evidence found in a corpus and to reach effective conclusions about the behaviour of language. Rather than viewing these forms of analysis as mutually exclusive, Leech (1991:15) finds that the most comprehensive and considered analysis 'depends on a division of labour between the corpus and the human mind'.

2.4 Capability and limitations of corpus data

Corpus data allows researchers to infer the meanings of words from repeated grammatical patterns, as well as from the collocations of the words in question. With the help of these large banks of text, it is possible to make well-informed judgements about how words behave in specific contexts and registers.

It should be remembered that however extensive the source material might be, a corpus and its findings serve only as a reflection of that which is selected from the (ultimately finite) data it contains. Hence, the possibility of total linguistic accountability is, according to Basarally (2011), unattainable. In light of this, Leech (in Kennedy 1998:23) advises caution, noting that 'some sentences won't occur because they are obvious, others because they are false, still others because they are impolite'.


In addition, corpus data can only help us infer the scope of any underlying meaning that a word possesses and determine whether collocations are plausible, as opposed to whether or not they are grammatically acceptable. As such, observations should not be regarded as a concrete representation of language usage, but only as reasonable evidence based on what is available for analysis.

In light of this, data mined from the corpus should be examined closely to ensure that results are not distorted by incongruous meanings. This issue will be addressed in the methodology.

2.5 Collocation and semantic preference

Originally coined as a technical term by Firth (1957), the collocation of a given word was said to be 'a statement on the habitual or customary placings of [a] word'. This description was later extended by Sinclair (1991), Hoey (1991:6) and Hunston (2002), amongst others, to include 'lexical items that appear with others with greater than random probability than their individual frequencies would lead us to expect'.

Collocation goes against the notion that meaning is ascribed purely at the level of individual words (Xiao and McEnery 2006). This attraction or repulsion between certain words is an important part of the process of understanding meaning through corpus studies, with a statistical approach providing a quantitative measurement of the relationship between words across the innumerable samples of discourse found in a corpus.

Within corpus linguistics there are a number of ways to assign a level of significance to the co-occurrence of words (Xiao and McEnery 2006). Of these, the two statistical tests conducted in the current study are the t-score and the MI score.

The t-score reflects the extent to which the occurrence of one word together with another deviates from what would be considered a standard or average level of frequency (Hunston 2002; Walker 2010). The MI (mutual information) score is a measure of the strength of association between words that compares the expected (or random) and actual frequencies of their co-occurrence.
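The two measures can be illustrated with a minimal calculation. The counts below are invented purely for illustration (they are not drawn from the corpus used in this study), and the formulas follow the standard definitions given in the collocation literature: expected frequency as the product of the two word frequencies divided by corpus size, t-score as the observed minus expected count scaled by the square root of the observed count, and MI as the base-2 log of the observed-to-expected ratio.

```python
import math

# Hypothetical counts, assumed for illustration only: a corpus of
# 1,000,000 tokens in which the node word occurs 2,000 times, the
# candidate collocate 500 times, and the pair co-occurs 150 times
# within the chosen collocation span.
N = 1_000_000          # total tokens in the corpus
f_node = 2_000         # frequency of the node word (e.g. strong)
f_coll = 500           # frequency of the candidate collocate
observed = 150         # observed co-occurrences within the span

# Expected co-occurrence frequency if the two words were independent
expected = (f_node * f_coll) / N

# t-score: deviation of the observed count from chance,
# scaled by the variability of the observed count
t_score = (observed - expected) / math.sqrt(observed)

# MI score: log2 ratio of observed to expected co-occurrence
mi_score = math.log2(observed / expected)

print(f"expected = {expected:.2f}")   # 1.00
print(f"t-score  = {t_score:.2f}")    # 12.17
print(f"MI score = {mi_score:.2f}")   # 7.23
```

The two scores answer different questions: the t-score rewards high absolute co-occurrence counts (favouring frequent collocates), while the MI score rewards a high ratio of observed to expected counts (favouring exclusive, sometimes rare, collocates), which is why studies such as this one typically report both.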

