


Descriptive Statistics

in Language Learning Research

by

Mohammad Adnan Latief

adnanlatiefs@

University of Pittsburgh

State University of Malang

2010

DESCRIPTIVE STATISTICS

IN LANGUAGE LEARNING RESEARCH

Research involving numerical data requires statistics to analyze the data. When the data to be analyzed are gathered from a group of subjects that does not represent a larger group, descriptive statistics is used. When the data are gathered from a sample drawn from a larger population, inferential statistics is applied. Descriptive statistics involves measures of central tendencies, measures of spread, and measures of relative position. Measures of central tendencies include Mean, Median, and Mode. Measures of dispersion include range, variance, and standard deviation. Measures of relative position include Quartile, Decile, Percentile, Interquartile Range, and Percentile Rank.

Objectives

After reading this chapter readers are expected to be able to

1. Differentiate between Descriptive Statistics and Inferential Statistics

2. Read descriptive statistics measures used in research

3. Apply descriptive statistics to analyze data in their research projects

4. Explain and use Measures of Central Tendencies: Mean, Median, and Mode

5. Explain and use Measures of Dispersion: range, variance, and standard deviation

6. Explain and use Measures of relative position: Quartile, Decile, Percentile, Interquartile Range, and Percentile Rank.

Research involving numerical data requires statistical analysis to draw conclusions. Statistics ranges from very simple descriptive statistics to highly complex inferential statistics. Readers interested in studying inferential statistics are advised to consult further statistics references or to take advanced inferential statistics courses. This chapter introduces some concepts of basic descriptive statistics.

Descriptive and Inferential Analysis

These two types of statistics are differentiated by how far their results can be generalized. Descriptive Statistics is used to analyze data from a particular group of individuals that is not a representative sample of a bigger group. The conclusions apply only to that particular group and are not generalized beyond it. Classroom Action Research and Research & Development, for example, do not draw samples from an accessible population for the subjects involved, so descriptive statistics is used to analyze the data. Inferential Statistics is used to analyze data from a sample drawn randomly from a bigger accessible population. The conclusions obtained from the sample are generalized to the bigger population from which the sample is drawn. In survey research, the sample is usually drawn from a larger population, so inferential statistics is applied to analyze the data (Best, J.W., Kahn, J.V. 2003:342).

Descriptive Statistics

Statistical measures used in descriptive statistics include measures of central tendencies, measures of spread, and measures of relative position. Measures of central tendencies include Mean, Median, and Mode. Measures of spread include range, variance, and standard deviation. Measures of relative position include standard scores, percentile rank, and percentile score.

Measures of Central Tendencies

The most popular measure of central tendencies is the average. People often speak of the average temperature, the average rainfall, the average number of patients, the average number of students in one class, the average score of students in an English test, etc. The average is technically called the Mean; the Median and the Mode are the other measures of central tendencies (Best, J.W., Kahn, J.V. 2003:344).

Mean is an index or a point that shows the average. It is calculated by summing the individual scores of the sample and then dividing the total sum by the number of individuals in the sample (Borg, W.R., Gall, J.P., Gall, M.D., 1993:148). The Mean is very often used as the base from which many other statistical measures are computed. The procedure involves the following simple steps:

1. List all the scores (for example, the scores of the reading Test of 10 students are 100 100 90 90 80 70 60 60 50 50)

2. Sum all the students’ English Reading scores (100 + 100 + 90 + 90 + 80 + 70 + 60 + 60 + 50 + 50 = 750)

3. Divide the sum of the scores by the number of the scores. (Since the number of the students involved is 10 and the sum of all the scores is 750 the mean is calculated as 750/10 = 75. So the mean score of the 10 students’ reading test result is 75).
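The three steps above can be sketched in a few lines of Python. This is only an illustration; the names `scores`, `mean`, `total`, and `reading_mean` are chosen for this example and do not come from the cited sources.

```python
# Step 1: the Reading test scores of the 10 students listed in the text.
scores = [100, 100, 90, 90, 80, 70, 60, 60, 50, 50]


def mean(values):
    """Sum the scores and divide by the number of scores."""
    return sum(values) / len(values)


total = sum(scores)          # Step 2: sum of all the scores
reading_mean = mean(scores)  # Step 3: sum divided by the number of scores

print(total, reading_mean)   # 750 75.0
```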

The problem with the Mean occurs when one of the scores is extremely high or extremely low. The Mean becomes too high to represent the group scores when one score is extremely high. Similarly, the Mean becomes too low to represent the group scores when one score is extremely low. For example, when the Reading test scores of 10 students are 20 20 25 25 30 30 35 35 35 100, the sum of the scores is 355, and when it is divided by 10 the Mean is 35.5. The Mean score 35.5 is not a good representation of the 10 students' Reading test scores because only 1 student scores above 35.5 and the other 9 students score below it. This one extreme score does not belong to the group and is called an outlier score. To get a Mean score that objectively represents the scores of a group of students, the outlier score should be excluded from the computation. Without the score 100, the sum of the 9 students' Reading test scores is 255, and when it is divided by 9 the Mean is 28.33. This Mean score of 28.33 represents the 9 students' Reading test scores objectively because 4 students score below it (20 20 25 25) and 5 students score above it (30 30 35 35 35).
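The effect of the outlier can be checked with a short sketch. It assumes the outlier (the score 100) has already been identified by inspection and simply drops it; it is not a general outlier-detection method, and the variable names are illustrative only.

```python
# Reading test scores from the text, including the extreme score 100.
scores = [20, 20, 25, 25, 30, 30, 35, 35, 35, 100]


def mean(values):
    """Arithmetic mean: sum of the scores divided by their number."""
    return sum(values) / len(values)


mean_all = mean(scores)                     # 355 / 10
trimmed = [s for s in scores if s != 100]   # drop the outlier by hand
mean_trimmed = mean(trimmed)                # 255 / 9

# How many students score above the Mean when the outlier is kept?
above_all = sum(1 for s in scores if s > mean_all)

print(mean_all, round(mean_trimmed, 2), above_all)  # 35.5 28.33 1
```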

Median is a statistic that describes a sample's typical score on a measure. It is the middle score in the distribution of scores (Borg, W.R., Gall, J.P., Gall, M.D., 1993:148), above which and below which one-half of the scores fall. To compute the Median, the scores must be ordered from the lowest to the highest. When the number of scores is odd with no tied scores, the median is the middle score. For example, when the Reading Test scores of 9 students are 20 30 40 50 60 70 80 90 100, the median is 60 because above the point 60 there are 4 scores (70 80 90 100) and below the point 60 there are also 4 scores (50 40 30 20). See Table 5.

Table 5

Median of Odd Numbers with No Ties

|Students |Students’ Scores | |

|1 |20 | |

| | |4 scores |

|2 |30 | |

|3 |40 | |

|4 |50 | |

|5 |60 |Median |

|6 |70 | |

| | |4 scores |

|7 |80 | |

|8 |90 | |

|9 |100 | |

When the number of scores is even with no tied scores, the median is the midpoint between the two middle scores. For example, when the Reading Test scores of 8 students are 20 30 40 50 60 70 80 90, the median is the midpoint between the scores 50 and 60 (= 55) because above the point 55 there are 4 scores (60 70 80 90) and below the point 55 there are also 4 scores (50 40 30 20). See Table 6.

Table 6

Median of Even Numbers with No Ties

|Students |Students’ Scores | |

|1 |20 | |

| | |4 scores |

|2 |30 | |

|3 |40 | |

|4 |50 | |

| |55 |Median |

|5 |60 | |

| | |4 scores |

|6 |70 | |

|7 |80 | |

|8 |90 | |
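The two untied cases (an odd and an even number of scores) can be expressed as a small function. This is a minimal sketch that assumes there are no tied scores around the middle; the function name `median` is illustrative.

```python
def median(values):
    """Middle score for an odd count; midpoint of the two middle scores for an even count."""
    ordered = sorted(values)              # scores must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]               # odd: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: midpoint


odd_scores = [20, 30, 40, 50, 60, 70, 80, 90, 100]   # 9 students
even_scores = [20, 30, 40, 50, 60, 70, 80, 90]       # 8 students

print(median(odd_scores))    # 60
print(median(even_scores))   # 55.0
```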

When there are two tied scores and the median is within the tied scores, interpolation within the tied scores is necessary. If 8 students' Reading test scores are 20 20 25 30 30 35 35 40, the Median is the midpoint between the two tied scores 30 and 30 (3 scores fall below 30 and 3 scores fall above 30). Since the score 30 is shared by 2 students (tied), this one score has to be divided into 2 parts: the first part is 29.5-30 and the second part is 30-30.5. So the scores of the 8 students' Reading Test become 20 20 25 29.5-30 30-30.5 35 35 40. The Median is 30 because there are 4 scores above it (30-30.5 35 35 40) and 4 scores below it (29.5-30 25 20 20). See Table 7.

Table 7

Median of Even numbers With 2 Tied Scores

|Students |Students’ Scores |Interpolated Scores* | |

|1 |20 |19.50 -- 20.00 | |

| | | |4 scores |

|2 |20 |20.00 -- 20.50 | |

|3 |25 |24.50 -- 25.50 | |

|4 |30 |29.50 -- 30.00** | |

| | |30.00 -- 30.00 |Median |

|5 |30 |30.00 -- 30.50** | |

| | | |4 scores |

|6 |35 |34.50 -- 35.00 | |

|7 |35 |35.00 -- 35.50 | |

|8 |40 |39.50 -- 40.50 | |

* Interpolation is done by taking the lower limit and the upper limit of each score. ** The lower limit-upper limit of the score 30 is 29.50-30.50, shared by the 4th student and the 5th student.

When there are 5 tied scores and the median is within the tied scores, the score which is shared by 5 students has to be divided into 5 parts and the Median is determined among the tied scores. If the scores of the 8 students' Reading Test (ordered from the lowest to the highest) are 20 30 30 30 30 30 35 40, the Median is among the scores of 30. Since the score 30 is shared by 5 students, the score 30 has to be divided into 5 parts (the score 30, which ranges from the lower limit 29.50 up to the upper limit 30.50, is divided into 5 parts of .20 point each). So the scores become 20 29.50-29.70 29.70-29.90 29.90-30.10 30.10-30.30 30.30-30.50 35 40. The Median point has 4 scores below it and 4 scores above it, so it lies between 29.90-30.10 and 30.10-30.30. The Median point is therefore 30.10. See Table 8.

Table 8

Median of Even numbers With 5 Tied Scores

|Students |Students’ Scores |Interpolated Scores | |

|1 |20 |19.50 -- 20.50 | |

| | | |4 scores |

|2 |30 |29.50 -- 29.70* | |

|3 |30 |29.70 -- 29.90* | |

|4 |30 |29.90 -- 30.10* | |

| | |30.10 -- 30.10 |Median |

|5 |30 |30.10 -- 30.30* | |

| | | |4 scores |

|6 |30 |30.30 -- 30.50* | |

|7 |35 |34.50 -- 35.50 | |

|8 |40 |39.50 -- 40.50 | |

* The lower limit-upper limit of the score 30 is 29.50-30.50, shared by the 2nd, 3rd, 4th, 5th, and 6th students.
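The interpolation shown in Tables 7 and 8 can be written as one rule: start at the lower limit of the tied score and move up by the share of the tied scores still needed to reach half of all the scores. The sketch below assumes whole-number scores whose limits are 0.5 below and 0.5 above the score (an interval width of 1), as in the tables; the function name `interpolated_median` and the `width` parameter are illustrative. It is meant for the tied cases only; for untied scores separated by gaps, the midpoint rule described earlier applies instead.

```python
def interpolated_median(values, width=1.0):
    """Median found by interpolating inside the interval of the tied score."""
    ordered = sorted(values)
    half = len(ordered) / 2              # number of scores below the median
    cumulative = 0                       # scores counted so far
    for score in sorted(set(ordered)):
        freq = ordered.count(score)      # how many students share this score
        if cumulative + freq >= half:
            lower_limit = score - width / 2
            # Move into the interval by the fraction of tied scores needed.
            return lower_limit + (half - cumulative) / freq * width
        cumulative += freq
    return None                          # empty input


two_ties = [20, 20, 25, 30, 30, 35, 35, 40]    # scores of Table 7
five_ties = [20, 30, 30, 30, 30, 30, 35, 40]   # scores of Table 8

print(round(interpolated_median(two_ties), 2))    # 30.0
print(round(interpolated_median(five_ties), 2))   # 30.1
```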

Mode is the score that occurs most frequently in a distribution. The Mode can be found by counting the frequency of occurrence of each score, and the frequencies can be presented in a histogram. Table 9 shows that the mode is the score 50 because it occurs 10 times in the distribution.
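Counting frequencies to find the Mode can be sketched as follows. Because the data behind Table 9 are not available in this text, the example reuses the ten Reading test scores from the discussion of the Mean; the function name `modes` is illustrative, and it returns a list because a distribution may have more than one most frequent score.

```python
from collections import Counter


def modes(values):
    """Return the most frequent score(s) and how often they occur."""
    counts = Counter(values)                 # frequency of each score
    top = max(counts.values())               # highest frequency
    most_frequent = sorted(s for s, f in counts.items() if f == top)
    return most_frequent, top


scores = [20, 20, 25, 25, 30, 30, 35, 35, 35, 100]
print(modes(scores))   # ([35], 3)
```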

Measures of Spread

Measures of central tendencies describe location along an ordered scale (Best, J.W., Kahn, J.V. 2003:349). Two groups of students having about the same Means do not necessarily have the same score distributions. The heterogeneity of the scores is not explained by measures of central tendencies. Measures of spread, which include range, variance, and standard deviation, analyze the dispersion of the students' scores.

Table 9

An Example of Histogram

[Histogram not reproduced; vertical axis: Frequency]

Range is the difference between the lowest score and the highest score in the same group. When the lowest score of the students' Reading test in one class is 20 and the highest score of the same class is 75, the range is 75-20 = 55. When the same class's test scores on Speaking have a range of 35, we can conclude that the scores of the class on Speaking are more homogeneous than their scores on Reading.
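A minimal sketch of the comparison follows. The text gives only the extremes of the Reading scores (20 and 75) and the Speaking range (35), so the individual scores in both lists are hypothetical and serve only to make the example runnable.

```python
def score_range(values):
    """Range: highest score minus lowest score."""
    return max(values) - min(values)


# Hypothetical Reading scores; only the lowest (20) and highest (75) come from the text.
reading = [20, 35, 50, 60, 75]
# Hypothetical Speaking scores chosen so that the range is 35, as in the text.
speaking = [40, 50, 60, 70, 75]

print(score_range(reading), score_range(speaking))   # 55 35
```

The smaller range suggests the more homogeneous set of scores, but the range depends on only two scores and so is sensitive to a single extreme value.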

Standard Deviation is the average deviation of each score from the mean. When many scores deviate far from their Mean score, the group is very heterogeneous; the closer the scores are to their Mean score, the more homogeneous the group is. Standard deviation is found by computing the average deviation of each score from the mean, so the procedure involves the following steps:

1. List all the scores from the highest to the lowest

2. Compute the Mean of the scores

3. Subtract the Mean from each score. The score deviation is plus when the score is higher than the Mean, 0 when the score equals the Mean, and minus when the score is lower than the Mean.

4. Sum the result of all subtractions. The correct sum is always zero because the plus differences are cancelled by the minus differences. In fact we can give an alternative definition of the mean as the value in a distribution around which the sum of the score deviation from the mean equals zero (Best, J.W., Kahn, J.V. 2003:344).

5. Averaging the score deviations directly is uninformative because their sum always equals zero. To avoid this, square each score deviation so that no result is negative.

6. Sum all the squared score deviations.

7. Divide the sum of the squared score deviations by the number (N) of the scores. The result of the division is called the variance: the sum of the squared deviations from the mean, divided by N (Best, J.W., Kahn, J.V. 2003:350).

8. Take the square root of the variance. The result is the standard deviation, the average deviation of each score from the mean.

Suppose we have a set of Reading test scores of 9 High School students as follows: 55 60 65 70 75 80 85 90 95. Compute the mean, subtract the mean from each score, square each score deviation, sum all the squared score deviations, divide the result by 9, and take the square root (see Table 10).

Table 10

Computation of Standard deviation

|N |Scores (X) |Mean |(X-Mean) |(X-Mean)² |

|9 |95 |75 |+20 |400 |

|8 |90 |75 |+15 |225 |

|7 |85 |75 |+10 |100 |

|6 |80 |75 |+5 |25 |

|5 |75 |75 |0 |0 |

|4 |70 |75 |-5 |25 |

|3 |65 |75 |-10 |100 |

|2 |60 |75 |-15 |225 |

|1 |55 |75 |-20 |400 |

|Total |675 | |0 |1500 |

675/9= 75

1500/9 = 166.67 = variance

Square root of 166.67 = 12.91 = Standard deviation

(See Best, J.W., Kahn, J.V. 2003:350).
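The computation in Table 10 can be reproduced step by step. The sketch divides by N, as the procedure above does (the population form); Python's `statistics.pvariance` and `statistics.pstdev` use the same divisor, whereas `statistics.variance` and `statistics.stdev` divide by N-1. Variable names are illustrative.

```python
import math

# Reading test scores of the 9 students in Table 10.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]

n = len(scores)
mean = sum(scores) / n                     # 675 / 9
deviations = [x - mean for x in scores]    # X - Mean for each score
squared = [d ** 2 for d in deviations]     # squared deviations
variance = sum(squared) / n                # sum of squares divided by N
std_dev = math.sqrt(variance)              # square root of the variance

print(sum(deviations), sum(squared))          # 0.0 1500.0
print(round(variance, 2), round(std_dev, 2))  # 166.67 12.91
```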

Measures of Relative Position

Median is a point that cuts off the bottom 50 % of the scores from the top 50 % of the scores. Statistics also has many other points that divide a set of scores into other percentages of bottom scores and top scores. When the cut-off points are set at 25 %, 50 %, and 75 % of the scores, we use Quartile measures. When the cut-off points are set at every 10 % of the scores, we use Decile measures. When the cut-off points are set at every single % of the scores, we use Percentile measures.

In Quartile measures, there are 3 quartiles: Quartile One, Quartile Two, and Quartile Three. Quartile One (Q1) is an index that cuts off the bottom 25 % of the scores from the top 75 % of the scores. Quartile Two (Q2) is an index that cuts off the bottom 50 % of the scores from the top 50 % of the scores, so Q2 equals the Median. Quartile Three (Q3) is an index that cuts off the bottom 75 % of the scores from the top 25 % of the scores.

In Decile measures, there are 9 deciles: Decile One (D1), Decile Two (D2), Decile Three (D3), up to Decile Nine (D9). Decile One (D1) is an index that cuts off the bottom 10 % of the scores from the top 90 % of the scores. The other deciles are listed in Table 11.

Table 11

Decile Measures

| |A point that cuts off between |Top Scores |Bottom Scores |

|D9 | |10 % |90 % |

|D8 | |20 % |80 % |

|D7 | |30 % |70 % |

|D6 | |40 % |60 % |

|D5* | |50 % |50 % |

|D4 | |60 % |40 % |

|D3 | |70 % |30 % |

|D2 | |80 % |20 % |

* D5 = Q2 = Median

In Percentile measures, there are 99 percentiles: Percentile One (P1), Percentile Two (P2), Percentile Three (P3), up to Percentile Ninety Nine (P99). Percentile One (P1) is an index that cuts off the bottom 1 % of the scores from the top 99 % of the scores. Other Percentiles are listed in Table 12.

Interquartile Range

Previously, range was defined as the difference between the highest and the lowest score in a distribution. Using Quartile measures, we can also define a range as the difference between Q3 and Q1, which is called the interquartile range. So if Q1 is, for example, the score 45 and Q3 is the score 75, then the interquartile range is 75-45 = 30. When we divide 30 by 2, the result (15) is the semi-interquartile range.
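As an illustration, the quartiles and the interquartile range can be computed with Python's standard `statistics.quantiles` function (Python 3.8 or later). The sketch reuses the nine Reading test scores from Table 10. Note that textbooks and software differ in how they interpolate quartiles in small data sets; the values below come from the function's default "exclusive" method and are an assumption of this example, not a rule stated in the cited sources.

```python
import statistics

# Reading test scores of the 9 students in Table 10, in ascending order.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]

# n=4 asks for the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1            # interquartile range
semi_iqr = iqr / 2       # semi-interquartile range

print(q1, q2, q3)        # 62.5 75.0 87.5
print(iqr, semi_iqr)     # 25.0 12.5
```

The same arithmetic applied to Q1 = 45 and Q3 = 75 gives an interquartile range of 30 and a semi-interquartile range of 15.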

Table 12

Percentile Measures

| |A point that cuts off between |Top Scores |Bottom Scores |

|P98 | |2 % |98 % |

|P97 | |3 % |97 % |

|P96 | |4 % |96 % |

|P95 | |5 % |95 % |

|P94 | |6 % |94 % |

|P93 | |7 % |93 % |

|P92 | |8 % |92 % |

|P91 | |9 % |91 % |

|P90 | |10 % |90 % |

|…. | |……… |……….. |

|P75*** | |25 % |75 % |

|P50** | |50 % |50 % |

|P25* | |75 % |25 % |

|P05 | |95 % |05 % |

|P02 | |98 % |02 % |

* P25 = Q1 ** P50= Q2= Median ***P75 = Q3

Percentile Rank

Percentile measures show an index that cuts off a certain % of bottom scores from a certain % of top scores. P1 is a point that cuts off the bottom 1 % of the scores from the top 99 % of the scores. P10 is a point that cuts off the bottom 10 % of the scores from the top 90 % of the scores. P90 is the point that cuts off the bottom 90 % of the scores from the top 10 % of the scores. When P90 is the score 85, it means that the score 85 cuts off the top 10 % of the scores from the bottom 90 % of the scores: 90 % of the scores fall below the score 85 and 10 % of the scores fall above it. The score 85 in this distribution then has the Percentile Rank 90. To give another example, P75 is a point that cuts off the top 25 % of the scores from the bottom 75 % of the scores. When P75 is the score 70, it means that the score 70 cuts off the top 25 % of the scores from the bottom 75 % of the scores, or that 75 % of the scores fall below the score 70 and 25 % of the scores fall above it. It also means that the score 70 in this distribution has the Percentile Rank 75.

Percentile measures can be computed and presented in a diagram. Let's suppose we have a set of 20 scores representing the result of a vocabulary test as follows.

9 17 20 22 17 14 20 15 25 13

21 26 25 21 22 14 21 22 10 12

1. List the scores from the lowest up to the highest, ignoring the ties

2. Tally the frequency of occurrence of each score

3. Accumulate the frequency of occurrence from the lowest to the highest scores

4. Convert the cumulative frequency into a percentage to obtain the relative cumulative frequency (r.c.f.). The percentage is calculated by dividing the cumulative frequency of a score by the total number of scores and multiplying by 100%.

5. Present the relative cumulative frequency in a diagram

Table 13

Relative Cumulative Frequency

|Scores (X) |Frequency (f) |Cumulative Frequency (cf) |Relative Cumulative Frequency (%) |(r.c.f) |

|26 |1 |20 |100% |1.00 |

|25 |2 |19 |95% |.95 |

|22 |3 |17 |85% |.85 |

|21 |3 |14 |70% |.70 |

|20 |2 |11 |55% |.55 |

|17 |2 |9 |45% |.45 |

|15 |1 |7 |35% |.35 |

|14 |2 |6 |30% |.30 |

|13 |1 |4 |20% |.20 |

|12 |1 |3 |15% |.15 |

|10 |1 |2 |10% |.10 |

|9 |1 |1 |5% |.05 |

|N |20 | | | |

Table 14

Ogive of Percentiles

[Ogive not reproduced; vertical axis: relative cumulative frequency (r.c.f.)]

From Table 14 we can determine the score for each Percentile (relative cumulative frequency) and determine the Percentile Rank of each score. P30, for example (the point that cuts off the bottom 30% of the scores from the top 70% of the scores), is the score 14. Conversely, the score 14 has the PR 30, which means that this score is higher than 30% of the other scores, or that 30% of the scores fall below 14. Similarly, the score 21 has the PR 70, which means that this score is better than 70% of all the scores in this distribution, or that 70% of the scores fall below 21.
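The five steps that lead to the relative cumulative frequency, and from there to the Percentile Rank of each score, can be sketched as follows. The sketch follows the convention used in this chapter, in which the cumulative frequency of a score includes the score itself; other texts define Percentile Rank slightly differently. The function name `percentile_ranks` is illustrative.

```python
from collections import Counter

# The 20 vocabulary test scores listed in the text.
scores = [9, 17, 20, 22, 17, 14, 20, 15, 25, 13,
          21, 26, 25, 21, 22, 14, 21, 22, 10, 12]


def percentile_ranks(values):
    """Map each distinct score to its relative cumulative frequency in percent."""
    counts = Counter(values)           # step 2: frequency of each score
    n = len(values)
    ranks = {}
    cumulative = 0
    for score in sorted(counts):       # step 1: lowest to highest
        cumulative += counts[score]    # step 3: cumulative frequency
        ranks[score] = 100 * cumulative / n   # step 4: percentage
    return ranks


ranks = percentile_ranks(scores)
print(ranks[14], ranks[21])   # 30.0 70.0
```

Plotting the values of `ranks` against the scores would give the ogive described in step 5.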

Concluding Remark

Measures of Central Tendencies, Measures of Dispersion, and Measures of Relative Position are among the basic statistical concepts and formulas for descriptive analysis. Researchers conducting correlational research, causal comparative research, and more specifically experimental research have to learn more about inferential statistics.

Comprehension Questions

Answer the following questions and explain your answers, or do what is required.

1. What is the difference between descriptive statistics and inferential statistics?

2. What is the difference among the measures of Central Tendencies: Mean, Median, and Mode?

3. How do you define range? What is it good for?

4. What is variance? What is the procedure in calculating variance?

5. What is standard deviation? What is the procedure in calculating standard deviation?

6. What does Quartile Three (Q3) show?

7. What is semi-interquartile range? What is the procedure in calculating semi-interquartile range?

8. What does Decile Seven (D 7) show?

9. What does Percentile 45 (P45) show?

10. When your score is 65 and your PR is 70, what does it mean?

References

Best, J.W., Kahn, J.V. (2003) Research in Education (9th Ed.) Boston: Pearson Education Inc.

Borg, W.R., Gall, J.P., Gall, M.D. (1993) Applying Educational Research: A Practical Guide (3rd Ed.) New York: Addison Wesley Longman, Inc.
