Types of Data Descriptive Statistics

Statistical Methods I

Tamekia L. Jones, Ph.D.
(tjones@cog.ufl.edu)
Research Assistant Professor
Children's Oncology Group Statistics & Data Center
Department of Biostatistics
Colleges of Medicine and Public Health & Health Professions

Outline of Topics

I. Descriptive Statistics
II. Hypothesis Testing
III. Parametric Statistical Tests
IV. Nonparametric Statistical Tests
V. Correlation and Regression

Types of Data

• Nominal Data
  - Gender: Male, Female
• Ordinal Data
  - Strongly disagree, Disagree, Slightly disagree, Neutral, Slightly agree, Agree, Strongly agree
• Interval Data
  - Numeric data: Birth weight

Descriptive Statistics

• Descriptive statistical measurements are used in medical literature to summarize data or describe the attributes of a set of data
• Nominal data: summarize using rates/proportions
  - e.g. % males, % females on a clinical study
  - Can also be used for ordinal data
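
As an illustration of summarizing nominal data with proportions, here is a minimal Python sketch. The list of genders is invented for demonstration and does not come from the lecture, and the helper name `proportions` is hypothetical.

```python
from collections import Counter


def proportions(values):
    """Return the percentage of observations falling in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical sample: gender recorded for 8 study participants (invented data).
genders = ["Male", "Female", "Female", "Male",
           "Female", "Female", "Male", "Female"]

for category, pct in proportions(genders).items():
    print(f"{category}: {pct:.1f}%")
```

The same helper works for ordinal categories (for example, agreement levels), since it only counts how often each label occurs.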

Descriptive Statistics (contd)

• Two parameters used most frequently in clinical medicine
  - Measures of Central Tendency
  - Measures of Dispersion

Measures of Central Tendency

• Summary statistics that describe the location of the center of a distribution of numerical or ordinal measurements, where
  - A distribution consists of values of a characteristic and the frequency of their occurrence
  - Example: Serum cholesterol levels (mmol/L)
    6.8  5.1  6.1  4.4  5.0  7.1  5.5  3.8  4.4

Measures of Central Tendency (contd)

• Mean: used for numerical data and for symmetric distributions
• Median: used for ordinal data or for numerical data where the distribution is skewed
• Mode: used primarily for multimodal distributions

Measures of Central Tendency (contd)

Mean (Arithmetic Average)

• Sensitive to extreme observations
• Replace 5.5 with, say, 12.0
  The new mean = 54.7 / 9 = 6.08
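
The sensitivity of the mean to an extreme observation can be checked numerically. The sketch below uses the nine serum cholesterol values from the lecture example. The variable names are illustrative, and the swap of 5.5 for 12.0 follows the slide.

```python
from statistics import mean

# Serum cholesterol levels (mmol/L) from the lecture example.
cholesterol = [6.8, 5.1, 6.1, 4.4, 5.0, 7.1, 5.5, 3.8, 4.4]

original_mean = mean(cholesterol)

# Replace the observation 5.5 with an extreme value of 12.0.
with_outlier = [12.0 if x == 5.5 else x for x in cholesterol]
new_mean = mean(with_outlier)

print(f"original mean = {original_mean:.2f}")
print(f"mean with 12.0 in place of 5.5 = {new_mean:.2f}")
```

The original sum is 48.2, so the original mean is 48.2 / 9, or about 5.36 (the slides quote 5.35). A single extreme value moves it to about 6.08.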

Measures of Central Tendency (contd)

Median (Positional Average)

• Middle observation: half the values are less than and half the values are greater than this observation
• Order the observations from smallest to largest
  3.8  4.4  4.4  5.0  5.1  5.5  6.1  6.8  7.1
• Median = middle observation = 5.1
• Less sensitive to extreme observations
  - Replace 5.5 with, say, 12.0
  - New median = 5.1

Measures of Central Tendency (contd)

Mode

• The observation that occurs most frequently in the data
• Example: 3.8  4.4  4.4  5.0  5.1  5.5  6.1  6.8  7.1
  Mode = 4.4
• Example: 3.8  4.4  4.4  5.0  5.1  5.5  6.1  6.1  7.1
  Mode = 4.4; 6.1
• Two modes: bimodal distribution
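
The median and mode examples can be reproduced with Python's standard library. This is a sketch rather than part of the lecture. The `bimodal` list is the second mode example as read from the slide, assuming it holds nine observations with 6.1 appearing twice.

```python
from statistics import median, multimode

# Serum cholesterol levels (mmol/L) from the lecture example.
cholesterol = [6.8, 5.1, 6.1, 4.4, 5.0, 7.1, 5.5, 3.8, 4.4]

# Same data with 5.5 replaced by the extreme value 12.0.
with_outlier = [12.0 if x == 5.5 else x for x in cholesterol]

# median() sorts the data and returns the middle observation (n is odd here).
print("median:", median(cholesterol))
print("median with outlier:", median(with_outlier))

# multimode() returns every value that ties for the highest frequency.
print("mode(s):", multimode(cholesterol))

# Assumed reading of the bimodal slide example (nine observations).
bimodal = [3.8, 4.4, 4.4, 5.0, 5.1, 5.5, 6.1, 6.1, 7.1]
print("mode(s), bimodal example:", multimode(bimodal))
```

The median stays at 5.1 after the swap because only the ordering of the observations matters, not how far the largest value lies from the rest.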

Measures of Central Tendency (contd)

Which measure do I use?

Depends on two factors:
1. Scale of measurement (ordinal or numerical) and
2. Shape of the distribution of observations

Measures of Central Tendency (contd)

Shape of the distribution

• Symmetric
• Skewed to the left (negative)
• Skewed to the right (positive)

Measures of Dispersion

• Measures that describe the spread or variation in the observations
• Common measures of dispersion
  - Range
  - Standard Deviation
  - Coefficient of Variation
  - Percentiles
  - Inter-quartile Range

Measures of Dispersion (contd)

Range = difference between the largest and the smallest observation

• Used with numerical data to emphasize extreme values
• Serum cholesterol example
  Minimum = 3.8, Maximum = 7.1
  Range = 7.1 - 3.8 = 3.3
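
A short Python sketch of the range calculation for the serum cholesterol data follows. It also computes quartiles and the inter-quartile range with `statistics.quantiles`, as one possible illustration of the percentile-based measures listed above. Quartile conventions differ between software packages, so those values depend on the method used (here, the library default) and are not taken from the lecture.

```python
from statistics import quantiles

# Serum cholesterol levels (mmol/L) from the lecture example.
cholesterol = [6.8, 5.1, 6.1, 4.4, 5.0, 7.1, 5.5, 3.8, 4.4]

minimum = min(cholesterol)
maximum = max(cholesterol)
data_range = maximum - minimum
print(f"min = {minimum}, max = {maximum}, range = {data_range:.1f}")

# Quartiles (25th, 50th, 75th percentiles) using the library's default method.
q1, q2, q3 = quantiles(cholesterol, n=4)
iqr = q3 - q1
print(f"Q1 = {q1:.2f}, median = {q2:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
```

The range uses only the two most extreme observations, whereas the inter-quartile range describes the spread of the middle half of the data.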

Measures of Dispersion (contd)

Standard Deviation

- Measure of the spread of the observations about the mean
- Used as a measure of dispersion when the mean is used to measure central tendency for symmetric numerical data
- Standard deviation, like the mean, requires numerical data
- Essential part of many statistical tests
- Variance = s^2

Measures of Dispersion (contd)

Standard Deviation

Serum cholesterol (mmol/L): 6.8  5.1  6.1  4.4  5.0  7.1  5.5  3.8  4.4
n = 9, Mean = 5.35
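
The slide's formula did not survive extraction, so the following Python sketch assumes the usual sample standard deviation, with divisor n - 1. That assumption is consistent with the value s = 1.126 quoted for serum cholesterol elsewhere in the lecture.

```python
from math import sqrt
from statistics import stdev

# Serum cholesterol levels (mmol/L) from the lecture example.
cholesterol = [6.8, 5.1, 6.1, 4.4, 5.0, 7.1, 5.5, 3.8, 4.4]

n = len(cholesterol)
x_bar = sum(cholesterol) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1
# (assumed convention; the slide's formula was lost).
s2 = sum((x - x_bar) ** 2 for x in cholesterol) / (n - 1)

# Standard deviation is the square root of the variance.
s = sqrt(s2)

print(f"n = {n}, mean = {x_bar:.2f}, variance = {s2:.3f}, s = {s:.3f}")

# Cross-check against the standard library's sample standard deviation.
print(f"statistics.stdev = {stdev(cholesterol):.3f}")
```

Taking the square root returns the measure to the original units (mmol/L), which is why s is usually reported alongside the mean instead of the variance.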

Measures of Dispersion (contd)

The Normal (Gaussian) Distribution

If the observations have a bell-shaped distribution, then the following is approximately true:
- About 68% of the observations lie between X̄ - 1s and X̄ + 1s
- About 95% of the observations lie between X̄ - 2s and X̄ + 2s
- About 99.7% of the observations lie between X̄ - 3s and X̄ + 3s
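
As a rough illustration, the sketch below computes the exact coverage of a normal distribution within 1, 2, and 3 standard deviations of the mean, and the corresponding intervals for serum cholesterol using the mean (5.35) and s (1.126) quoted in the lecture. Treating serum cholesterol as approximately normal is an assumption made here for illustration. The slides do not state it.

```python
from statistics import NormalDist

mean_chol = 5.35  # mmol/L, as quoted in the lecture
s = 1.126         # mmol/L, as quoted in the lecture

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

intervals = []
for k in (1, 2, 3):
    # Probability that a normal observation lies within k SDs of the mean.
    coverage = std_normal.cdf(k) - std_normal.cdf(-k)
    lower = mean_chol - k * s
    upper = mean_chol + k * s
    intervals.append((k, coverage, lower, upper))
    print(f"within {k} SD: {100 * coverage:.1f}% "
          f"-> {lower:.2f} to {upper:.2f} mmol/L")
```

These percentages apply only when the distribution is close to normal. For skewed data, the intervals can cover noticeably more or less than stated.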

Measures of Dispersion (contd)

Coefficient of Variation

• Measure of the relative spread in data
• Used to compare variability between two sets of numerical data measured on different scales
• Coefficient of Variation (C of V) = (s / mean) x 100%
• Example:

                                    Mean    Std Dev (s)    C of V
  Serum Cholesterol (mmol/L)        5.35    1.126          21%
  Change in vessel diameter (mm)    0.12    0.29           241.7%

• Relative variation in change in vessel diameter is more than 10 times greater than that for serum cholesterol

Measures of Dispersion (contd)

e.g. DiMaio et al. evaluated the use of the test measuring maternal serum alpha-fetoprotein (for screening neural tube defects) in a prospective study of 34,000 women.

Reproducibility of the test procedure was determined by repeating the assay 10 times in each of four pools of serum. The mean and s of the 10 assays were calculated in each of the 4 pools. Coefficients of variation were computed for each pool: 7.4%, 5.8%, 2.7%, and 2.4%. These values indicate relatively good reproducibility of the assay, because the variation, as measured by the standard deviation, is small relative to the mean. Hence readers of their article can be confident that the assay results were consistent.
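
The coefficient of variation figures for serum cholesterol and change in vessel diameter can be reproduced from the quoted means and standard deviations. This is a minimal sketch. The function name `coefficient_of_variation` is illustrative and not from the lecture.

```python
def coefficient_of_variation(s, mean):
    """Return the coefficient of variation, (s / mean) x 100, in percent."""
    return s / mean * 100.0


# (label, mean, standard deviation) as quoted in the lecture example.
rows = [
    ("Serum Cholesterol (mmol/L)", 5.35, 1.126),
    ("Change in vessel diameter (mm)", 0.12, 0.29),
]

cvs = {}
for label, mean, s in rows:
    cvs[label] = coefficient_of_variation(s, mean)
    print(f"{label}: C of V = {cvs[label]:.1f}%")

# How many times larger the relative variation in vessel diameter change is.
ratio = cvs["Change in vessel diameter (mm)"] / cvs["Serum Cholesterol (mmol/L)"]
print(f"ratio of the two coefficients = {ratio:.1f}")
```

Because the coefficient of variation is unit-free, it allows the comparison between measurements in mmol/L and mm that the raw standard deviations cannot.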
