Exploratory Data Analysis (Wilks Ch. 3)
Robustness, Numerical Summaries, Graphical Summaries, Correlation, Higher-Dimensional Data
Debra Baker
AOSC 630: Class #2 January 30, 2008
A good analysis method is insensitive to violations of the assumptions made about the data set.
Common assumptions: "normal" distribution
Robust: performs reasonably well for most types of data
Resistant: not unduly influenced by a small number of outliers
There are three key features used to numerically describe a data set.
Location: the central tendency of the data set.
Spread: the dispersion of the data set around a central value.
Symmetry: how the data are distributed about the central value.
The first common numerical summary of a data set is a measure of its location.
Mean: the average of all data points
Median: the center value in an ordered data set
Mode: the most frequently occurring value
Which of these measures are robust?
Which of these measures are resistant?
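One way to explore the resistance question is to corrupt a single value and see which location measures move. The sketch below uses Python's standard `statistics` module; the sample values and the helper name `location_summary` are made up for illustration and do not come from Wilks.

```python
import statistics

# Hypothetical sample (not from the lecture): five readings, then the same
# five readings with the last one mistyped as 130.0 instead of 13.0.
clean = [10.0, 11.0, 11.0, 12.0, 13.0]
with_outlier = [10.0, 11.0, 11.0, 12.0, 130.0]

def location_summary(data):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(data),
        statistics.median(data),
        statistics.mode(data),
    )

clean_summary = location_summary(clean)
outlier_summary = location_summary(with_outlier)

print(clean_summary)    # (11.4, 11.0, 11.0)
print(outlier_summary)  # (34.8, 11.0, 11.0)
```

In this small example only the mean shifts when the outlier is introduced, while the median and mode stay put. A single sample like this illustrates the idea but does not prove it in general.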
Quartiles divide the data set into four equal parts to describe its distribution.
First quartile: the middle of the data between the median and minimum.
Third quartile: the middle of the data between the median and maximum.
Are quartiles robust and resistant?
Quartiles are an example of quantiles, which can be based on any divisor (e.g., 10%).
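The "middle of each half" definition of quartiles can be sketched directly. The function `quartiles` below is a hypothetical helper, and it assumes one particular convention: when the number of points is odd, the median itself is left out of both halves. Other conventions (including the default in `statistics.quantiles`) can give slightly different values for small samples.

```python
import statistics

def quartiles(data):
    """Return (q1, median, q3) using the 'median of each half' convention.

    Assumption: for an odd number of points, the median is excluded from
    both halves. Needs at least two data points.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle element when n is odd.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )

sample = [2, 4, 4, 5, 7, 8, 9, 11, 12]  # hypothetical values
q1, q2, q3 = quartiles(sample)
print((q1, q2, q3))  # (4.0, 7, 10.0)

# Other quantiles use a different divisor; n=10 gives the nine decile cut points.
deciles = statistics.quantiles(sample, n=10)
print(len(deciles))  # 9
```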
The second common numerical summary of a data set is a measure of its spread.
Standard Deviation: the square root of the averaged square distance between data points and the mean.
Interquartile Range: specifies the range of the center 50% of the data.
$$ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} $$
Are these measures robust?
Are these measures resistant?
$$ \mathrm{IQR} = q_{0.75} - q_{0.25} $$
Equations 3.5 and 3.6 from Wilks (2006), pp. 26-27.
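As a rough illustration of the two spread measures, the sketch below computes the sample standard deviation (with the n - 1 divisor) and the IQR for a made-up sample, with and without one corrupted value. The names `sample_std` and `iqr` are hypothetical helpers, and the quartiles come from `statistics.quantiles`, which uses its own interpolation convention that may differ from the one in Wilks.

```python
import math
import statistics

def sample_std(data):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

def iqr(data):
    """Interquartile range q0.75 - q0.25 (quartiles via statistics.quantiles)."""
    q25, _, q75 = statistics.quantiles(data, n=4)
    return q75 - q25

# Hypothetical sample; the second list mistypes the largest value.
clean = [8, 9, 10, 10, 11, 12, 12, 13, 14]
with_outlier = [8, 9, 10, 10, 11, 12, 12, 13, 140]

s_clean, s_outlier = sample_std(clean), sample_std(with_outlier)
iqr_clean, iqr_outlier = iqr(clean), iqr(with_outlier)

print(s_clean, s_outlier)      # roughly 1.9 versus roughly 43
print(iqr_clean, iqr_outlier)  # 3.0 3.0
```

In this example the standard deviation grows by more than a factor of twenty when one value is corrupted, while the IQR is unchanged because the outlier lies outside the central 50% of the data.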
The third common numerical summary of a data set is a measure of its symmetry.
Positive Skewness: distribution has a long right tail.
Negative Skewness: distribution has a long left tail.
Positive Kurtosis: distribution has a tall, narrow peak.
Negative Kurtosis: distribution has a flat, low peak.
There are two important measures of skewness.
Skewness Coefficient: a moments-based measure of symmetry.
Yule-Kendall Index: compares the distance between the median and each of the two quartiles.
Are these measures robust?
Are these measures resistant?
$$ \gamma = \frac{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3} $$
$$ \gamma_{YK} = \frac{(q_{0.75} - q_{0.5}) - (q_{0.5} - q_{0.25})}{\mathrm{IQR}} $$
Equations 3.9 and 3.10 from Wilks (2006), p. 28.
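Both skewness measures can be tried on a small sample to see how they react to an extreme value. This is a minimal sketch: `skewness_coefficient` and `yule_kendall` are hypothetical helper names, the data are invented, and the quartiles come from `statistics.quantiles`, whose interpolation convention is an assumption here rather than the one used in Wilks.

```python
import math
import statistics

def skewness_coefficient(data):
    """Moments-based skewness: mean cubed deviation (n - 1 divisor) over s**3."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    third_moment = sum((x - mean) ** 3 for x in data) / (n - 1)
    return third_moment / s ** 3

def yule_kendall(data):
    """Yule-Kendall index: ((q75 - q50) - (q50 - q25)) / IQR."""
    q25, q50, q75 = statistics.quantiles(data, n=4)
    return ((q75 - q50) - (q50 - q25)) / (q75 - q25)

symmetric = [8, 9, 10, 10, 11, 12, 12, 13, 14]   # hypothetical values
right_skewed = [1, 2, 2, 3, 3, 4, 5, 7, 12]      # long right tail
extreme = [1, 2, 2, 3, 3, 4, 5, 7, 120]          # same, largest value mistyped

for name, data in [("symmetric", symmetric),
                   ("right_skewed", right_skewed),
                   ("extreme", extreme)]:
    print(name, skewness_coefficient(data), yule_kendall(data))
```

For these samples both measures are zero for the symmetric data and positive for the right-skewed data. Mistyping the largest value changes the skewness coefficient but leaves the Yule-Kendall index at 0.5, since only the quartiles enter that index.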