


Sample exploratory data analysis: Info Sys 271

Why are we doing this?

Part of doing an exploratory data analysis is calculating lots of statistics and drawing lots of graphs that might not make it into your final write-up. Graphs and statistics that aren’t used directly in the report can be included in appendices. Part of the reason we do exploratory data analysis is to justify the methods we use in further analysis: some statistical methods you will encounter require the data to have certain characteristics, such as being normally distributed (e.g. not too skewed and not bimodal). Exploratory data analysis is also an opportunity to look for interesting features of the data that let us form hypotheses (e.g. relationships that might allow us to predict one variable from another). Exploratory data analysis is not a tool for drawing conclusions (supporting or rejecting hypotheses), because it doesn’t tell us whether the trends we think we can see are statistically significant. An exploratory data analysis write-up should be short!

What are we doing?

Numerical methods

Calculate means and standard deviations for the data. If you suspect the data is not normally distributed, you might also want to calculate medians and modes. The mode should be used for nominal level data, and the median for ordinal level data. Calculate values for skewness and kurtosis as well. Where we have nominal level variables, we might be particularly interested in calculating means and standard deviations for each category of the nominal variable; this helps us work out whether there is a difference between the categories. For example, we should calculate these statistics separately for each value of the variable recording who picked the stocks (the values are PROS, DARTS and DJIA).
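The statistics described above can be computed with Python's standard library. This is a minimal sketch under stated assumptions: the `records` values are made-up numbers (not the course dataset), and `skewness`, `excess_kurtosis`, `describe` and `describe_by_group` are hypothetical helper names. Skewness and kurtosis use the bias-adjusted sample formulas behind Excel's `SKEW` and `KURT` functions; kurtosis is reported as excess kurtosis, so roughly normal data scores close to 0.

```python
import statistics


def skewness(xs):
    """Adjusted sample skewness (G1); 0 for perfectly symmetric data. Needs n >= 3."""
    n = len(xs)
    m = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation (divides by n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def excess_kurtosis(xs):
    """Adjusted sample excess kurtosis (G2); about 0 for normal data. Needs n >= 4."""
    n = len(xs)
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    fourth = sum(((x - m) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def describe(xs):
    """Summary statistics for one numeric (interval/ratio) variable."""
    return {
        "n": len(xs),
        "mean": statistics.mean(xs),
        "sd": statistics.stdev(xs),
        "median": statistics.median(xs),
        "skewness": skewness(xs),
        "kurtosis": excess_kurtosis(xs),
    }


def describe_by_group(records):
    """Split (category, value) pairs by the nominal category, then describe each group."""
    groups = {}
    for category, value in records:
        groups.setdefault(category, []).append(value)
    return {category: describe(values) for category, values in groups.items()}


# Hypothetical long-format data: (who picked the stock, return in %).
# Made-up illustrative numbers, NOT the course dataset.
records = [
    ("PROS", 12.0), ("PROS", 8.5), ("PROS", -3.0), ("PROS", 20.1), ("PROS", 5.4),
    ("DARTS", 4.2), ("DARTS", -6.8), ("DARTS", 10.3), ("DARTS", 1.1), ("DARTS", -2.5),
    ("DJIA", 6.0), ("DJIA", 3.3), ("DJIA", 7.9), ("DJIA", -1.2), ("DJIA", 4.4),
]

# The mode is the appropriate average for the nominal variable itself;
# multimode returns every most-common value, so ties are not hidden.
print("Mode(s) of who picked:", statistics.multimode(c for c, _ in records))

# One line of summary statistics per category of the nominal variable.
for category, stats in describe_by_group(records).items():
    summary = ", ".join(f"{name}={value:.4g}" for name, value in stats.items())
    print(f"{category}: {summary}")
```

Grouping first and then describing each group keeps the per-category comparison explicit: differences between the PROS, DARTS and DJIA means and spreads are what you would look at before choosing a formal test.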

[Table: Descriptive Statistics]

Pearson correlations between the performance of the PROS, DARTS and DJIA stocks:

| |PROS |DARTS |DJIA |
|---|---|---|---|
|PROS |1.000 |.324** |.538** |
|DARTS |.324** |1.000 |.428** |
|DJIA |.538** |.428** |1.000 |

** Correlation is significant at the 0.01 level (2-tailed).

We can say:

The performance of the Pros’ stocks has a moderate-to-strong positive correlation with the performance of the DJIA stocks (a Pearson correlation of 0.538). The performance of the Darts stocks is weakly positively correlated with the performance of the Pros’ stocks (a Pearson correlation of 0.324).
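A Pearson correlation matrix like the one in the table can be computed directly from paired columns. This is a minimal sketch assuming wide-format data (one list per stock-picking method, aligned so that position i in every list is the same period); the `columns` values are made up for illustration and will not reproduce the coefficients above, and `pearson`, `t_statistic` and `correlation_matrix` are hypothetical helper names. The `**` markers in a table like this come from a t-test on r, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom; the standard library has no t distribution, so the sketch only computes t, which you would compare with a t table.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient r for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-deviation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t for testing H0: rho = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def correlation_matrix(columns):
    """Pearson r for every pair of columns, as a nested dict."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical wide-format data: one made-up return (%) per period for each method.
columns = {
    "PROS": [12.0, 8.5, -3.0, 20.1, 5.4, 9.7],
    "DARTS": [4.2, -6.8, 10.3, 1.1, -2.5, 3.9],
    "DJIA": [6.0, 3.3, 7.9, -1.2, 4.4, 5.1],
}

matrix = correlation_matrix(columns)
names = list(columns)
print("       " + "".join(f"{name:>8}" for name in names))
for a in names:
    print(f"{a:<7}" + "".join(f"{matrix[a][b]:8.3f}" for b in names))

n = len(columns["PROS"])
r = matrix["PROS"]["DJIA"]
print(f"PROS vs DJIA: r = {r:.3f}, t = {t_statistic(r, n):.3f} on {n - 2} df")
```

With real data, each list must cover the same periods in the same order, otherwise the pairing (and therefore r) is meaningless. On Python 3.10 and later, `statistics.correlation(xs, ys)` computes the same coefficient.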

See this single scatterplot of rainfall vs the number of rainy days.

[pic: scatterplot of rainfall vs the number of rainy days]
