Advanced Data Analysis from an Elementary Point of View

Cosma Rohilla Shalizi

For my parents and in memory of my grandparents

Contents

Introduction

To the Reader

Concepts You Should Know

Part I Regression and Its Generalizations

1 Regression Basics

1.1 Statistics, Data Analysis, Regression

1.2 Guessing the Value of a Random Variable

1.3 The Regression Function

1.4 Estimating the Regression Function

1.5 Linear Smoothers

1.6 Further Reading

Exercises

2 The Truth about Linear Regression

2.1 Optimal Linear Prediction: Multiple Variables

2.2 Shifting Distributions, Omitted Variables, and Transformations

2.3 Adding Probabilistic Assumptions

2.4 Linear Regression Is Not the Philosopher's Stone

2.5 Further Reading

Exercises

3 Model Evaluation

3.1 What Are Statistical Models For?

3.2 Errors, In and Out of Sample

3.3 Over-Fitting and Model Selection

3.4 Cross-Validation

3.5 Warnings

3.6 Further Reading

Exercises

4 Smoothing in Regression

4.1 How Much Should We Smooth?

15:21 Sunday 21st March, 2021 Copyright © Cosma Rohilla Shalizi; do not distribute without permission updates at

4.2 Adapting to Unknown Roughness

4.3 Kernel Regression with Multiple Inputs

4.4 Interpreting Smoothers: Plots

4.5 Average Predictive Comparisons

4.6 Computational Advice: npreg

4.7 Further Reading

Exercises

5 Simulation

5.1 What Is a Simulation?

5.2 How Do We Simulate Stochastic Models?

5.3 Repeating Simulations

5.4 Why Simulate?

5.5 Further Reading

Exercises

6 The Bootstrap

6.1 Stochastic Models, Uncertainty, Sampling Distributions

6.2 The Bootstrap Principle

6.3 Resampling

6.4 Bootstrapping Regression Models

6.5 Bootstrap with Dependent Data

6.6 Confidence Bands for Nonparametric Regression

6.7 Things Bootstrapping Does Poorly

6.8 Which Bootstrap When?

6.9 Further Reading

Exercises

7 Splines

7.1 Smoothing by Penalizing Curve Flexibility

7.2 Computational Example: Splines for Stock Returns

7.3 Basis Functions and Degrees of Freedom

7.4 Splines in Multiple Dimensions

7.5 Smoothing Splines versus Kernel Regression

7.6 Some of the Math Behind Splines

7.7 Further Reading

Exercises

8 Additive Models

8.1 Additive Models

8.2 Partial Residuals and Back-fitting

8.3 The Curse of Dimensionality

8.4 Example: California House Prices Revisited

8.5 Interaction Terms and Expansions

8.6 Closing Modeling Advice

8.7 Further Reading

Exercises
