Advanced Data Analysis from an Elementary Point of View
Cosma Rohilla Shalizi
For my parents and in memory of my grandparents
Contents
Introduction
12
To the Reader
12
Concepts You Should Know
15
Part I Regression and Its Generalizations
17
1 Regression Basics
19
1.1 Statistics, Data Analysis, Regression
19
1.2 Guessing the Value of a Random Variable
20
1.3 The Regression Function
21
1.4 Estimating the Regression Function
25
1.5 Linear Smoothers
30
1.6 Further Reading
41
Exercises
41
2 The Truth about Linear Regression
43
2.1 Optimal Linear Prediction: Multiple Variables
43
2.2 Shifting Distributions, Omitted Variables, and Transformations
48
2.3 Adding Probabilistic Assumptions
57
2.4 Linear Regression Is Not the Philosopher's Stone
60
2.5 Further Reading
61
Exercises
62
3 Model Evaluation
63
3.1 What Are Statistical Models For?
63
3.2 Errors, In and Out of Sample
64
3.3 Over-Fitting and Model Selection
68
3.4 Cross-Validation
72
3.5 Warnings
76
3.6 Further Reading
79
Exercises
80
4 Smoothing in Regression
86
4.1 How Much Should We Smooth?
86
15:21 Sunday 21st March, 2021. Copyright © Cosma Rohilla Shalizi; do not distribute without permission; updates at
4.2 Adapting to Unknown Roughness
87
4.3 Kernel Regression with Multiple Inputs
94
4.4 Interpreting Smoothers: Plots
96
4.5 Average Predictive Comparisons
97
4.6 Computational Advice: npreg
98
4.7 Further Reading
101
Exercises
102
5 Simulation
115
5.1 What Is a Simulation?
115
5.2 How Do We Simulate Stochastic Models?
116
5.3 Repeating Simulations
120
5.4 Why Simulate?
121
5.5 Further Reading
127
Exercises
127
6 The Bootstrap
128
6.1 Stochastic Models, Uncertainty, Sampling Distributions
128
6.2 The Bootstrap Principle
130
6.3 Resampling
141
6.4 Bootstrapping Regression Models
143
6.5 Bootstrap with Dependent Data
148
6.6 Confidence Bands for Nonparametric Regression
149
6.7 Things Bootstrapping Does Poorly
149
6.8 Which Bootstrap When?
150
6.9 Further Reading
151
Exercises
152
7 Splines
154
7.1 Smoothing by Penalizing Curve Flexibility
154
7.2 Computational Example: Splines for Stock Returns
156
7.3 Basis Functions and Degrees of Freedom
162
7.4 Splines in Multiple Dimensions
164
7.5 Smoothing Splines versus Kernel Regression
165
7.6 Some of the Math Behind Splines
165
7.7 Further Reading
167
Exercises
168
8 Additive Models
170
8.1 Additive Models
170
8.2 Partial Residuals and Back-fitting
171
8.3 The Curse of Dimensionality
174
8.4 Example: California House Prices Revisited
176
8.5 Interaction Terms and Expansions
180
8.6 Closing Modeling Advice
182
8.7 Further Reading
183
Exercises
183
9 Testing Regression Specifications
193
9.1 Testing Functional Forms
193
9.2 Why Use Parametric Models At All?
203
9.3 Further Reading
207
10 Weighting and Variance
208
10.1 Weighted Least Squares
208
10.2 Heteroskedasticity
210
10.3 Estimating Conditional Variance Functions
219
10.4 Re-sampling Residuals with Heteroskedasticity
227
10.5 Local Linear Regression
227
10.6 Further Reading
232
Exercises
233
11 Logistic Regression
234
11.1 Modeling Conditional Probabilities
234
11.2 Logistic Regression
235
11.3 Numerical Optimization of the Likelihood
240
11.4 Generalized Linear and Additive Models
241
11.5 Model Checking
243
11.6 A Toy Example
244
11.7 Weather Forecasting in Snoqualmie Falls
247
11.8 Logistic Regression with More Than Two Classes
259
Exercises
260
12 GLMs and GAMs
262
12.1 Generalized Linear Models and Iterative Least Squares
262
12.2 Generalized Additive Models
268
12.3 Further Reading
268
Exercises
268
13 Trees
269
13.1 Prediction Trees
269
13.2 Regression Trees
272
13.3 Classification Trees
281
13.4 Further Reading
287
Exercises
287
Part II Distributions and Latent Structure
293
14 Density Estimation
295
14.1 Histograms Revisited
295
14.2 "The Fundamental Theorem of Statistics"
296
14.3 Error for Density Estimates
297
14.4 Kernel Density Estimates
300
14.5 Conditional Density Estimation
306
14.6 More on the Expected Log-Likelihood Ratio
307
14.7 Simulating from Density Estimates
310
14.8 Further Reading
315
Exercises
317
15 Principal Components Analysis
319
15.1 Mathematics of Principal Components
319
15.2 Example 1: Cars
326
15.3 Example 2: The United States circa 1977
330
15.4 Latent Semantic Analysis
333
15.5 PCA for Visualization
336
15.6 PCA Cautions
338
15.7 Random Projections
339
15.8 Further Reading
340
Exercises
341
16 Factor Models
344
16.1 From PCA to Factor Models
344
16.2 The Graphical Model
346
16.3 Roots of Factor Analysis in Causal Discovery
349
16.4 Estimation
351
16.5 The Rotation Problem
357
16.6 Factor Analysis as a Predictive Model
358
16.7 Factor Models versus PCA Once More
361
16.8 Examples in R
362
16.9 Reification, and Alternatives to Factor Models
366
16.10 Further Reading
373
Exercises
373
17 Mixture Models
375
17.1 Two Routes to Mixture Models
375
17.2 Estimating Parametric Mixture Models
379
17.3 Non-parametric Mixture Modeling
384
17.4 Worked Computing Example
384
17.5 Further Reading
400
Exercises
401
18 Graphical Models
403
18.1 Conditional Independence and Factor Models
403
18.2 Directed Acyclic Graph (DAG) Models
404
18.3 Conditional Independence and d-Separation
406
18.4 Independence and Information
413
18.5 Examples of DAG Models and Their Uses
415
18.6 Non-DAG Graphical Models
417
18.7 Further Reading
421
Exercises
422
Part III Causal Inference
423
19 Graphical Causal Models
425
19.1 Causation and Counterfactuals
425
19.2 Causal Graphical Models
426
19.3 Conditional Independence and d-Separation Revisited
429
19.4 Further Reading
430
Exercises
432
20 Identifying Causal Effects
433
20.1 Causal Effects, Interventions and Experiments
433
20.2 Identification and Confounding
435
20.3 Identification Strategies
437
20.4 Summary
452
Exercises
453
21 Estimating Causal Effects
455
21.1 Estimators in the Back- and Front-Door Criteria
455
21.2 Instrumental-Variables Estimates
462
21.3 Uncertainty and Inference
464
21.4 Recommendations
464
21.5 Further Reading
465
Exercises
466
22 Discovering Causal Structure
467
22.1 Testing DAGs
468
22.2 Testing Conditional Independence
469
22.3 Faithfulness and Equivalence
470
22.4 Causal Discovery with Known Variables
471
22.5 Software and Examples
476
22.6 Limitations on Consistency of Causal Discovery
482
22.7 Pseudo-code for the SGS Algorithm
482
22.8 Further Reading
483
Exercises
484
Part IV Dependent Data
485
23 Time Series
487
23.1 What Time Series Are
487
23.2 Stationarity
488
23.3 Markov Models
493
23.4 Autoregressive Models
497
23.5 Bootstrapping Time Series
502
23.6 Cross-Validation
504
23.7 Trends and De-Trending
504
23.8 Breaks in Time Series
509
23.9 Time Series with Latent Variables
510
23.10 Longitudinal Data
518
23.11 Multivariate Time Series
518