This week covers:
- More wrangling and plotting in R
- Statistical inference
- Regression
- Overfitting / generalization
- Review combine_and_reshape_in_r.ipynb on joins with dplyr and reshaping with tidyr
- Finish up the Citibike plotting exercises in plot_trips.R, including the plots that involve reshaping data
- Read chapters 12 and 13 of R for Data Science on tidyr and joins
- Do the following exercises from R for Data Science:
- Do part 1 of Datacamp's Cleaning Data in R tutorial
- Additional references:
- The tidyr vignette on tidy data
- The dplyr vignette on two-table verbs for joins
- A visual guide to joins
- Read Chapter 27 of R for Data Science on Rmarkdown
- Do the following exercises:
- See the Statistical Inference & Hypothesis Testing slides
- Review the "Estimating a proportion" section of the statistical inference Rmarkdown file (preview the output here)
- Read Chapter 4 of Introduction to Statistical Thinking (With R, Without Calculus) (IST) and do questions 4.1 and 4.2. Feel free to run the code in the book along the way.
- Read Chapter 6 of IST on the normal distribution and do question 6.1
- Chapter 1 of the online textbook Intro to Stat with Randomization and Simulation (ISRS)
- Interactive demos:
- Some notes on expected values and variance, with proofs of their properties
- Expected value (click through on "linearity of expectation" for the proof)
- Variance
- Read Chapter 7 of IST on sampling distributions and do exercise 7.1
- Read Chapter 9 of IST and do exercise 9.1
- Read Chapter 10 of IST and do exercises 10.1 and 10.2
- Review the "Hypothesis testing" section of the statistical inference Rmarkdown file (preview the output here)
- Also check out this analysis of the color distribution of M&Ms that we discussed
- See the relevant part of these lecture notes on statistics by simulation
- Statistics for Hackers by VanderPlas (slides, video)
- See section 4 of Mindless Statistics and this article for some warnings on misinterpretations of p-values
- See this post and the related lecture notes on effect sizes and the replication crisis
- See this notebook on statistical vs. practical significance
- There's also an interactive version; play with it and see if you understand what's going on!
- Read Chapter 2 of the online textbook Intro to Stat with Randomization and Simulation (ISRS) and do exercises 2.2 and 2.6
- Read Sections 3.1 and 3.2 of ISRS
- Do exercise 9.2 in IST
- Understanding Statistical Power and Significance Testing
- Calculating the power of a test
- The American Statistical Association's statement on p-values by Wasserstein & Lazar
- Inference by eye by Cumming and Finch
- Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations by Greenland et al.
- The Insignificance of Significance Testing by Johnson
- The Insignificance of Null Hypothesis Significance Testing by Gill
- Why Most Published Research Findings Are False
- Felix Schönbrodt's blog post and shiny app on misconceptions about p-values and false discoveries
- Interpreting Cohen's d effect size
- The New Statistics: Why and How by Cumming
- A guide on effect sizes and related blog post
- Review the slides we covered in class
- See this shiny app on model fitting and this tool for visualizing least squares (Dan's version here is similar, but requires Flash)
- Read Chapter 5 of Intro to Stat with Randomization and Simulation (ISRS) and do exercises 5.20 and 5.29
- Read Section 3.1 of Intro to Statistical Learning and do Lab 3.6.2
- See the notebook on linear models with the modelr package from the tidyverse and this one on model evaluation
- Detailed notes on derivations for ordinary least squares regression with multiple predictors
- Chapter 14 of Introduction to Statistical Thinking
- Formula syntax in R
- The "Model Basics" and "Model Building" Chapters in R for Data Science (Chapters 18 and 19 in the print edition, Chapters 23 and 24 online)
- The modelr and tidymodels packages in R
- An animation of gradient descent and a related blog post