Material based on Chapter 1 of Introduction to Modern Statistics
Types of data collection
Types of variables
Sampling and representation
Tidy data
Summary statistics and associations
Tip
The tidyverse is a collection of R packages designed for data science!
The Big Question: Do stents reduce the risk of stroke?
The Experiment:
Why Random Assignment?
group | no event | stroke | Total |
---|---|---|---|
control | 199 | 28 | 227 |
treatment | 179 | 45 | 224 |
Data: stent365 from the openintro package.
Proportion who had a stroke in the treatment (stent) group: \(45/(179 +45) = 0.20 = 20\%.\)
Proportion who had a stroke in the control group: \(28/(199 +28) = 0.12 = 12\%.\)
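The two proportions above can be checked with a few lines of code. This is a minimal Python sketch using only the counts from the table; the dictionary layout and the function name `stroke_proportion` are illustrative choices, not part of the openintro package.

```python
# Counts copied from the stent study table above.
counts = {
    "control": {"no event": 199, "stroke": 28},
    "treatment": {"no event": 179, "stroke": 45},
}


def stroke_proportion(group):
    """Share of patients in `group` who had a stroke."""
    row = counts[group]
    return row["stroke"] / sum(row.values())


for group in counts:
    # Prints "control: 0.12" and "treatment: 0.20"
    print(f"{group}: {stroke_proportion(group):.2f}")
```

Dividing by the row total (227 or 224) rather than the overall total is what makes the two groups comparable despite their slightly different sizes.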
A summary statistic is a single number that summarizes data from a sample.
group | no event | stroke |
---|---|---|
control | 88% | 12% |
treatment | 80% | 20% |
Data are observations collected from a study or experiment.
Note
A data frame where:
Each row is an observation (also called a case)
Each column is a variable
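To illustrate the tidy layout (one row per observation, one column per variable), the sketch below expands the stent study counts into one record per patient. It is a Python illustration only; the column names `group` and `outcome` are assumed here for the example, and the records are rebuilt from the table counts rather than loaded from the package.

```python
from collections import Counter

# Cell counts from the stent study table: (group, outcome) -> number of patients.
cell_counts = {
    ("control", "no event"): 199,
    ("control", "stroke"): 28,
    ("treatment", "no event"): 179,
    ("treatment", "stroke"): 45,
}

# Tidy layout: one dict per patient (a row), one key per variable (a column).
tidy_rows = [
    {"group": group, "outcome": outcome}
    for (group, outcome), n in cell_counts.items()
    for _ in range(n)
]

print(len(tidy_rows))  # number of observations (patients)
print(tidy_rows[0])    # a single observation

# Tallying the tidy rows recovers the original summary table.
tally = Counter((row["group"], row["outcome"]) for row in tidy_rows)
```

The summary table and the tidy rows carry the same information; the tidy form is simply the one that most analysis tools expect as input.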
Tip
Why Tidy?
explanatory variable → might affect → response variable
Examples:
Warning
Association ≠ Causation!
A representative sample accurately reflects the characteristics of the population it’s drawn from.
Good Representative Sample:
Poor Representative Sample:
Example: Surveying only online users about internet usage would not represent the entire population.
Positive association: As one variable increases, the other tends to increase.
A confounding variable affects both the explanatory and response variables, creating a spurious association.
Warning
Ice cream doesn’t cause drowning! Temperature affects both.
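A small simulation can make the ice cream example concrete. The Python sketch below uses made-up numbers (not real data): temperature drives both ice cream sales and drownings, and neither variable affects the other. The coefficients, noise levels, and the helper `pearson` are all assumptions chosen for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


# Simulated days: temperature is the confounder behind both variables.
temperature = [random.uniform(10, 35) for _ in range(500)]
ice_cream = [2.0 * t + random.gauss(0, 5) for t in temperature]
drownings = [0.3 * t + random.gauss(0, 2) for t in temperature]

# Across all days the two variables move together.
r_all = pearson(ice_cream, drownings)

# Among days with nearly the same temperature, the confounder is held
# roughly fixed, so the association should largely disappear.
band = [i for i, t in enumerate(temperature) if 20 <= t <= 22]
r_band = pearson([ice_cream[i] for i in band], [drownings[i] for i in band])

print(f"all days: r = {r_all:.2f}")
print(f"20-22 degree days only: r = {r_band:.2f}")
```

The overall correlation comes out clearly positive even though the simulation contains no causal link between the two variables, which is exactly what a spurious association looks like.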
Classic Examples:
Observational Studies
Experimental Studies
Random assignment serves critical purposes:
Random Sampling
Random Assignment
Important
You can have one without the other!
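The difference between the two kinds of randomness can be sketched in code. In this Python illustration, the population of 1,000 numbered people and the group sizes are hypothetical; random sampling decides who is studied, while random assignment decides which group each participant joins.

```python
import random

rng = random.Random(42)  # seeded generator so the sketch is repeatable

# Hypothetical population of 1,000 people, identified by number.
population = list(range(1000))

# Random sampling: choose WHO is studied, so that the sample
# tends to resemble the population it was drawn from.
sample = rng.sample(population, 100)

# Random assignment: choose WHICH group each participant joins,
# so that the groups are comparable before any treatment is given.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:50], shuffled[50:]

print(len(sample), len(treatment), len(control))  # 100 50 50
```

The two steps are independent: a study of volunteers can still assign treatments at random, and a carefully sampled survey may assign nothing at all.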
12. Smoking habits of UK residents. A survey was conducted to study the smoking habits of 1,691 UK residents.
gender | age | marital_status | highest_qualification | nationality | ethnicity | gross_income | region | smoke | amt_weekends | amt_weekdays | type |
---|---|---|---|---|---|---|---|---|---|---|---|
Male | 38 | Divorced | No Qualification | British | White | 2,600 to 5,200 | The North | No | NA | NA | |
Female | 42 | Single | No Qualification | British | White | Under 2,600 | The North | Yes | 12 | 12 | Packets |
Male | 40 | Married | Degree | English | White | 28,600 to 36,400 | The North | No | NA | NA | |
Female | 40 | Married | Degree | English | White | 10,400 to 15,600 | The North | No | NA | NA | |
Female | 39 | Married | GCSE/O Level | British | White | 2,600 to 5,200 | The North | No | NA | NA | |
Female | 37 | Married | GCSE/O Level | British | White | 15,600 to 20,800 | The North | No | NA | NA | |
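As a rough aid for thinking about variable types in this table, the Python sketch below takes a few columns from the first three rows and applies a crude rule: a column is treated as numerical if every non-missing value is a whole number. The rule and the function name `looks_numerical` are assumptions for illustration; a variable's type should ultimately be judged from what it measures, not from how it is stored.

```python
# A few columns from the first three rows of the table above, kept as text.
rows = [
    {"gender": "Male", "age": "38", "smoke": "No", "amt_weekends": "NA"},
    {"gender": "Female", "age": "42", "smoke": "Yes", "amt_weekends": "12"},
    {"gender": "Male", "age": "40", "smoke": "No", "amt_weekends": "NA"},
]


def looks_numerical(column):
    """Crude check: all non-missing values in `column` are whole numbers."""
    values = [row[column] for row in rows if row[column] != "NA"]
    return bool(values) and all(value.isdigit() for value in values)


for column in rows[0]:
    kind = "numerical" if looks_numerical(column) else "categorical"
    print(f"{column}: {kind}")
```

Missing values (NA) are skipped rather than counted as evidence either way, which is why `amt_weekends` is still classed as numerical here.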