Material based on Chapter 1 of Introduction to Modern Statistics and A/B Testing in R
Types of data collection
Types of variables
Tidy data
Summary statistics
Question: does the use of stents reduce the risk of strokes?
451 patients at risk of stroke were randomly assigned to 2 groups:
treatment: received a stent (224)
control: no stent (227)
group | no event | stroke | Total |
---|---|---|---|
control | 199 | 28 | 227 |
treatment | 179 | 45 | 224 |
Data: stent365 from the openintro package.
Proportion who had a stroke in the treatment (stent) group: \(45/(179 +45) = 0.20 = 20\%.\)
Proportion who had a stroke in the control group: \(28/(199 +28) = 0.12 = 12\%.\)
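The two proportions above can be reproduced with a few lines of code. This is a minimal sketch in plain Python, assuming the counts are typed in by hand from the table rather than loaded from the openintro package; the names `counts` and `stroke_proportion` are illustrative and not part of the source material.

```python
# Counts copied by hand from the stent365 summary table (assumption:
# not loaded from the openintro package).
counts = {
    "control": {"no event": 199, "stroke": 28},
    "treatment": {"no event": 179, "stroke": 45},
}


def stroke_proportion(group_counts):
    """Share of patients in one group who had a stroke."""
    total = group_counts["no event"] + group_counts["stroke"]
    return group_counts["stroke"] / total


# One summary statistic per group.
stroke_props = {group: stroke_proportion(c) for group, c in counts.items()}

for group, p in stroke_props.items():
    print(f"{group}: {p:.2f}")
```

Each printed value is a summary statistic: one number describing the stroke outcome within a group.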
A summary statistic is a single number summarizing data from a sample.
group | no event | stroke |
---|---|---|
control | 88% | 12% |
treatment | 80% | 20% |
How would you calculate 88% and 80% in the table?
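One way to approach the question is to divide each count by its row total. The sketch below is plain Python under the assumption that the counts are entered by hand from the stent table (control: 199 no event, 28 stroke; treatment: 179 no event, 45 stroke); `row_percentages` is a hypothetical helper name, not a function from any package.

```python
# Counts entered by hand from the stent table (assumption: not read
# from the openintro package).
counts = {
    "control": {"no event": 199, "stroke": 28},
    "treatment": {"no event": 179, "stroke": 45},
}


def row_percentages(table):
    """Convert each row of counts to whole-number percentages of that row's total."""
    result = {}
    for group, row in table.items():
        total = sum(row.values())  # row total, e.g. 199 + 28 = 227
        result[group] = {
            outcome: round(100 * n / total) for outcome, n in row.items()
        }
    return result


row_pcts = row_percentages(counts)
for group, row in row_pcts.items():
    print(group, row)
```

For example, the control "no event" entry is 199/227, which rounds to 88%. Because each cell is rounded separately, rows in other tables may not sum to exactly 100.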
Tidy data: a data frame where each row is a unique case (observational unit) and each column is a variable
explanatory variable → might affect → response variable
associated variables: two variables that show some connection with one another
independent variables: not associated
12. Smoking habits of UK residents. A survey was conducted to study the smoking habits of 1,691 UK residents.
Is this observational or experimental data collection?
What does each row of the data frame represent?
How many participants were included in the survey?
Indicate whether each variable in the study is numerical or categorical. If numerical, identify as continuous or discrete. If categorical, indicate if the variable is ordinal.
gender | age | marital_status | highest_qualification | nationality | ethnicity | gross_income | region | smoke | amt_weekends | amt_weekdays | type |
---|---|---|---|---|---|---|---|---|---|---|---|
Male | 38 | Divorced | No Qualification | British | White | 2,600 to 5,200 | The North | No | NA | NA | |
Female | 42 | Single | No Qualification | British | White | Under 2,600 | The North | Yes | 12 | 12 | Packets |
Male | 40 | Married | Degree | English | White | 28,600 to 36,400 | The North | No | NA | NA | |
Female | 40 | Married | Degree | English | White | 10,400 to 15,600 | The North | No | NA | NA | |
Female | 39 | Married | GCSE/O Level | British | White | 2,600 to 5,200 | The North | No | NA | NA | |
Female | 37 | Married | GCSE/O Level | British | White | 15,600 to 20,800 | The North | No | NA | NA | |
user_id | cpgn_id | group | open | click | purch | |
---|---|---|---|---|---|---|
1000001 | 1901Email | ctrl | FALSE | 0 | 0 | 0.00 |
1000002 | 1901Email | email_B | TRUE | 1 | 0 | 0.00 |
1000003 | 1901Email | email_A | TRUE | 1 | 1 | 200.51 |
chard | sav_blanc | syrah | cab | past_purch | days_since | visits |
---|---|---|---|---|---|---|
0.00 | 0 | 33.94 | 0.00 | 33.94 | 119 | 11 |
0.00 | 0 | 16.23 | 76.31 | 92.54 | 60 | 3 |
516.39 | 0 | 16.63 | 0.00 | 533.02 | 9 | 9 |
names | var type |
---|---|
user_id | |
cpgn_id | |
group | |
open | |
click | |
purch | |
chard | |
sav_blanc | |
syrah | |
cab | |
past_purch | |
days_since | |
visits | |
tidy data
variables
summary statistic