based on IMS Ch 5: Exploring numerical data
types of numerical data
why visualize data
loan50 is part of the openintro package
automatically loaded when you call the library command
Rows: 50
Columns: 18
$ state <fct> NJ, CA, SC, CA, OH, IN, NY, MO, FL, FL, MD, HI…
$ emp_length <dbl> 3, 10, NA, 0, 4, 6, 2, 10, 6, 3, 8, 10, 10, 2,…
$ term <dbl> 60, 36, 36, 36, 60, 36, 36, 36, 60, 60, 36, 36…
$ homeownership <fct> rent, rent, mortgage, rent, mortgage, mortgage…
$ annual_income <dbl> 59000, 60000, 75000, 75000, 254000, 67000, 288…
$ verified_income <fct> Not Verified, Not Verified, Verified, Not Veri…
$ debt_to_income <dbl> 0.55752542, 1.30568333, 1.05628000, 0.57434667…
$ total_credit_limit <int> 95131, 51929, 301373, 59890, 422619, 349825, 1…
$ total_credit_utilized <int> 32894, 78341, 79221, 43076, 60490, 72162, 2872…
$ num_cc_carrying_balance <int> 8, 2, 14, 10, 2, 4, 1, 3, 10, 4, 3, 4, 3, 2, 3…
$ loan_purpose <fct> debt_consolidation, credit_card, debt_consolid…
$ loan_amount <int> 22000, 6000, 25000, 6000, 25000, 6400, 3000, 1…
$ grade <fct> B, B, E, B, B, B, D, A, A, C, D, A, A, A, A, E…
$ interest_rate <dbl> 10.90, 9.92, 26.30, 9.92, 9.43, 9.92, 17.09, 6…
$ public_record_bankrupt <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
$ loan_status <fct> Current, Current, Current, Current, Current, C…
$ has_second_income <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
$ total_income <dbl> 59000, 60000, 75000, 75000, 254000, 67000, 288…
data = loan50
x-axis = total_income
y-axis = loan_amount
geom_point: create scatterplot
?geom_point(): aesthetic options
# assumes openintro, ggplot2, and scales are already loaded with library()
ggplot(loan50, aes(x = total_income, y = loan_amount)) +
  geom_point(size = 3, color = "blue") +
  scale_x_continuous(labels = label_dollar(scale = 0.001, suffix = "K")) +
  scale_y_continuous(labels = label_dollar(scale = 0.001, suffix = "K")) +
  labs(
    x = "Total income", y = "Loan amount",
    title = "Scatterplot of loan amount and total income"
  )
Create a scatterplot of loan_amount on the x-axis and interest_rate on the y-axis
Label the x axis in dollars
Label the y axis in percent (hint: use label_percent with scale = 1)
Add a title, label the x and y axes
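One possible approach to the exercise above, as a sketch rather than the official solution. It assumes the openintro, ggplot2, and scales packages are installed; the object name p_rate and the label text are illustrative choices.

```r
library(openintro)  # provides loan50
library(ggplot2)
library(scales)     # label_dollar(), label_percent()

# interest_rate is already stored in percent units (e.g. 10.90),
# so scale = 1 stops label_percent() from multiplying by 100
p_rate <- ggplot(loan50, aes(x = loan_amount, y = interest_rate)) +
  geom_point() +
  scale_x_continuous(labels = label_dollar()) +
  scale_y_continuous(labels = label_percent(scale = 1)) +
  labs(
    x = "Loan amount", y = "Interest rate",
    title = "Interest rate versus loan amount"
  )
p_rate
```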
interest_rate
Create a dot plot of loan_amount
Add the median to the plot
Add labels with labs()
Label the loan amount on the x axis in thousands of dollars
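A sketch of one way to build this dot plot, not the only one. The binwidth of 1000, the dashed red line, and the name p_dot are assumptions made for illustration; the x-axis labelling reuses the label_dollar() call from the earlier scatterplot.

```r
library(openintro)  # provides loan50
library(ggplot2)
library(scales)

# median loan amount, drawn as a vertical reference line
med_loan <- median(loan50$loan_amount, na.rm = TRUE)

p_dot <- ggplot(loan50, aes(x = loan_amount)) +
  geom_dotplot(binwidth = 1000) +
  geom_vline(xintercept = med_loan, color = "red", linetype = "dashed") +
  scale_x_continuous(labels = label_dollar(scale = 0.001, suffix = "K")) +
  # the y scale of a dot plot is not meaningful, so hide it
  scale_y_continuous(NULL, breaks = NULL) +
  labs(x = "Loan amount", title = "Dot plot of loan amount with median")
p_dot
```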
Longer tail on the left (left-skewed)
Longer tail on the right (right-skewed)
Equal on both sides (symmetric)
Identify which plot is symmetric, left-skewed, and right-skewed.
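To see the three shapes side by side, the sketch below draws histograms of simulated data. The data are not from loan50: they are random draws from Beta distributions, chosen here only because their shape parameters give a left-skewed, a symmetric, and a right-skewed example. The names shapes and p_shapes are illustrative.

```r
library(ggplot2)

set.seed(1)  # make the simulated draws reproducible
n <- 10000
labels <- c("Left-skewed", "Symmetric", "Right-skewed")

# simulated values only, used to illustrate distribution shape
shapes <- data.frame(
  shape = factor(rep(labels, each = n), levels = labels),
  value = c(rbeta(n, 5, 2), rbeta(n, 5, 5), rbeta(n, 2, 5))
)

p_shapes <- ggplot(shapes, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~shape) +
  labs(x = "Simulated value", y = "Count")
p_shapes
```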
Create a histogram of loan_amount
Add the median to the plot
Add labels with labs()
Label the loan amount on the x axis in thousands of dollars
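A possible version of the histogram, offered as a sketch. The binwidth of 5000 dollars is an assumption; try other values to see how the shape changes. The name p_hist is illustrative.

```r
library(openintro)  # provides loan50
library(ggplot2)
library(scales)

med_loan <- median(loan50$loan_amount, na.rm = TRUE)

p_hist <- ggplot(loan50, aes(x = loan_amount)) +
  geom_histogram(binwidth = 5000, color = "white") +
  geom_vline(xintercept = med_loan, color = "red", linetype = "dashed") +
  scale_x_continuous(labels = label_dollar(scale = 0.001, suffix = "K")) +
  labs(
    x = "Loan amount", y = "Count",
    title = "Histogram of loan amount with median"
  )
p_hist
```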
Create a density plot of loan_amount
Add the median to the plot
Add labels with labs()
Label the loan amount on the x axis in thousands of dollars
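The density plot follows the same pattern with geom_density() in place of the histogram layer. This is a sketch using the default smoothing bandwidth; the name p_dens and the styling are illustrative.

```r
library(openintro)  # provides loan50
library(ggplot2)
library(scales)

med_loan <- median(loan50$loan_amount, na.rm = TRUE)

p_dens <- ggplot(loan50, aes(x = loan_amount)) +
  geom_density() +
  geom_vline(xintercept = med_loan, color = "red", linetype = "dashed") +
  scale_x_continuous(labels = label_dollar(scale = 0.001, suffix = "K")) +
  labs(
    x = "Loan amount", y = "Density",
    title = "Density plot of loan amount with median"
  )
p_dens
```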
Prominent peak
unimodal
bimodal
multimodal
Create a boxplot of loan_amount
Add labels with labs()
Label the loan amount on the x axis in thousands of dollars
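One way to draw the boxplot, as a sketch. Mapping loan_amount to x gives a horizontal box; hiding the y-axis text is a styling assumption, since that axis carries no information for a single box. The name p_box is illustrative.

```r
library(openintro)  # provides loan50
library(ggplot2)
library(scales)

p_box <- ggplot(loan50, aes(x = loan_amount)) +
  geom_boxplot() +
  scale_x_continuous(labels = label_dollar(scale = 0.001, suffix = "K")) +
  # a single horizontal box has no meaningful y values, so drop the ticks
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank()) +
  labs(x = "Loan amount", title = "Boxplot of loan amount")
p_box
```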
types of numerical data
plots for 1 variable
plots for 2 variables
openintro package