More dplyr (package in the tidyverse)
dplyr
dplyr is a powerful R package for data manipulation.
Provides a coherent set of verbs (functions) to help you resolve most data manipulation challenges.
Part of the tidyverse collection of R packages.
Simplifies data manipulation tasks.
Intuitive syntax and function names.
Today, we’ll focus on the join functions.
dplyr
| function() | Action |
|---|---|
| glimpse() | get a glimpse of your data |
| count() | count the unique values of one or more variables |
| filter() | picks rows based on their values |
| mutate() | creates new variables (columns) |
| select() | picks variables (columns) |
| summarize() | reduces multiple values down to a single statistic |
| arrange() | changes the order of the rows based on their values |
| group_by() | create subsets of data to apply functions to |
Join functions combine data from two data frames (tables)
Based on a common column (key)
Different types of joins return different subsets of data
| function() | Action |
|---|---|
| left_join(x, y) | all rows from x; all columns from x and y |
| right_join(x, y) | all rows from y; all columns from x and y |
| inner_join(x, y) | only rows whose key values match in both x and y; all columns from x and y |
| full_join(x, y) | all rows from x and y; all columns from x and y |
| semi_join(x, y) | all rows from x that have a match in y (only x columns) |
| anti_join(x, y) | all rows from x that do NOT have a match in y (only x columns) |
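As a rough illustration of how a join type decides which rows come back, the sketch below runs full_join() on two tiny tibbles. The tibbles x and y, the key column id, and every value in them are made up for this example.

```r
library(dplyr)
library(tibble)

# Hypothetical data: `id` is the shared key column
x <- tribble(
  ~id, ~x_val,
    1, "a",
    2, "b",
    3, "c"
)
y <- tribble(
  ~id, ~y_val,
    2, "B",
    3, "C",
    4, "D"
)

# full_join keeps every row from both tables;
# cells with no match on the other side are filled with NA
full_xy <- full_join(x, y, by = "id")
full_xy
```

Swapping full_join() for any other function in the table, with the same arguments, is a quick way to compare what each one returns.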
Creating products_df in tribble format
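A minimal sketch of what this could look like with tribble(), which lets you type a small table row by row. The column names (product_id, product_name) and the values are assumptions chosen for illustration, not the original data.

```r
library(tibble)

# Hypothetical product table; product_id is assumed to be the join key
products_df <- tribble(
  ~product_id, ~product_name,
            1, "Laptop",
            2, "Phone",
            3, "Tablet",
            4, "Monitor"
)
products_df
```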
Creating sales_df in tribble format
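A possible version of the sales table, again typed in with tribble(). The columns (sale_id, product_id, quantity) and values are assumed for illustration. One sale deliberately refers to a product_id (5) that has no product record, so the joins have something to disagree about.

```r
library(tibble)

# Hypothetical sales table; product_id is the key shared with the product table
sales_df <- tribble(
  ~sale_id, ~product_id, ~quantity,
       101,           1,         2,
       102,           2,         1,
       103,           2,         3,
       104,           5,         1   # product 5 has no product record
)
sales_df
```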
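left_join
All rows from the left data frame (products_df), plus the matching rows from the right data frame (sales_df)
All columns from both data frames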
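A small sketch of left_join() in action. The two tibbles are rebuilt here so the example runs on its own; their columns and values are assumptions, not the original data.

```r
library(dplyr)
library(tibble)

# Hypothetical data (illustrative values only)
products_df <- tribble(
  ~product_id, ~product_name,
  1, "Laptop", 2, "Phone", 3, "Tablet", 4, "Monitor"
)
sales_df <- tribble(
  ~sale_id, ~product_id, ~quantity,
  101, 1, 2,
  102, 2, 1,
  103, 2, 3,
  104, 5, 1
)

# Keep every product; products without sales get NA in the sales columns
left_result <- left_join(products_df, sales_df, by = "product_id")
left_result
```

Here the product with two sales appears twice, and products with no sales still appear once with NA for sale_id and quantity.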
right_join
All rows from the right data frame (sales_df), plus the matching rows from the left data frame (products_df)
All columns from both data frames
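One way to see this behaviour, using made-up versions of the two tibbles (columns and values are assumptions for illustration):

```r
library(dplyr)
library(tibble)

# Hypothetical data (illustrative values only)
products_df <- tribble(
  ~product_id, ~product_name,
  1, "Laptop", 2, "Phone", 3, "Tablet", 4, "Monitor"
)
sales_df <- tribble(
  ~sale_id, ~product_id, ~quantity,
  101, 1, 2,
  102, 2, 1,
  103, 2, 3,
  104, 5, 1
)

# Keep every sale; sales of unknown products get NA for product_name
right_result <- right_join(products_df, sales_df, by = "product_id")
right_result
```

Every sale is kept, including the one for product_id 5, whose product_name is NA because no product record matches it.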
inner_join
Only rows from the left data frame (products_df) that have a matching key value in the right data frame (sales_df)
All columns from both data frames
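A short sketch of inner_join() on assumed data (the tibbles below are illustrative stand-ins, not the original tables):

```r
library(dplyr)
library(tibble)

# Hypothetical data (illustrative values only)
products_df <- tribble(
  ~product_id, ~product_name,
  1, "Laptop", 2, "Phone", 3, "Tablet", 4, "Monitor"
)
sales_df <- tribble(
  ~sale_id, ~product_id, ~quantity,
  101, 1, 2,
  102, 2, 1,
  103, 2, 3,
  104, 5, 1
)

# Keep only product/sale pairs whose product_id appears in both tables
inner_result <- inner_join(products_df, sales_df, by = "product_id")
inner_result
```

Products that were never sold and the sale of the unknown product both drop out, so no NA values are introduced.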
semi_join
All rows from the left data frame (products_df) that have a matching key value in the right data frame (sales_df)
Only the columns of the left data frame (products_df)
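A sketch of semi_join() used as a filter, with assumed data (columns and values are made up for illustration):

```r
library(dplyr)
library(tibble)

# Hypothetical data (illustrative values only)
products_df <- tribble(
  ~product_id, ~product_name,
  1, "Laptop", 2, "Phone", 3, "Tablet", 4, "Monitor"
)
sales_df <- tribble(
  ~sale_id, ~product_id, ~quantity,
  101, 1, 2,
  102, 2, 1,
  103, 2, 3,
  104, 5, 1
)

# Which products have at least one sale? No sales columns are added.
semi_result <- semi_join(products_df, sales_df, by = "product_id")
semi_result
```

Unlike inner_join(), a product with several sales appears only once, because semi_join() filters the left table rather than combining the two.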
anti_join
All rows from the left data frame (products_df) that do NOT have a matching key value in the right data frame (sales_df)
Only the columns of the left data frame (products_df)
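A sketch of anti_join() on assumed data (the tibbles below are illustrative, not the original tables):

```r
library(dplyr)
library(tibble)

# Hypothetical data (illustrative values only)
products_df <- tribble(
  ~product_id, ~product_name,
  1, "Laptop", 2, "Phone", 3, "Tablet", 4, "Monitor"
)
sales_df <- tribble(
  ~sale_id, ~product_id, ~quantity,
  101, 1, 2,
  102, 2, 1,
  103, 2, 3,
  104, 5, 1
)

# Which products have never been sold?
anti_result <- anti_join(products_df, sales_df, by = "product_id")
anti_result
```

This is a handy check for unmatched keys. Swapping the two arguments would instead list sales that refer to a product missing from products_df.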
Create 2 data frames: employees and departments
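One possible starting point, assuming the two data frames share a dept_id key. All column names and values here are hypothetical. The final left_join() is just one join you might try on them.

```r
library(dplyr)
library(tibble)

# Hypothetical employees table; dept_id is assumed to link to departments
employees <- tribble(
  ~emp_id, ~name,  ~dept_id,
        1, "Ana",        10,
        2, "Ben",        20,
        3, "Cara",       30   # department 30 has no department record
)

# Hypothetical departments table
departments <- tribble(
  ~dept_id, ~dept_name,
        10, "Sales",
        20, "Finance",
        40, "IT"              # department 40 has no employees
)

# Attach the department name to each employee
emp_dept <- left_join(employees, departments, by = "dept_id")
emp_dept
```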