Analytics mindset case studies

ETL in R, part 3

Authors

modified by E Stanny

Deliverable 6: Preparing to join Swift and States Data Sets

Objective: Prepare to join Swift data with states data to include a state column based on ZIP codes.

Import Data: Import the ey_swift.csv file into your R environment using the read_csv() function. Name the dataframe swift.
- swift <- read_csv("http://bus320-quarto.netlify.app/data/ey_swift.csv")
- Delete product “CASH COUNTDOWN 475” and assign output to df_clean
- df_clean <- swift |> filter(product_name != "CASH COUNTDOWN 475")
Import Data: Import the ey_states.csv file into your R environment using the read_csv() function. Name the resultant dataframe df_states.
Examine df_states:
- Use the glimpse() function to review its structure.
- Examine column-level statistics with the skim() function.
- select and arrange() the postal_code column in both ascending and descending orders
Examine df_clean:
- Use the glimpse() function to review its structure.
- Examine column-level statistics with the skim() function.
- select and arrange() the zip column in ascending and descending order.
Question Response: Write a few sentences detailing any potential challenges you might face when merging these two dataframes.

Use mutate() and as.character() to convert zip in df_clean to character type, saving as zip save output to df_clean.
Use select() and skim() to examine zip in df_clean.

Modify the postal_code column in df_states. Use mutate() and str_sub() to remove “+four codes” and ‘-’ in ensuring postal_code retains only the initial five characters. Call the new variable zip.
Use select() and skim() to examine zip in df_states.

Use select() and arrange in descending order to confirm the removal of redundant characters from the zip column in df_states.

Use select() to keep only zip and state_province with zip positioned first.
Show df_states

Write the modified dataframes df_states and df_clean to .csv files, saving them in the “posts” directory.
- First install the package here
- write_csv(df_states, here::here("posts", "df_states.csv"))
- write_csv(df_clean, here::here("posts", "df_clean.csv"))

Note: After completing these steps, you’ll be ready to join these datasets by zip.