Analytics mindset case studies
ETL in R, part 3
Deliverable 6: Preparing to join Swift and States Data Sets
Objective: Prepare to join Swift data with states data to include a state column based on ZIP codes.
a. Data Loading and Examination
Import Data: Import the
ey_swift.csvfile into your R environment using theread_csv()function. Name the dataframeswift.swift <- read_csv("http://bus320-quarto.netlify.app/data/ey_swift.csv")Delete product “CASH COUNTDOWN 475” and assign output to
df_cleandf_clean <- swift |> filter(product_name != "CASH COUNTDOWN 475")
Import Data: Import the
ey_states.csvfile into your R environment using theread_csv()function. Name the resultant dataframedf_states.Examine
df_states:- Use the
glimpse()function to review its structure. - Examine column-level statistics with the
skim()function. selectandarrange()thepostal_codecolumn in both ascending and descending orders
- Use the
Examine
df_clean:- Use the
glimpse()function to review its structure. - Examine column-level statistics with the
skim()function. selectandarrange()thezipcolumn in ascending and descending order.
- Use the
Question Response: Write a few sentences detailing any potential challenges you might face when merging these two dataframes.
b. Preparing ZIP Column in df_clean for Joining
Use
mutate()andas.character()to convertzipindf_cleanto character type, saving aszipsave output todf_clean.Use
select()andskim()to examinezipindf_clean.
c. Preparing ZIP Column in df_states for Joining
Modify the
postal_codecolumn indf_states. Usemutate()andstr_sub()to remove “+four codes” and ‘-’ in ensuringpostal_coderetains only the initial five characters. Call the new variablezip.Use
select()andskim()to examinezipindf_states.
d. Verify Formatting
- Use
select()andarrangein descending order to confirm the removal of redundant characters from thezipcolumn indf_states.
e. Select Relevant Columns in df_states
Use
select()to keep onlyzipandstate_provincewithzippositioned first.Show
df_states
f. Clean and Export the Dataframes
Write the modified dataframes
df_statesanddf_cleanto .csv files, saving them in the “posts” directory.- First install the package
here write_csv(df_states, here::here("posts", "df_states.csv"))write_csv(df_clean, here::here("posts", "df_clean.csv"))
- First install the package
Note: After completing these steps, you’ll be ready to join these datasets by zip.