Analytics mindset case studies
ETL in R, part 3
Deliverable 6: Preparing to join Swift and States Data Sets
Objective: Prepare to join Swift
data with states
data to include a state column based on ZIP codes.
a. Data Loading and Examination
Import Data: Import the
ey_swift.csv
file into your R environment using theread_csv()
function. Name the dataframeswift
.swift <- read_csv("http://bus320-quarto.netlify.app/data/ey_swift.csv")
Delete product “CASH COUNTDOWN 475” and assign output to
df_clean
df_clean <- swift |> filter(product_name != "CASH COUNTDOWN 475")
Import Data: Import the
ey_states.csv
file into your R environment using theread_csv()
function. Name the resultant dataframedf_states
.Examine
df_states
:- Use the
glimpse()
function to review its structure. - Examine column-level statistics with the
skim()
function. select
andarrange()
thepostal_code
column in both ascending and descending orders
- Use the
Examine
df_clean
:- Use the
glimpse()
function to review its structure. - Examine column-level statistics with the
skim()
function. select
andarrange()
thezip
column in ascending and descending order.
- Use the
Question Response: Write a few sentences detailing any potential challenges you might face when merging these two dataframes.
b. Preparing ZIP Column in df_clean
for Joining
Use
mutate()
andas.character()
to convertzip
indf_clean
to character type, saving aszip
save output todf_clean
.Use
select()
andskim()
to examinezip
indf_clean
.
c. Preparing ZIP Column in df_states
for Joining
Modify the
postal_code
column indf_states
. Usemutate()
andstr_sub()
to remove “+four codes” and ‘-’ in ensuringpostal_code
retains only the initial five characters. Call the new variablezip
.Use
select()
andskim()
to examinezip
indf_states
.
d. Verify Formatting
- Use
select()
andarrange
in descending order to confirm the removal of redundant characters from thezip
column indf_states
.
e. Select Relevant Columns in df_states
Use
select()
to keep onlyzip
andstate_province
withzip
positioned first.Show
df_states
f. Clean and Export the Dataframes
Write the modified dataframes
df_states
anddf_clean
to .csv files, saving them in the “posts” directory.- First install the package
here
write_csv(df_states, here::here("posts", "df_states.csv"))
write_csv(df_clean, here::here("posts", "df_clean.csv"))
- First install the package
Note: After completing these steps, you’ll be ready to join these datasets by zip
.