Week 7
Cleaning and Data Analysis in R II
SOCI 269
Coding Assignment Deadline
Your first coding assignment is now due by 8:00 PM on Tuesday, November 4th.
Launch RStudio and execute the following code:
Here is a list of the objects available in your Global Environment (see the Environment tab):
vdem <- vdem |> as_tibble()
high_level <- c("v2x_polyarchy", "v2x_libdem", "v2x_partipdem", "v2x_delibdem", "v2x_egaldem")
high_level_names <- c("electoral", "liberal", "participatory", "deliberative", "egalitarian")
can_mex_usa <- vdem |> as_tibble() |>
  # Here, we (i) select (and rename) the first column;
  # (ii) select the year variable; then
  # (iii) select all the variables in the high_level vector.
  select(country = 1, year, all_of(high_level)) |>
  # Here, we rename all the high_level variables using the
  # high_level_names vector.
  rename_with(~ high_level_names, all_of(high_level)) |>
  # We're performing "per-operation" grouping here---within
  # the mutate() function (by year). We're then relocating the
  # grouped variable we created and ensuring that it appears after
  # "electoral."
  mutate(electoral_global_avg = mean(electoral, na.rm = TRUE),
         .by = year, .after = electoral) |>
  # Isolating Canada, the US and Mexico + years in the
  # 21st century:
  filter(str_detect(country, "Can|United St|Mex"),
         year >= 2000) |>
  # Arranging countries alphabetically:
  arrange(country)
can_mex_usa
# A tibble: 75 × 8
country year electoral electoral_global_avg liberal participatory
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Canada 2000 0.843 0.491 0.766 0.594
2 Canada 2001 0.838 0.494 0.76 0.587
3 Canada 2002 0.831 0.505 0.753 0.578
4 Canada 2003 0.831 0.513 0.753 0.574
5 Canada 2004 0.83 0.513 0.758 0.571
6 Canada 2005 0.829 0.518 0.756 0.571
7 Canada 2006 0.834 0.521 0.765 0.571
8 Canada 2007 0.834 0.517 0.766 0.572
9 Canada 2008 0.832 0.521 0.762 0.57
10 Canada 2009 0.833 0.523 0.764 0.57
# ℹ 65 more rows
# ℹ 2 more variables: deliberative <dbl>, egalitarian <dbl>
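The "per-operation" grouping mentioned in the comments above can be seen in isolation on a small example. This is only a sketch: the `toy` data frame and its values are made up for illustration, and it assumes dplyr 1.1.0 or later (where the `.by` argument is available).

```r
library(dplyr)

# Hypothetical toy data (not the V-Dem data).
toy <- tibble(
  country   = c("A", "B", "A", "B"),
  year      = c(2000, 2000, 2001, 2001),
  electoral = c(0.25, 0.75, 0.5, 1)
)

# .by groups by year for this one mutate() call only;
# the result comes back ungrouped, with rows in their original order.
toy_avg <- toy |>
  mutate(electoral_avg = mean(electoral, na.rm = TRUE),
         .by = year, .after = electoral)

toy_avg
```

Because the grouping lasts only for that call, there is no need for a separate ungroup() step afterwards.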
dplyr::bind_rows()
We can use bind_rows() to combine observations (rows) from different data frames.
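As a rough illustration of row binding, here is a minimal sketch. The two data frames and their values are invented for this example; they are not the objects from the course Environment.

```r
library(dplyr)

# Hypothetical toy data frames with illustrative values.
df_a <- tibble(country = "A", year = c(2000, 2001), electoral = c(0.5, 0.75))
df_b <- tibble(country = "B", year = c(2000, 2001), electoral = c(0.25, 1))

# bind_rows() stacks the rows; columns are matched by name, not by position.
stacked <- bind_rows(df_a, df_b)

nrow(stacked)  # 4: two rows from each input
```

If one input lacked a column that the other has, bind_rows() would still work and fill the gap with NA.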
Quick Exercise
In the next 5-10 minutes, try to recreate can_mex_usa using the objects in your Environment and bind_rows().
Note: If you’re running into issues, fear not. The answer’s on the next slide.
dplyr::bind_rows()
# A tibble: 75 × 8
country year electoral electoral_global_avg liberal participatory
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Canada 2000 0.843 0.491 0.766 0.594
2 Canada 2001 0.838 0.494 0.76 0.587
3 Canada 2002 0.831 0.505 0.753 0.578
4 Canada 2003 0.831 0.513 0.753 0.574
5 Canada 2004 0.83 0.513 0.758 0.571
6 Canada 2005 0.829 0.518 0.756 0.571
7 Canada 2006 0.834 0.521 0.765 0.571
8 Canada 2007 0.834 0.517 0.766 0.572
9 Canada 2008 0.832 0.521 0.762 0.57
10 Canada 2009 0.833 0.523 0.764 0.57
# ℹ 65 more rows
# ℹ 2 more variables: deliberative <dbl>, egalitarian <dbl>
dplyr::bind_cols()
Another Mini-Exercise
Okay, that wasn’t so hard. Let’s try to use bind_cols() to append a new variable, e_regionpol from truncated_regions, to our data.
dplyr::bind_cols()
# The Solution:
can_mex_usa_1 <- can_mex_usa |>
  bind_cols(
    # Avoiding duplicate columns:
    truncated_regions |>
      select(-c(1:2))
  ) |>
  # Relocating e_regionpol variable
  relocate(e_regionpol, .after = country)
can_mex_usa_1
# A tibble: 75 × 9
country e_regionpol year electoral electoral_global_avg liberal
<chr> <hvn_lbll> <dbl> <dbl> <dbl> <dbl>
1 Canada 2 2000 0.843 0.491 0.766
2 Canada 2 2001 0.838 0.494 0.76
3 Canada 2 2002 0.831 0.505 0.753
4 Canada 2 2003 0.831 0.513 0.753
5 Canada 2 2004 0.83 0.513 0.758
6 Canada 2 2005 0.829 0.518 0.756
7 Canada 2 2006 0.834 0.521 0.765
8 Canada 2 2007 0.834 0.517 0.766
9 Canada 2 2008 0.832 0.521 0.762
10 Canada 2 2009 0.833 0.523 0.764
# ℹ 65 more rows
# ℹ 3 more variables: participatory <dbl>, deliberative <dbl>,
# egalitarian <dbl>
dplyr::left_join()
What happens when we try to bind can_mex_usa with all_regions using the bind_cols() function?
The powerful *_join() family of verbs from dplyr is especially useful when we’re stitching together data frames of different dimensions (e.g., different numbers of rows).
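The contrast between binding and joining can be sketched on toy data. Everything below is hypothetical: `scores`, `regions`, and the region labels are made up, and the sketch assumes (as the question above suggests, though all_regions is not shown here) that the lookup table has more rows than the data being extended.

```r
library(dplyr)

# Hypothetical toy data.
scores <- tibble(
  country   = c("Canada", "Canada", "Mexico"),
  year      = c(2000, 2001, 2000),
  electoral = c(0.75, 0.75, 0.5)
)

# A lookup table with more rows than `scores`, in a different order.
regions <- tibble(
  country = c("Mexico", "Canada", "Canada", "Brazil"),
  year    = c(2000, 2000, 2001, 2000),
  region  = c("Region 2", "Region 1", "Region 1", "Region 2")
)

# bind_cols() pastes columns side by side, so the row counts must agree.
# With 3 rows versus 4 it throws an error, captured here with try().
bind_attempt <- try(bind_cols(scores, regions), silent = TRUE)
inherits(bind_attempt, "try-error")

# left_join() matches rows on the key columns and keeps every row of `scores`.
joined <- scores |>
  left_join(regions, by = join_by(country, year))

joined
```

Even when the row counts happen to agree, bind_cols() only lines rows up by position, so a join on explicit keys is the safer choice whenever the two data frames might be ordered differently.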
dplyr::left_join()
Yet Another Mini-Exercise
Try to use left_join() to attach the e_regionpol variable from all_regions to our original can_mex_usa data frame.
Store your new object as can_mex_usa_2 and relocate e_regionpol so that it appears right after country.
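dplyr::left_join()
# The Solution:
can_mex_usa_2 <- can_mex_usa |>
  # No `by` argument is given, so left_join() joins on every column
  # name the two data frames share (and prints a message saying which).
  left_join(all_regions) |>
  relocate(e_regionpol, .after = country)
can_mex_usa_2
# A tibble: 75 × 9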
country e_regionpol year electoral electoral_global_avg liberal
<chr> <hvn_lbll> <dbl> <dbl> <dbl> <dbl>
1 Canada 5 2000 0.843 0.491 0.766
2 Canada 5 2001 0.838 0.494 0.76
3 Canada 5 2002 0.831 0.505 0.753
4 Canada 5 2003 0.831 0.513 0.753
5 Canada 5 2004 0.83 0.513 0.758
6 Canada 5 2005 0.829 0.518 0.756
7 Canada 5 2006 0.834 0.521 0.765
8 Canada 5 2007 0.834 0.517 0.766
9 Canada 5 2008 0.832 0.521 0.762
10 Canada 5 2009 0.833 0.523 0.764
# ℹ 65 more rows
# ℹ 3 more variables: participatory <dbl>, deliberative <dbl>,
# egalitarian <dbl>
dplyr::if_*()
We can extend filter() so that rows are extracted based on the values of all their columns …
… or any of their columns:
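The slide's own code is not reproduced here, so what follows is a sketch under assumptions: it uses dplyr's if_all() and if_any() helpers inside filter(), with a made-up `toy` data frame and an arbitrary 0.5 threshold.

```r
library(dplyr)

# Hypothetical toy data.
toy <- tibble(
  country   = c("A", "B", "C"),
  electoral = c(0.9, 0.7, 0.3),
  liberal   = c(0.8, 0.4, 0.2)
)

# Keep rows where ALL of the selected columns exceed 0.5.
all_high <- toy |>
  filter(if_all(c(electoral, liberal), ~ .x > 0.5))

# Keep rows where ANY of the selected columns exceeds 0.5.
any_high <- toy |>
  filter(if_any(c(electoral, liberal), ~ .x > 0.5))

all_high$country  # "A"
any_high$country  # "A" "B"
```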
dplyr::rename_with()
By using rename_with(), we can rename multiple columns at once.
rename_with() accepts a variety of functions and character vectors.
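To make the two styles concrete, here is a small sketch on a made-up one-row data frame. The column names echo the V-Dem variables used earlier, but the values are illustrative only.

```r
library(dplyr)

# Hypothetical toy data.
toy <- tibble(v2x_polyarchy = 0.8, v2x_libdem = 0.7, year = 2000)

# Style 1: pass a function that is applied to the selected column names.
stripped <- toy |>
  rename_with(~ sub("^v2x_", "", .x), starts_with("v2x"))

# Style 2: pass a formula that returns a character vector of new names,
# one per selected column, in the order the columns are selected.
renamed <- toy |>
  rename_with(~ c("electoral", "liberal"), c(v2x_polyarchy, v2x_libdem))

names(stripped)  # "polyarchy" "libdem" "year"
names(renamed)   # "electoral" "liberal" "year"
```

The second style is compact but depends on the order of the selected columns, so it is worth double-checking the result when the selection is long.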
dplyr::across()
We can transform the values in more than one column by deploying across().
We can embed functions within across(), too.
Moreover, we can use across() to generate quick descriptive summaries.
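The three uses of across() described above can be sketched as follows. The `toy` data frame is hypothetical, and the choice of summary functions (mean and max) is just one possibility.

```r
library(dplyr)

# Hypothetical toy data.
toy <- tibble(
  country   = c("A", "A", "B", "B"),
  electoral = c(0.5, 0.75, 0.25, 0.5),
  liberal   = c(0.25, 0.5, 0.5, 1)
)

# Transform several columns at once with an embedded (anonymous) function.
scaled <- toy |>
  mutate(across(c(electoral, liberal), ~ .x * 100))

# Quick descriptive summaries: a named list of functions yields
# one output column per (column, function) pair.
summary_tbl <- toy |>
  summarise(across(c(electoral, liberal), list(mean = mean, max = max)),
            .by = country)

names(summary_tbl)
```

By default the new summary columns are named by gluing the column name and the function name together with an underscore (for example, electoral_mean).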
We can modify the orientation of our data using pivot_longer() (from the tidyr package) …
… and pivot_wider().
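A round trip between the two orientations might look like the sketch below. The `wide` data frame is made up; the column names `measure` and `score` are borrowed from the exercise output later in the deck.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in "wide" form: one column per measure.
wide <- tibble(
  country   = c("A", "B"),
  year      = c(2000, 2000),
  electoral = c(0.5, 0.25),
  liberal   = c(0.75, 0.5)
)

# Wide to long: one row per country-year-measure combination.
long <- wide |>
  pivot_longer(cols = c(electoral, liberal),
               names_to = "measure", values_to = "score")

# Long to wide: spread the measures back into their own columns.
back <- long |>
  pivot_wider(names_from = measure, values_from = score)

nrow(long)  # 4: two measures for each of the two rows
```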
Try to produce the following data frame:
# A tibble: 375 × 5
country region year measure score
<chr> <fct> <dbl> <chr> <dbl>
1 Canada North America and Western Europe 2000 electoral 84.3
2 Canada North America and Western Europe 2000 liberal 76.6
3 Canada North America and Western Europe 2000 participatory 59.4
4 Canada North America and Western Europe 2000 deliberative 75.7
5 Canada North America and Western Europe 2000 egalitarian 71.2
6 Canada North America and Western Europe 2001 electoral 83.8
7 Canada North America and Western Europe 2001 liberal 76
8 Canada North America and Western Europe 2001 participatory 58.7
9 Canada North America and Western Europe 2001 deliberative 75.2
10 Canada North America and Western Europe 2001 egalitarian 70.5
# ℹ 365 more rows
The can_mex_usa_long data frame should be available in your Environment.
Wickham, Çetinkaya-Rundel, and Grolemund (2023)
