After a while of using the pivot_wider and pivot_longer tidyverse functions, I still can never remember how they work. The argument names never seem to stick.
I’ve found it’s easier to think about in terms of grouping. Pivoting is just an operation that can take groups of data and sort them into columns or combine them into rows. The key to understanding pivoting is to visualize the groups that are being rearranged.
For example, take the tidyr::fish_encounters
dataset. It is important to recognize that this dataset only contains a single column of actual data, which is the seen
column. The fish
and station
columns can be thought of as grouping variables for seen
.
Imagine annotating each value of seen
with the corresponding value of station
and fish
, so the fish
and station
columns are embedded in a single column. Then you can move around these seen
values however you want with no loss of information. Each annotation (fish
or station
) can define either a group of rows or a set of columns.
To pivot wider, you make groups by fish
and station
and use fish
to group rows, while station
sorts the data into columns.
The corresponding R code for this is below:
library(tidyr)
wide_fishes <- pivot_wider(fish_encounters, id_cols = fish, names_from = "station", values_from = seen)
wide_fishes
## # A tibble: 19 × 12
## fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
## <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 4842 1 1 1 1 1 1 1 1 1 1 1
## 2 4843 1 1 1 1 1 1 1 1 1 1 1
## 3 4844 1 1 1 1 1 1 1 1 1 1 1
## 4 4845 1 1 1 1 1 NA NA NA NA NA NA
## 5 4847 1 1 1 NA NA NA NA NA NA NA NA
## 6 4848 1 1 1 1 NA NA NA NA NA NA NA
## 7 4849 1 1 NA NA NA NA NA NA NA NA NA
## 8 4850 1 1 NA 1 1 1 1 NA NA NA NA
## 9 4851 1 1 NA NA NA NA NA NA NA NA NA
## 10 4854 1 1 NA NA NA NA NA NA NA NA NA
## 11 4855 1 1 1 1 1 NA NA NA NA NA NA
## 12 4857 1 1 1 1 1 1 1 1 1 NA NA
## 13 4858 1 1 1 1 1 1 1 1 1 1 1
## 14 4859 1 1 1 1 1 NA NA NA NA NA NA
## 15 4861 1 1 1 1 1 1 1 1 1 1 1
## 16 4862 1 1 1 1 1 1 1 1 1 NA NA
## 17 4863 1 1 NA NA NA NA NA NA NA NA NA
## 18 4864 1 1 NA NA NA NA NA NA NA NA NA
## 19 4865 1 1 1 NA NA NA NA NA NA NA NA
So, to translate, values_from
selects the data, id_cols
defines the grouping you want to keep as row groups, and names_from
is the grouping you want to translate to columns. Every column should fall in one of these categories.
Similarly, we can pivot back to a longer format by doing the reverse. We have a grouping variable fish
and a data column for each station
. We make groups of data using fish
and station
, then stack all the data back into a single column.
pivot_longer(wide_fishes, cols = -fish, names_to = "station", values_to = "seen")
## # A tibble: 209 × 3
## fish station seen
## <fct> <chr> <int>
## 1 4842 Release 1
## 2 4842 I80_1 1
## 3 4842 Lisbon 1
## 4 4842 Rstr 1
## 5 4842 Base_TD 1
## 6 4842 BCE 1
## 7 4842 BCW 1
## 8 4842 BCE2 1
## 9 4842 BCW2 1
## 10 4842 MAE 1
## # … with 199 more rows
The pivot_longer
function treats all selected columns as data columns, and all non-selected columns as grouping variables. It then combines all the data columns into a single column. We select our data with cols
. values_to
names the new combined data column, and names_to
names the new grouping variable.