In this blog post, we will see how to reshape data from long to wide format in R. I have already used a custom function for that in the article about 3D histograms. But in this tutorial, we will use the reshape and unstack functions from base R, the spread and pivot_wider functions from the tidyr package and the dcast function from the data.table package.
Some terminology:
- id column: it is still present in the wide format
- key column: its levels become new variable columns
- value column: its contents fill in the new columns
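To make these three roles concrete, here is a minimal sketch on a made-up table. The names `toy_long`, `toy_wide`, `id`, `key` and `value` are hypothetical and only serve the illustration; the reshaping itself uses `reshape` from base R.

```r
# Hypothetical toy data in long format: one row per (id, key) pair
toy_long = data.frame(
  id    = c("a", "a", "b", "b"),  # id column: still present in the wide format
  key   = c("x", "y", "x", "y"),  # key column: its levels become new columns
  value = c(1, 2, 3, 4),          # value column: its contents fill the new columns
  stringsAsFactors = FALSE
)

# Long to wide: one row per id, one column per level of the key column
toy_wide = reshape(toy_long, direction = "wide", idvar = "id", timevar = "key")

# reshape() prefixes the new columns with "value."; strip that prefix
names(toy_wide) = sub("value.", "", names(toy_wide), fixed = TRUE)
print(toy_wide)
```

The wide table has one row for each of the two ids and one column for each of the two key levels.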
Load the tidyr and the data.table packages
Load the iris dataset
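The two loading steps can be sketched as follows, assuming both packages are already installed. The `str` call is only a quick check and is not part of the original steps.

```r
# Assumes the packages are installed, e.g. with
# install.packages(c("tidyr", "data.table"))
library(tidyr)       # provides spread() and pivot_wider()
library(data.table)  # provides dcast() and as.data.table()

# iris ships with R in the datasets package; data() makes the load explicit
data(iris)
str(iris)
```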
Example with one id column
The image below shows a theoretical example of data formatting from long to wide:
a) Variables assignment
id_column = "Petal.Width"
key_column = "Species"
value_column = "Freq"
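# Count the occurrences of each (id, key) pair; as.data.frame() stores the counts in "Freq"
data_long = as.data.frame(table(iris[, c(id_column, key_column)]),
                          stringsAsFactors = FALSE)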
b) Functions
1) reshape from base R
data_wide = reshape(data_long, direction = "wide", idvar = id_column,
                    timevar = key_column)
# Remove the "Freq." prefix; fixed = TRUE so that the dot is matched literally
names(data_wide) = sub(paste0(value_column, "."), "", names(data_wide), fixed = TRUE)
2) unstack from base R
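# unstack() drops the id column, so it is added back with cbind(); this relies on
# every key level listing the same ids in the same order, which holds for data_long
data_wide = cbind(id_column = unique(data_long[, id_column]),
                  unstack(data_long, get(value_column) ~ get(key_column)))
names(data_wide) = sub("id_column", id_column, names(data_wide))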
3) spread from tidyr
# Note: spread() is superseded by pivot_wider() in recent versions of tidyr
data_wide = spread(data_long, key = all_of(key_column),
                   value = all_of(value_column))
4) pivot_wider from tidyr
data_wide = pivot_wider(data_long, names_from = all_of(key_column),
                        values_from = all_of(value_column))
5) dcast from data.table
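# dcast() from data.table expects a data.table, so data_long is converted first
data_wide = dcast(as.data.table(data_long), get(id_column) ~ get(key_column),
                  value.var = value_column)
names(data_wide) = sub("id_column", id_column, names(data_wide))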
c) Result
Row order may differ depending on the function used, but the content will be identical. The output data structure can also vary:
- the two functions from base R give a data.frame output
- the two functions from the tidyr package give a tibble output
- the function from the data.table package gives a data.table output
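One possible way to check both points is sketched below for three of the five functions. The helpers `normalise` and `same_content` are hypothetical names introduced here, and the `dcast` formula is built from the column names with `as.formula`, a variation on the `get()` calls used above, so that the id column keeps its original name.

```r
library(tidyr)
library(data.table)

# Same inputs as in the example, rebuilt so that the sketch is self-contained
id_column = "Petal.Width"
key_column = "Species"
value_column = "Freq"
data_long = as.data.frame(table(iris[, c(id_column, key_column)]),
                          stringsAsFactors = FALSE)

wide_base = reshape(data_long, direction = "wide", idvar = id_column,
                    timevar = key_column)
names(wide_base) = sub(paste0(value_column, "."), "", names(wide_base), fixed = TRUE)

wide_tidyr = pivot_wider(data_long, names_from = all_of(key_column),
                         values_from = all_of(value_column))

# Assumption: the formula is built from strings instead of using get()
wide_dt = dcast(as.data.table(data_long),
                as.formula(paste(id_column, "~", key_column)),
                value.var = value_column)

# Hypothetical helper: plain data.frame, rows sorted by id, columns in a fixed order
normalise = function(x) {
  x = as.data.frame(x)
  x = x[order(x[[id_column]]), c(id_column, sort(setdiff(names(x), id_column)))]
  rownames(x) = NULL
  x
}

# Hypothetical helper: TRUE when two wide tables hold the same names and values
same_content = function(a, b) {
  a = normalise(a)
  b = normalise(b)
  identical(names(a), names(b)) &&
    isTRUE(all.equal(a, b, check.attributes = FALSE))
}

# The classes differ between packages ...
print(class(wide_base))
print(class(wide_tidyr))
print(class(wide_dt))

# ... but the content should match once rows and columns are put in the same order
print(same_content(wide_base, wide_tidyr))
print(same_content(wide_base, wide_dt))
```

Sorting before comparing matters because each function is free to return the rows in its own order.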
To sum up, we can convert data from long to wide format using the reshape and unstack functions from base R, the spread and pivot_wider functions from the tidyr package, and the dcast function from the data.table package. In my opinion, the easiest ones to use are the functions from the tidyr package. Which function do you prefer?