From long to wide format in R

In this blog post, we will see how to reshape data from long to wide format in R. I have already used a custom function for that in the article about 3D histograms. But in this tutorial, we will use the reshape and unstack functions from base R, the spread and pivot_wider functions from the tidyr package and the dcast function from the data.table package.

Some terminology:

  • id column: it is still present in the wide format
  • key column: its levels become new variable columns
  • value column: its content fill in the new columns

Load the tidyr and the data.table packages

library("tidyr")
library("data.table")

Load the iris dataset

data("iris")

Example with one id column

The image below shows a theoretical example of data formatting from long to wide:

from long to wide format in R

a) Variables assignment

id_column = "Petal.Width"
key_column = "Species"
value_column = "Freq"

data_long = as.data.frame(table(iris[, c(id_column, key_column)]),
                          stringsAsFactors = FALSE)
observations for each combination of two variables from the iris dataset in a long format

b) Functions

1) reshape from base R

data_wide = reshape(data_long, direction = "wide", idvar = id_column,
                    timevar = key_column)
names(data_wide) = sub(paste0(value_column, "."), "", names(data_wide))

2) unstack from base R

data_wide = cbind(id_column = unique(data_long[, id_column]),
                  unstack(data_long, get(value_column) ~ get(key_column)))
names(data_wide) = sub("id_column", id_column, names(data_wide))

3) spread from tidyr

data_wide = spread(data_long, key = all_of(key_column),
                   value = all_of(value_column))

4) pivot_wider from tidyr

data_wide = pivot_wider(data_long, names_from = all_of(key_column),
                        values_from = all_of(value_column))

5) dcast from data.table

data_wide = dcast(as.data.table(data_long), get(id_column) ~ get(key_column),
                  value.var = value_column)
names(data_wide) = sub("id_column", id_column, names(data_wide))

c) Result

observations for each combination of two variables from the iris dataset in a wide format

Remark

Rows order may be different depending on the function used, but the content will be identical. The output data structure can also vary:

  • the two functions from base R give a data.frame output
  • the two functions from the tidyr package give a tibble output
  • the function from the data.table package give a data.table output

Conclusion

To sum up, we can convert data from long to wide format using the reshape and unstack functions from base R, the spread and pivot_wider functions from the tidyr package and the dcast function from the data.table package. But the easiest ones are the functions from the tidyr packages. Which function do you prefer?

Related posts

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply