In the last tutorial, we saw how to reshape data from long to wide format in R. Today, we will see how to convert a data format from wide to long. Most plotting functions, including those from ggplot2, require data in a long format. Therefore, we will use the reshape and stack functions from base R, the gather and pivot_longer functions from the tidyr package and the melt function from the data.table package.
Some terminology:
- id column: it is still present in the long format
- key column: it is a new variable which replaces several others
- value column: it is a new variable with the content of those several other variables
Load the tidyr and the data.table packages
library("tidyr")
library("data.table")
Load the iris dataset
data("iris")
Example with one id column
The image below shows a theoretical example of a conversion from wide to long format:
a) Variables assignment
id_column = "Petal.Width"
key_column = "Species"
value_column = "Freq"
data_long = as.data.frame(table(iris[, c(id_column, key_column)]),
stringsAsFactors = FALSE)
data_wide = reshape(data_long, direction = "wide", idvar = id_column,
timevar = key_column)
names(data_wide) = sub(paste0(value_column, "."), "", names(data_wide))
b) Functions
1) reshape from base R
data_long = reshape(data_wide, direction = "long", idvar = id_column,
timevar = key_column, v.names = value_column,
times = setdiff(names(data_wide), id_column),
varying = list(setdiff(names(data_wide), id_column)),
new.row.names = seq(1:(nrow(data_wide)*(ncol(data_wide)-1))))
2) stack from base R
data_long = cbind(id_column = data_wide[, id_column],
stack(data_wide[, !names(data_wide) %in% id_column]))
names(data_long) = sub("ind", key_column, names(data_long))
names(data_long) = sub("values", value_column, names(data_long))
3) gather from tidyr
data_long = gather(data_wide, key = key_column, value = value_column,
-all_of(id_column))
names(data_long) = sub("key_column", key_column, names(data_long))
names(data_long) = sub("value_column", value_column, names(data_long))
4) pivot_longer from tidyr
data_long = pivot_longer(data_wide, names_to = key_column,
values_to = value_column, -all_of(id_column))
5) melt from data.table
data_long = melt(as.data.table(data_wide), id.vars = id_column,
variable.name = key_column, value.name = value_column)
c) Result
Remark
The output data structure can be different depending on the function used:
- the two functions from base R give a data.frame output
- the two functions from the tidyr package give a tibble output
- the function from the data.table package give a data.table output
Rows and/or columns order may also differ, but the content will be the same.
Conclusion
In conclusion, we can change data format from wide to long using the reshape and stack functions from base R, the gather and pivot_longer functions from the tidyr package and the melt function from the data.table package. But in my opinion, the easiest ones are the pivot_longer and melt functions. Which function do you use the most?