This may be more apparent in the world of scientific experimentation. If you're testing a new cholesterol drug, for example, the categories you set up in advance might look at patients by age, gender and whether they're given the drug or a placebo. The measurements (or calculations resulting from those measurements) are your results: Changes in overall cholesterol level, LDL and HDL, for example. But whatever your data, you should have at least one category and one measurement if you want to create a long data frame.
In the example data we've been using here, my categories are fy and company, while my measurements are revenue, profit and margin.
And now here's the next concept you need to understand about reshaping from wide to long: Because you want only one measurement in each row, you'll need to add a column that says which type of measurement each value is if you have more than one category.
In my existing wide format, the column headers tell me the measurement type: revenue, profit or margin. But since I'm rearranging this to only have one of those numbers in each row, not three, I'll add a column to show which measurement it is.
I think an example will make this a lot clearer. Here's one "wide" row:
fy | company | revenue | profit | margin |
---|---|---|---|---|
2010 | Apple | 65225 | 14013 | 21.48409 |
And here's how to have only one measurement per row -- by creating three "long" rows:
fy | company | financialCategory | value |
---|---|---|---|
2010 | Apple | revenue | 65225 |
2010 | Apple | profit | 14013 |
2010 | Apple | margin | 21.5 |
The column financialCategory now tells me what type of measurement each value is. And now, the term "value" should make more sense.
At last we're ready for some code to reshape a data frame from wide to long! As with pretty much everything in R, there are multiple ways to perform this task. In the tidyverse, you'll want to install the package tidyr.
As of early 2019, tidyr introduced two new functions: pivot_longer()
and pivot_wider()
. They're somewhat more intuitive to use than the package's earlier reshaping functions, gather() and spread() (which still work, but will no longer be recommended or maintained).
To use the new functions, install the development version of tidyr from GitHub version with remotes::install_github("tidyverse/tidyr")
or devtools::install_github("tidyverse/tidyr")
, and load it with library(tidyr)
. That should overwrite any older tidyr version which may be on your system. .
Use tidyr's pivot_longer() function to go from wide to long(er). pivot_longer() uses the following format to assign results to a variable named longData:
longData <- melt(df = your original data frame, cols = c(vector of names of columns you want to move into a single column), names_to = "new_category_column_name", values_to = "new_value_column_name")
Note that the names of the columns you're moving from separate columns into a single column do not need to be in quotation marks, but the names of your new category and value columns do.
Using tidyr, wide-to-long code can simply be:
companiesLong <- pivot_longer(companiesData, cols = c(revenue, profit, margin), names_to = "variable", values_to = "value")
This produces:
fy | company | variable | value | |
---|---|---|---|---|
1 | 2010 | Apple | revenue | 65225.0 |
2 | 2011 | Apple | revenue | 108249.0 |
3 | 2012 | Apple | revenue | 156508.0 |
4 | 2010 | revenue | 29321.0 | |
5 | 2011 | revenue | 37905.0 | |
6 | 2012 | revenue | 50175.0 | |
7 | 2010 | Microsoft | revenue | 62484.0 |
8 | 2011 | Microsoft | revenue | 69943.0 |
9 | 2012 | Microsoft | revenue | 73723.0 |
10 | 2010 | Apple | profit | 14013.0 |
11 | 2011 | Apple | profit | 25922.0 |
12 | 2012 | Apple | profit | 41733.0 |
13 | 2010 | profit | 8505.0 | |
14 | 2011 | profit | 9737.0 | |
15 | 2012 | profit | 10737.0 | |
16 | 2010 | Microsoft | profit | 18760.0 |
17 | 2011 | Microsoft | profit | 23150.0 |
18 | 2012 | Microsoft | profit | 16978.0 |
19 | 2010 | Apple | margin | 21.5 |
20 | 2011 | Apple | margin | 23.9 |
21 | 2012 | Apple | margin | 26.7 |
22 | 2010 | margin | 29.0 | |
23 | 2011 | margin | 25.7 | |
24 | 2012 | margin | 21.4 | |
25 | 2010 | Microsoft | margin | 30.0 |
26 | 2011 | Microsoft | margin | 33.1 |
27 | 2012 | Microsoft | margin | 23.0 |