From wide to long
The original data
Let’s suppose that you captured data in the following format:
- one line per respondent
- one column per repeated answer (recorded here as ans_1, ans_2 and ans_3)
- additional columns to describe the respondents
## # A tibble: 5 x 6
## resp_id age city ans_1 ans_2 ans_3
## <int> <int> <chr> <int> <int> <int>
## 1 1 43 Cape Town 2 1 1
## 2 2 48 Pretoria NA 2 2
## 3 3 45 Cape Town 3 3 2
## 4 4 40 Pretoria 2 1 2
## 5 5 31 Cape Town 2 2 1
What do you want to obtain?
If you want to reshape your table as a long format, you will shape in order to obain
- one line per answer, per respondent
- one column indicating the actual answer
- additional columns to describe the respondents
Use pivot_longer()
To do so, we will use the command pivot_longer()
that is part of the tidyr
package (a member of the tidyverse family). So do not forget to load the tidyverse
or the tidyr
package:
library(tidyverse)
The easiest way to pivot the table is to identify the columns that will go from columns to rows. In our case, we want the columns ans_1, ans_2, ans_3 to be appearing as rows. We will provide the vector of columns that will be pivoted:
pivot_longer(dt, cols=c(ans_1, ans_2, ans_3))
## # A tibble: 15 x 5
## resp_id age city name value
## <int> <int> <chr> <chr> <int>
## 1 1 43 Cape Town ans_1 2
## 2 1 43 Cape Town ans_2 1
## 3 1 43 Cape Town ans_3 1
## 4 2 48 Pretoria ans_1 NA
## 5 2 48 Pretoria ans_2 2
## 6 2 48 Pretoria ans_3 2
## 7 3 45 Cape Town ans_1 3
## 8 3 45 Cape Town ans_2 3
## 9 3 45 Cape Town ans_3 2
## 10 4 40 Pretoria ans_1 2
## 11 4 40 Pretoria ans_2 1
## 12 4 40 Pretoria ans_3 2
## 13 5 31 Cape Town ans_1 2
## 14 5 31 Cape Town ans_2 2
## 15 5 31 Cape Town ans_3 1
In the new table:
the three columns ans_1 to ans_3 have disappeared
two new columns name and value have been added
- name corresponds to string identifying the former column name
- value corresponds to the value of the cell in the former column
Note that since the question have an easy pattern for names, we could obtain the same result with less effort:
pivot_longer(dt, cols=contains("ans"))
## # A tibble: 15 x 5
## resp_id age city name value
## <int> <int> <chr> <chr> <int>
## 1 1 43 Cape Town ans_1 2
## 2 1 43 Cape Town ans_2 1
## 3 1 43 Cape Town ans_3 1
## 4 2 48 Pretoria ans_1 NA
## 5 2 48 Pretoria ans_2 2
## 6 2 48 Pretoria ans_3 2
## 7 3 45 Cape Town ans_1 3
## 8 3 45 Cape Town ans_2 3
## 9 3 45 Cape Town ans_3 2
## 10 4 40 Pretoria ans_1 2
## 11 4 40 Pretoria ans_2 1
## 12 4 40 Pretoria ans_3 2
## 13 5 31 Cape Town ans_1 2
## 14 5 31 Cape Town ans_2 2
## 15 5 31 Cape Town ans_3 1
The column name and value are not very explicit, so you can rename them with the arguments names_to
, and values_to
.
pivot_longer(dt, cols=contains("ans"),
names_to = "Question", values_to = "Answer")
## # A tibble: 15 x 5
## resp_id age city Question Answer
## <int> <int> <chr> <chr> <int>
## 1 1 43 Cape Town ans_1 2
## 2 1 43 Cape Town ans_2 1
## 3 1 43 Cape Town ans_3 1
## 4 2 48 Pretoria ans_1 NA
## 5 2 48 Pretoria ans_2 2
## 6 2 48 Pretoria ans_3 2
## 7 3 45 Cape Town ans_1 3
## 8 3 45 Cape Town ans_2 3
## 9 3 45 Cape Town ans_3 2
## 10 4 40 Pretoria ans_1 2
## 11 4 40 Pretoria ans_2 1
## 12 4 40 Pretoria ans_3 2
## 13 5 31 Cape Town ans_1 2
## 14 5 31 Cape Town ans_2 2
## 15 5 31 Cape Town ans_3 1
Finally, since the column names had the pattern ‘ans_’, it may appear strange to have ans_1, ans_2, etc. as possible values for questions ! You can use the option names_prefix to eliminate it when it is pivoted
pivot_longer(dt, cols=contains("ans"),
names_prefix = "ans_", names_to = "Question", values_to = "Answer")
## # A tibble: 15 x 5
## resp_id age city Question Answer
## <int> <int> <chr> <chr> <int>
## 1 1 43 Cape Town 1 2
## 2 1 43 Cape Town 2 1
## 3 1 43 Cape Town 3 1
## 4 2 48 Pretoria 1 NA
## 5 2 48 Pretoria 2 2
## 6 2 48 Pretoria 3 2
## 7 3 45 Cape Town 1 3
## 8 3 45 Cape Town 2 3
## 9 3 45 Cape Town 3 2
## 10 4 40 Pretoria 1 2
## 11 4 40 Pretoria 2 1
## 12 4 40 Pretoria 3 2
## 13 5 31 Cape Town 1 2
## 14 5 31 Cape Town 2 2
## 15 5 31 Cape Town 3 1
Full details about pivot_long can be found at: pivot_longer