Introduction
In most real-life cases, data will not be ready for immediate use. It will have missing values, missing variable names, or variables scattered across multiple columns that you will need to combine. Therefore, you need to be able to manipulate these data.
In the section “Getting ready”, you manipulated vectors, matrices and data.frames by reordering and subsetting them through indexing with the [ ] operator. But once you start more advanced analyses, you will want to manipulate your data with more efficient tools.
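As a brief reminder of that base R style, here is a minimal sketch of reordering and subsetting with [ ]. The data frame df and its columns name and score are made up for illustration and do not come from the course material.

```r
# Hypothetical toy data frame, invented for this reminder only
df <- data.frame(name = c("a", "b", "c"), score = c(3, 1, 2))

# Reorder the rows by increasing score
sorted <- df[order(df$score), ]

# Keep the names of the rows whose score is greater than 1
high_scorers <- df[df$score > 1, "name"]

sorted
high_scorers
```

This works, but the expressions become hard to read as the conditions pile up, which is one reason to look at dedicated tools.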
To do so, you have different options. Among them, two environments are popular:
- The data.table package
- The tidyverse ecosystem
In this section, you will learn how to work with the packages of the tidyverse ecosystem.
Learning objectives
After the first chapter, you will be able to perform basic operations on your raw data. These include selecting specific rows and columns, sorting rows and creating new columns. You will also learn how to chain these different commands together.
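To give a rough idea of what these operations can look like, here is a minimal sketch that assumes the dplyr package (part of the tidyverse) and the gapminder package are installed. The choice of continent, year and columns, and the names europe_2007 and gdp, are arbitrary picks for illustration.

```r
library(dplyr)
library(gapminder)

europe_2007 <- gapminder %>%
  filter(continent == "Europe", year == 2007) %>%  # select specific rows
  select(country, lifeExp, pop, gdpPercap) %>%     # select specific columns
  mutate(gdp = pop * gdpPercap) %>%                # create a new column
  arrange(desc(lifeExp))                           # sort rows, highest life expectancy first

europe_2007
```

The %>% operator passes the result of one step to the next, which is what chaining commands means here.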
After the second chapter, you will be able to identify wide and long formats and reshape your data from one format to the other.
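As an illustration of the two formats, the following sketch assumes the tidyr, dplyr and gapminder packages are installed. The two countries and the object names long, wide and back_to_long are arbitrary choices for this example.

```r
library(dplyr)
library(tidyr)
library(gapminder)

# Long format: one row per country and year
long <- gapminder %>%
  filter(country %in% c("France", "Germany")) %>%
  select(country, year, lifeExp)

# Wide format: one row per country, one column per year
wide <- long %>%
  pivot_wider(names_from = year, values_from = lifeExp)

# Back to long format; note that year is now stored as character,
# because it was rebuilt from the column names
back_to_long <- wide %>%
  pivot_longer(cols = -country, names_to = "year", values_to = "lifeExp")

wide
```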
After the third chapter, you will be able to merge information scattered across different but related tables.
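A merge of this kind might look like the sketch below, which assumes the dplyr and gapminder packages are installed. The lookup table capitals is a small table written by hand for illustration; it is not part of the gapminder package.

```r
library(dplyr)
library(gapminder)

# Hand-written lookup table (illustration only, not shipped with gapminder)
capitals <- tibble(
  country = c("France", "Germany"),
  capital = c("Paris", "Berlin")
)

# A small extract of gapminder; country is converted from factor to
# character so that both tables store the key column with the same type
life_2007 <- gapminder %>%
  filter(year == 2007, country %in% c("France", "Germany")) %>%
  mutate(country = as.character(country)) %>%
  select(country, lifeExp)

# Add the capital to each row by matching on the shared column country
merged <- left_join(life_2007, capitals, by = "country")

merged
```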
The data set
In this chapter, you will work with the datasets available in the gapminder package.
You should now be able to install the package yourself. If not, please refer to the section “R packages”.
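For convenience, one possible way to install the package only when it is missing is sketched below. The repos argument points to the CRAN cloud mirror, which is an assumption made here so that the call also works in a non-interactive session; any other CRAN mirror would do.

```r
# Install gapminder only if it is not already available
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder", repos = "https://cloud.r-project.org")
}
```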
Load the gapminder package:
library(gapminder)
If everything went well, you should be able to access the gapminder data directly:
gapminder
## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # ... with 1,694 more rows
You are now ready to work!