R Data Frames

Learning objectives

data.frame objects allow to store data of different basic types into a single tabular structure. I will use the “Data frame” spelling.

Data frames are very handy to work with survey data since they are usually stored in tabular form, with many columns corresponding to the responses to the survey and many rows (usually, but not always, one row corresponding to one survey).

In this section, you will learn:

What is a data frame
How to create a data.frame object
How to access the data
How to sort the data
How to quickly summarize the data
Other classes of data frames
- Tibbles
- data.table

What is a data frame?

A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.

Following are the characteristics of a data frame.

The column names should be non-empty;
The row names should be unique;
The data stored in a data frame can be of numeric, factor or character type;
Each column should contain same number of data items.

Data frames are particularly useful because we can combine different types into one single object and are easier to handle than lists.

We can use built-in dataframes in R to understand this. For example, here is a built-in data frame in R, called mtcars. You can access it by just simplying typing its name.

mtcars

##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

The top line of the table, called the header, contains the column names. Each horizontal line afterward denotes a data row, which begins with the name of the row, and then followed by the actual data. Each data member of a row is called a cell.

You can easily interpret this as a data frame describing cars. Each row corresponds to a unique model of car, each column corresponds to the different characteristics of the cars.

If you want to know the exact structure of the object mtcars, you can use the function str()

str(mtcars)

## 'data.frame':	32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

We are learning that mtcars is an R object of type data.frame, that contains 32 observations (i.e. 32 rows) of 11 variables (i.e., 11 columns). Then each column is described (name, type, first values stored).

Create a Data Frame

Typically, data.frames are imported from other programs rather than created from scratch (refer to the Import Data section for details). Nevertheless, it is important to be aware that you can generate a data.frame on your own.

You create a data frame by supplying name-vector pairs to the function data.frame():

df1 <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25), 
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2021-11-15", "2014-05-11", "2015-03-27"))
)

str(df1)

## 'data.frame':	5 obs. of  4 variables:
##  $ emp_id    : int  1 2 3 4 5
##  $ emp_name  : chr  "Rick" "Dan" "Michelle" "Ryan" ...
##  $ salary    : num  623 515 611 729 843
##  $ start_date: Date, format: "2012-01-01" "2013-09-23" ...

How to retrieve the data?

Data in a cell

To retrieve data in a cell, you will enter its row and column coordinates in the single square bracket [ ] operator. The two coordinates are separated by a comma. In other words, the coordinates begins with row position, then followed by a comma, and ends with the column position. The order is important.

Here is the cell value from the first row, second column of mtcars.

mtcars[1, 2]

## [1] 6

Moreover, we can use the row and column names instead of the numeric coordinates.

mtcars["Mazda RX4", "cyl"]

## [1] 6

A note on row and column names

If you do not remember the names of the columns you can use either names() or colnames():

names(mtcars)

##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"

# colnames(mtcars)  #colnames give the same results

If you do not remember the names of the rows you can use rownames():

rownames(mtcars)

##  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
##  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
##  [7] "Duster 360"          "Merc 240D"           "Merc 230"           
## [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
## [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
## [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
## [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
## [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
## [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
## [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
## [31] "Maserati Bora"       "Volvo 142E"

Data contained in a column

There are several ways to retrieve the data contained into the columns.

First, we can retrieve a single column with the “$” operator:

mtcars$hp

##  [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52
## [20]  65  97 150 150 245 175  66  91 113 264 175 335 109

class(mtcars$hp)

## [1] "numeric"

Note that you need to know the name of the column you want to retrieve. However, if using R Studio, if you once you have typed mtcars$ a list of possible names will be suggested to you. (yet another good reason to use R Studio instead of R)

Second, we can retrieve the same column with the single square bracket “[]” operator. To do this, we have to prepend the column name (or column number) with a comma character, which signals that we want to take consider all the rows:

mtcars[,"hp"]

##  [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52
## [20]  65  97 150 150 245 175  66  91 113 264 175 335 109

Alternatively, we can use the column number.

mtcars[,4]

##  [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52
## [20]  65  97 150 150 245 175  66  91 113 264 175 335 109

However, it is often clearer to use the column name than the column number. Besides, if you change the structure of the data frame, the ordering of the columns may change, and you may unknowingly refer to a different number if you use column numbers.

In both cases:

the object mtcars$hp or mtcars[,4] is a vector containing 32 numbers (i.e., the number of rows).
the order of the entries in the mtcars$hp vector preserves the order of the rows in our data frame. This is important as this allows us to manipulate one variable based on the results of another.

Data contained in several columns

This use of brackets is becoming especially useful when you want to retrieve several columns. Try this command:

carsub <- mtcars[,2:4]
head(carsub)  # the function head() displays the first six rows of the table

##                   cyl disp  hp
## Mazda RX4           6  160 110
## Mazda RX4 Wag       6  160 110
## Datsun 710          4  108  93
## Hornet 4 Drive      6  258 110
## Hornet Sportabout   8  360 175
## Valiant             6  225 105

This example shows that you can extract the columns 2, 3 and 4 of the table mtcars and store them in a new data.frame called carsub.

If you want to show several columns that are not in the same sequence of numbers, you can use the function c().

For example, if you want to select the columns 3,7 and 11:

carsub <- mtcars[,c(3,7,11)]
head(carsub)

##                   disp  qsec carb
## Mazda RX4          160 16.46    4
## Mazda RX4 Wag      160 17.02    4
## Datsun 710         108 18.61    1
## Hornet 4 Drive     258 19.44    1
## Hornet Sportabout  360 17.02    2
## Valiant            225 20.22    1

Again, you can use a vector of column names instead of column numbers:

carsub <- mtcars[ ,c("disp", "qsec","carb")]

Data contained in the rows

We retrieve rows from a data frame with the single square bracket operator, just like what we did with columns. However, in additional to an index vector of row positions, we append an extra comma character. This is important, as the extra comma signals a wildcard match for the second coordinate for column positions.

To extract rows from a data frame, we use the single square bracket operator, just as we did with columns. However, instead of just providing a vector of row positions, we add a comma after it.

mtcars[1:2, ]

##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

This comma is crucial because it indicates a wildcard match for the second coordinate for column positions. In other words, the comma tells R to include all columns when sub-setting the data frame. Without the comma, R would interpret the index vector as selecting columns, which is not what you wanted. To convince yourself, evaluate the following command:

mtcars[1:2]

You can also select rows that are not part of a sequence:

mtcars[c(1,3,10), ]

##             mpg cyl  disp  hp drat   wt  qsec vs am gear carb
## Mazda RX4  21.0   6 160.0 110 3.90 2.62 16.46  0  1    4    4
## Datsun 710 22.8   4 108.0  93 3.85 2.32 18.61  1  1    4    1
## Merc 280   19.2   6 167.6 123 3.92 3.44 18.30  1  0    4    4

Once you understand this syntax and what was said in the previous section about selection of columns, you should be able to select different row selections using the c() function and either row number or row names.

Sub-set of rows and columns

mtcars[c(1,3,10), c(2, 4:5) ]

##            cyl  hp drat
## Mazda RX4    6 110 3.90
## Datsun 710   4  93 3.85
## Merc 280     6 123 3.92

Subset based on logical criteria

When looking to create subsets based on a logical condition, you can use the subset() function.

subset(mtcars, hp < 90)

##                 mpg cyl  disp hp drat    wt  qsec vs am gear carb
## Merc 240D      24.4   4 146.7 62 3.69 3.190 20.00  1  0    4    2
## Fiat 128       32.4   4  78.7 66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4  75.7 52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla 33.9   4  71.1 65 4.22 1.835 19.90  1  1    4    1
## Fiat X1-9      27.3   4  79.0 66 4.08 1.935 18.90  1  1    4    1

Note that:

the first argument you pass to subset() is the name of the dataframe you want to subset
you shouldn’t use quotes around hp
you should use the traditional comparators to contrusct your conditions: ==, >, etc.

How to sort a data.frame?

To sort a data.frame, use the order( ) function. By default, sorting is ASCENDING. Prepend the sorting variable by a minus sign to indicate DESCENDING order.

#order the cars using the column mpg (ascending order)
mtcars[order(mtcars$mpg),]

##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1

#sort the cars using two keys
mtcars[order(mtcars$mpg, -mtcars$cyl),]

##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1

Note that you need to repeat the mtcars$ part within the order function; not doing so will produce an error, because the order function will expect a variable that does not exist.

mtcars[order(mpg),]

## Error in order(mpg): object 'mpg' not found

How to quickly summarize the data

The statistical summary and nature of the data can be obtained by applying summary() function.

summary(carsub)

##       disp            qsec            carb      
##  Min.   : 71.1   Min.   :14.50   Min.   :1.000  
##  1st Qu.:120.8   1st Qu.:16.89   1st Qu.:2.000  
##  Median :196.3   Median :17.71   Median :2.000  
##  Mean   :230.7   Mean   :17.85   Mean   :2.812  
##  3rd Qu.:326.0   3rd Qu.:18.90   3rd Qu.:4.000  
##  Max.   :472.0   Max.   :22.90   Max.   :8.000

Other classes of data frames

Tibbles

Tibble is a new class of data frames (developed by RStudio). They are also capturing data in tabular forms, and have internal features that make working with some packages a little easier.

You can work with tibbles by installing the suite of packages tidyverse or the specific package tibble.

# install a suite of related packages
install.packages("tidyverse")

# Alternatively, install just tibble:
install.packages("tibble")

Then load the package tibble

library(tibble)

Creation

Like data.frame, you can create a new tibble from individual vectors with tibble().

However, there are a few differences:

tibble() will:

keep strings as characters (and not convert them automatically into factors)
allows you to refer to variables that you just created
automatically recycle inputs of length 1

tibble(
  x = 1:5, 
  y = 1,    # this will be automatically transformed into a vector with length 5
  z = x^2 + y,  # it will recognize x and y that were just created
  t = letters[1:5]   # strings will not be converted into factors
)

## # A tibble: 5 × 4
##       x     y     z t    
##   <int> <dbl> <dbl> <chr>
## 1     1     1     2 a    
## 2     2     1     5 b    
## 3     3     1    10 c    
## 4     4     1    17 d    
## 5     5     1    26 e

It also checks that columns have the same lengths (except, as seen above for unique values)

tibble(
  x = 1:5, 
  t = letters[1:3]
)

## Error in `tibble()`:
## ! Tibble columns must have compatible sizes.
## • Size 5: Existing data.
## • Size 3: Column `t`.
## ℹ Only values of size one are recycled.

Display

With large data frames, it will show only the first 10 rows, and all the columns that fit on screen and in addition to its name, each column reports its type.

ans <- tibble(
  x = 1:50, 
  t = letters[1:50],
  x2 = x^2,
  e = sample(letters, 50, replace = TRUE)
)
ans

## # A tibble: 50 × 4
##        x t        x2 e    
##    <int> <chr> <dbl> <chr>
##  1     1 a         1 v    
##  2     2 b         4 l    
##  3     3 c         9 i    
##  4     4 d        16 x    
##  5     5 e        25 u    
##  6     6 f        36 m    
##  7     7 g        49 d    
##  8     8 h        64 b    
##  9     9 i        81 f    
## 10    10 j       100 w    
## # … with 40 more rows

Subsetting

Compared to a data.frame, tibbles are stricter; in particular they never do partial matching, and they will generate a warning if the column you are trying to access does not exist.

Compare the two codes:

df <- data.frame(
  x1 = runif(5),
  y1 = rnorm(5),
  y2 = rnorm(5)
)
df$x # Extract by name

## [1] 0.4185595 0.7090068 0.8417981 0.4979531 0.7715919

df$y

## NULL

df[, 1] # Extract by position

## [1] 0.4185595 0.7090068 0.8417981 0.4979531 0.7715919

df <- tibble(
  x1 = runif(5),
  y1 = rnorm(5),
  y2 = rnorm(5)
)
df$x # Extract by name

## Warning: Unknown or uninitialised column: `x`.

## NULL

df$x1

## [1] 0.82821475 0.02363167 0.99581781 0.88277135 0.64373937

df$y

## Warning: Unknown or uninitialised column: `y`.

## NULL

df[, 1 ] # Extract by position

## # A tibble: 5 × 1
##       x1
##    <dbl>
## 1 0.828 
## 2 0.0236
## 3 0.996 
## 4 0.883 
## 5 0.644

data.table

data.table is a package that extends data.frames. Two of its most notable features are speed and cleaner syntax. I will give much more details about data.table in a separate section.

Exercises

Exercise 1:

Create the following vectors corresponding to Name, Age, Height, Weigth and Sex. Make sure Sex is a factor.

## [1] 25 31 23 52 76 49 26

## [1] 177 163 190 179 163 183 164

## [1] 57 69 83 75 70 83 53

## [1] F F M M F M F
## Levels: F M

Using these vectors, create the following data frame. (Be careful the vector Name must be used for the creation of row names)

##         Age Height Weight Sex
## Alex     25    177     57   F
## Moses    31    163     69   F
## Stephan  23    190     83   M
## Zakhele  52    179     75   M
## Leane    76    163     70   F
## Lucas    49    183     83   M
## Nobhule  26    164     53   F

Show the answer

Question 1:

Name <- c("Alex", "Moses", "Stephan", "Zakhele", "Leane", "Lucas", "Nobhule")
Age <- c(25, 31, 23, 52, 76, 49, 26)
Height <- c(177, 163, 190, 179, 163, 183, 164)
Weight <- c(57, 69, 83, 75, 70, 83, 53)
Sex <- as.factor(c("F", "F", "M", "M", "F", "M", "F"))
Age

## [1] 25 31 23 52 76 49 26

Height

## [1] 177 163 190 179 163 183 164

Weight

## [1] 57 69 83 75 70 83 53

Sex

## [1] F F M M F M F
## Levels: F M

Question 2:

df <- data.frame (row.names = Name, Age, Height, Weight, Sex)
df

##         Age Height Weight Sex
## Alex     25    177     57   F
## Moses    31    163     69   F
## Stephan  23    190     83   M
## Zakhele  52    179     75   M
## Leane    76    163     70   F
## Lucas    49    183     83   M
## Nobhule  26    164     53   F

Exercise 2

Using the dataframe mtcars, create a new data frame that:

only includes cars with a horse power (hp) greater or equal to 120 and lower or equal to 200,
only contains the columns disp, drat and hp,
is sorted by decreasing hp

Show the answer

sub <- subset(mtcars[, c("disp", "drat", "hp")], hp >= 120 & hp <=200)
sub2 <- sub[order(-sub$hp),]
sub2

##                    disp drat  hp
## Merc 450SE        275.8 3.07 180
## Merc 450SL        275.8 3.07 180
## Merc 450SLC       275.8 3.07 180
## Hornet Sportabout 360.0 3.15 175
## Pontiac Firebird  400.0 3.08 175
## Ferrari Dino      145.0 3.62 175
## Dodge Challenger  318.0 2.76 150
## AMC Javelin       304.0 3.15 150
## Merc 280          167.6 3.92 123
## Merc 280C         167.6 3.92 123

Last updated on Mar 25, 2023