Subsetting

Basics

[] returns an object of the same type as the one being subset and can select more than one element.

[[]] extracts the single element at a given position or with a given name.

$ accesses an element by name, for example a named element of a list or a column of a data frame.
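The difference between the three operators is easiest to see on a small list. The following is a minimal sketch; the list `x` and the variable names are made up for illustration.

```r
x <- list(a = 1:3, b = "text")

single_bracket <- x["a"]     # a list of length 1 that contains the element a
double_bracket <- x[["a"]]   # the integer vector 1:3 itself
dollar <- x$a                # same result as x[["a"]]
several <- x[c("a", "b")]    # [] can select several elements at once

class(single_bracket)  # "list"
class(double_bracket)  # "integer"
```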

Partial Matching

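When we use the $ operator, R matches the name partially: a unique prefix of the element's name is enough. [[]] requires an exact match unless we pass exact = FALSE.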

> x <- list(aardvark = 1:5)
> x$a
[1] 1 2 3 4 5
> x[["a"]]
NULL
> x[["a", exact = FALSE]]
[1] 1 2 3 4 5
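Partial matching only works when the prefix identifies exactly one name. The sketch below uses a hypothetical list `y` with two names that share a prefix to show what happens when the match is ambiguous.

```r
y <- list(apple = 1, apricot = 2)

ambiguous <- y$ap                     # NULL: "ap" matches both names
unique_prefix <- y$app                # 1: "app" matches only apple
explicit <- y[["apr", exact = FALSE]] # 2: "apr" matches only apricot
```

Because an ambiguous prefix silently returns NULL, spelling out the full name is the safer habit in scripts.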

Examples

First we create a table with random values.

data <- data.frame(
    column1 = sample(1:5),
    column2 = sample(6:10),
    column3 = sample(11:15)
)

# Shuffle the rows.
data <- data[sample(1:5), ]

# Introduce missing values in the first and third rows of column2.
data$column2[c(1, 3)] <- NA

Here is what the table might look like.

> data
  column1 column2 column3
5       5      NA      11
2       4       9      15
4       1      NA      12
3       2       6      13
1       3       7      14

Return the first column by index. When we pass a number, R returns the column at that position.

> data[,1]

[1] 5 4 1 2 3

Return column by name.

> data[,"column2"]

[1] NA  9 NA  6  7

Return column by name and rows by position.

> data[1:2,"column2"]

[1] NA  9
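Selecting a single column this way returns a plain vector, not a data frame. If the data frame structure should be kept, the drop argument of [] can be set to FALSE. The sketch below rebuilds the sample table with fixed values under the name `df` (an assumption, so that the result does not depend on random numbers).

```r
df <- data.frame(
    column1 = c(5, 4, 1, 2, 3),
    column2 = c(NA, 9, NA, 6, 7),
    column3 = c(11, 15, 12, 13, 14)
)

as_vector <- df[, "column2"]                # numeric vector
as_frame <- df[, "column2", drop = FALSE]   # data frame with one column
as_frame2 <- df["column2"]                  # list-style subsetting also keeps the data frame
```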

Filter by column values using logical operator AND.

> data[(data$column1 >= 5 & data$column3 > 10),]

  column1 column2 column3
5       5      NA      11

Filter by column values using logical operator OR.

> data[(data$column1 >= 5 | data$column3 > 10),]

  column1 column2 column3
5       5      NA      11
2       4       9      15
4       1      NA      12
3       2       6      13
1       3       7      14

If the column we filter on contains missing values and we do not want the corresponding rows returned, we can wrap the condition in the which function.

> data[which(data$column2 > 3),]

  column1 column2 column3
2       4       9      15
3       2       6      13
1       3       7      14

Compare the result of the query above with the query below, which does not use the which function. A comparison with NA yields NA, and each NA in a logical index produces a row filled with NA.

> data[(data$column2 > 3),]

     column1 column2 column3
NA        NA      NA      NA
2          4       9      15
NA.1      NA      NA      NA
3          2       6      13
1          3       7      14
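The difference comes from the logical vector used as the row index. The sketch below makes that vector visible and shows two alternatives to which that should give the same rows. It uses a fixed copy of the sample table named `df`, which is an assumption made for reproducibility.

```r
df <- data.frame(
    column1 = c(5, 4, 1, 2, 3),
    column2 = c(NA, 9, NA, 6, 7),
    column3 = c(11, 15, 12, 13, 14)
)

keep <- df$column2 > 3      # NA TRUE NA TRUE TRUE
positions <- which(keep)    # 2 4 5: which() keeps only the TRUE positions

with_which <- df[positions, ]
with_is_na <- df[!is.na(df$column2) & df$column2 > 3, ]
with_subset <- subset(df, column2 > 3)   # subset() treats NA as FALSE
```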

Sort values by column.

> sort(data$column1, decreasing=FALSE)

[1] 1 2 3 4 5

Sort values by column and place NA values at the end.

> sort(data$column2, decreasing=FALSE, na.last=TRUE)

[1]  6  7  9 NA NA
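The na.last argument matters because sort removes missing values by default. A short sketch with a hypothetical vector `values` that mirrors column2:

```r
values <- c(NA, 9, NA, 6, 7)

dropped <- sort(values)                    # 6 7 9: NA values are removed by default
at_end <- sort(values, na.last = TRUE)     # 6 7 9 NA NA
at_start <- sort(values, na.last = FALSE)  # NA NA 6 7 9
descending <- sort(values, decreasing = TRUE)  # 9 7 6
```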

Order data by a column.

> data[order(data$column1),]

  column1 column2 column3
4       1      NA      12
3       2       6      13
1       3       7      14
2       4       9      15
5       5      NA      11
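order returns the row positions in sorted order, which is why it can be used as a row index. The sketch below shows descending order and one way to handle missing values in the sort key. The data frame `df` is a fixed copy of the sample table, introduced here as an assumption.

```r
df <- data.frame(
    column1 = c(5, 4, 1, 2, 3),
    column2 = c(NA, 9, NA, 6, 7),
    column3 = c(11, 15, 12, 13, 14)
)

positions <- order(df$column1)                     # 3 4 5 2 1
descending <- df[order(df$column1, decreasing = TRUE), ]

# na.last = NA removes the rows whose sort key is missing.
without_na <- df[order(df$column2, na.last = NA), ]
```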

Ordering with the plyr package.

> library(plyr)
> arrange(data, column1)

  column1 column2 column3
1       1      NA      12
2       2       6      13
3       3       7      14
4       4       9      15
5       5      NA      11

> arrange(data, desc(column1))

  column1 column2 column3
1       5      NA      11
2       4       9      15
3       3       7      14
4       2       6      13
5       1      NA      12

Adding columns

data$column4 <- rnorm(5)

or, naming the new column explicitly so that it does not take its name from the expression,

data <- cbind(data, column4 = rnorm(5))
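New columns are often computed from existing ones. The following sketch uses a small hypothetical data frame `df` and made-up column names to show the same assignment forms with derived values.

```r
df <- data.frame(column1 = c(5, 4, 1), column3 = c(11, 15, 12))

df$total <- df$column1 + df$column3         # 16 19 13
df <- cbind(df, doubled = df$column1 * 2)   # 10 8 2
df[["label"]] <- c("a", "b", "c")           # [[]] assignment works as well

names(df)  # "column1" "column3" "total" "doubled" "label"
```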
