Subsetting
Basics
[] returns an object of the same type as the original and can select more than one element.
[[]] extracts a single element, by position or by name.
$ extracts an element by name, for example from a list or a data frame.
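A minimal sketch of the three operators on a small list. The list x and its element names are made up for illustration.

```r
x <- list(a = 1:3, b = "text")   # hypothetical list

# [ ] keeps the container type: the result is still a list
sub <- x["a"]
class(sub)                # "list"
length(x[c("a", "b")])    # 2, more than one element can be selected

# [[ ]] extracts the element itself
elem <- x[["a"]]
class(elem)               # "integer"
x[[2]]                    # "text", selected by position

# $ extracts by name, like [[ ]] with a character name
identical(x$a, x[["a"]])  # TRUE
```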
Partial Matching
When we use the $ operator, R matches names partially: a unique prefix of the name is enough. [[]] matches names exactly unless we set exact = FALSE.
> x <- list(aardvark = 1:5)
> x$a
[1] 1 2 3 4 5
> x[["a"]]
NULL
> x[["a", exact = FALSE]]
[1] 1 2 3 4 5
Examples
First we create a table whose columns hold values in random order, shuffle its rows, and set two values in column2 to NA.
data <- data.frame(
"column1"=sample(1:5),
"column2"=sample(6:10),
"column3"=sample(11:15)
)
data <- data[sample(1:5),]
data$column2[c(1,3)] <- NA
Here is what that table could look like.
> data
  column1 column2 column3
5       5      NA      11
2       4       9      15
4       1      NA      12
3       2       6      13
1       3       7      14
Return the first column by index. When we pass a number, R returns the column at that position.
> data[,1]
[1] 5 4 1 2 3
Return a column by name.
> data[,"column2"]
[1] NA  9 NA  6  7
Return a column by name and rows by position.
> data[1:2,"column2"]
[1] NA  9
Filter rows by column values using the logical operator AND (&).
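Such a filter works in two steps: each comparison produces one logical value per row, and [] keeps the rows where the combined value is TRUE. A small sketch of the mechanics, using a hypothetical data frame d rather than the table above:

```r
# Hypothetical data frame, used only to illustrate the mechanics
d <- data.frame(a = c(1, 5, 3), b = c(10, 20, 30))

# Each comparison yields one logical value per row
keep_and <- d$a >= 3 & d$b > 25   # FALSE FALSE  TRUE
keep_or  <- d$a >= 3 | d$b > 25   # FALSE  TRUE  TRUE

# Rows where the logical vector is TRUE are kept
d[keep_and, ]   # only the third row
d[keep_or, ]    # the second and third rows
```

Applied to the table from this section: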
> data[(data$column1 >= 5 & data$column3 > 10),]
  column1 column2 column3
5       5      NA      11
Filter rows by column values using the logical operator OR (|).
> data[(data$column1 >= 5 | data$column3 > 10),]
  column1 column2 column3
5       5      NA      11
2       4       9      15
4       1      NA      12
3       2       6      13
1       3       7      14
If the column we filter on contains missing values and we do not want those rows returned, we can wrap the condition in the which function.
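The reason is that a comparison with NA yields NA rather than TRUE or FALSE, and an NA index produces an NA result. The which function returns only the positions that are TRUE. A short sketch on a hypothetical vector v:

```r
v <- c(NA, 9, NA, 6, 2)   # hypothetical values

cond <- v > 3
cond             # NA TRUE NA TRUE FALSE
which(cond)      # 2 4, positions of TRUE; NA and FALSE are dropped

v[cond]          # NA 9 NA 6, each NA index gives an NA result
v[which(cond)]   # 9 6

# An alternative without which(): rule out NA explicitly
v[!is.na(v) & v > 3]   # 9 6
```

Applied to the table from this section: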
> data[which(data$column2 > 3),]
  column1 column2 column3
2       4       9      15
3       2       6      13
1       3       7      14
Compare the result above with the result of the same query without the which function.
> data[(data$column2 > 3),]
     column1 column2 column3
NA        NA      NA      NA
2          4       9      15
NA.1      NA      NA      NA
3          2       6      13
1          3       7      14
Sort values by column.
> sort(data$column1, decreasing=FALSE)
[1] 1 2 3 4 5
Sort values by column and place NA values at the end.
> sort(data$column2, decreasing=FALSE, na.last=TRUE)
[1]  6  7  9 NA NA
Order the rows of the table by a column.
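The order function does not sort the data itself. It returns the positions that would put a vector in order, and those positions can then be used as row indices. A sketch on a hypothetical vector v:

```r
v <- c(30, 10, 20)   # hypothetical values

sort(v)        # 10 20 30, the sorted values
order(v)       # 2 3 1, positions of the values in sorted order
v[order(v)]    # 10 20 30, indexing by those positions sorts the vector

# Positions for descending order
order(v, decreasing = TRUE)   # 1 3 2
```

The same idea reorders the rows of the table: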
> data[order(data$column1),]
  column1 column2 column3
4       1      NA      12
3       2       6      13
1       3       7      14
2       4       9      15
5       5      NA      11
Order rows with the arrange function from the plyr library.
> library(plyr)
> arrange(data, column1)
  column1 column2 column3
1       1      NA      12
2       2       6      13
3       3       7      14
4       4       9      15
5       5      NA      11
> arrange(data, desc(column1))
  column1 column2 column3
1       5      NA      11
2       4       9      15
3       3       7      14
4       2       6      13
5       1      NA      12
Adding columns
data$column4 <- rnorm(5)
or
data <- cbind(data, column4 = rnorm(5))
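The two forms differ in how the new column gets its name. A minimal sketch, using a hypothetical data frame d and a made-up column name score:

```r
d <- data.frame(a = 1:3)   # hypothetical data frame

# $<- names the new column explicitly
d$b <- c(0.5, 1.5, 2.5)

# cbind with a named argument also sets the column name
d1 <- cbind(d, score = c(10, 20, 30))

# cbind with an unnamed vector derives the name from the expression text
d2 <- cbind(d, c(10, 20, 30))

names(d1)                 # "a" "b" "score"
names(d2)[3] == "score"   # FALSE
```

Naming the argument keeps the column easy to refer to later, for example as d1$score.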