Data Table
Facts about data.table
Extends data.frame and therefore provides the same API.
Is written in C and is really fast.
Much faster at subsetting, grouping and updating.
Hello world
install.packages("data.table")
library(data.table)
years = c(2012, 2013)
average = c(250, 275)
table.values <- data.table(year = years, averageBeerConsumption = average)
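Since data.table extends data.frame, the new table carries both classes:
class(table.values)
# "data.table" "data.frame"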
See all data.table tables created in memory.
tables()
Subsetting rows.
Access rows by index.
table.values[2]
table.values[c(1,2)]
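As a small aside (not part of the original notes), data.table's special symbol .N also works inside the brackets to pick the last row:
table.values[.N]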
Access rows that fulfil a condition.
table.values[table.values$year==2012]
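Inside a data.table's brackets column names can be used directly, so the same filter is usually written more concisely:
table.values[year == 2012]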
Calculate values from columns
table.values[, sum(averageBeerConsumption)]
table.values[, list(mean(year), sum(averageBeerConsumption))]
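In data.table, .() is an alias for list(), and naming the expressions gives the result readable column names:
table.values[, .(meanYear = mean(year), totalBeer = sum(averageBeerConsumption))]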
Return a frequency table of the values in a column
table.values[, table(year)]
Add a new column. The := operator modifies the table by reference, without copying it.
table.values[, volume:=averageBeerConsumption*0.5]
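The same operator can also remove a column again by assigning NULL (shown only for reference; it would undo the column just added):
table.values[, volume := NULL]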
Multiple operations.
table.values[, x := {
  temp <- averageBeerConsumption * year
  log2(temp)
}]
plyr-like operations
table.values[, y := year < 2013]
Grouping by
table.values[, sum := sum(averageBeerConsumption), by = year]
Count the number of occurrences per group; .N holds the row count of each group.
table.values[, .N, by=year]
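Wrapping .N in .() lets you name the resulting column:
table.values[, .(count = .N), by = year]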
Keys
Setting a key sorts the table by that column and makes subsetting and joining on it faster.
setkey(table.values, year)
Then we can join tables by keys.
setkey(table1.values, year)
setkey(table2.values, year)
merge(table1.values, table2.values)
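table1.values and table2.values are placeholders here; a minimal self-contained sketch of a keyed join, with made-up column names, could look like this:
table1.values <- data.table(year = c(2012, 2013), averageBeerConsumption = c(250, 275))
table2.values <- data.table(year = c(2012, 2013), averageWineConsumption = c(90, 95))
setkey(table1.values, year)
setkey(table2.values, year)
merge(table1.values, table2.values)   # joins on the shared key column "year"
table1.values[table2.values]          # keyed tables can also be joined with the bracket syntax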
Fast reading
First we create a file that we can use to test reading speed.
big.file <- data.frame(x=rnorm(1E6), y=rnorm(1E6))
file <- tempfile()
write.table(big.file, file=file, row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
Slow approach using the read.table function.
system.time(read.table(file, header=TRUE, sep="\t"))
Faster approach using the fread function.
system.time(fread(file))
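fread returns a data.table directly and detects the separator and header automatically. The temporary file can be removed once the comparison is done:
unlink(file)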