HDF5
What is HDF5?
Used for storing large data sets
Supports a large range of data types
Can be used to optimize reading from and writing to disk in R
Why HDF5 and not just Hadoop? In short, HDF5 is a smart data container, whereas HDFS (the Hadoop Distributed File System) is a file system. A longer version is here.
Install package
More details about the rhdf5 package. Here are more details about the provided functions.
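# biocLite() is defunct on current Bioconductor releases; BiocManager replaces it
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("rhdf5")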
library(rhdf5)
Create h5 file
To create a file, use the h5createFile function.
created = h5createFile("example.h5")
Create a group in a file.
createdFile = h5createFile("example.h5")
createdGroup = h5createGroup("example.h5", "group1")
List the contents of an h5 file.
h5ls("example.h5")
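Groups can also be nested, much like directories. The following is a minimal sketch of that, assuming a scratch file created with tempfile() and hypothetical group names (neither appears in the examples above). It also shows that h5ls returns a data frame, so its columns can be inspected directly.

```r
library(rhdf5)

# Hypothetical scratch file, so the example does not touch example.h5
path <- tempfile(fileext = ".h5")
h5createFile(path)

# A parent group must exist before a nested group is created inside it
h5createGroup(path, "group1")
h5createGroup(path, "group1/subgroup")

# h5ls returns a data frame; 'group' and 'name' together locate each object
contents <- h5ls(path)
print(contents[, c("group", "name", "otype")])
```

The 'group' column holds the parent path of each entry, which is what makes nested objects distinguishable when two of them share a name.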
Write data into h5 file
Here is an example of how to write a matrix into an h5 file.
matrixFile = h5createFile("matrix.h5")
# 5 x 2 integer matrix, filled column by column
A = matrix(1:10, nrow=5, ncol=2)
h5write(A, "matrix.h5", "a matrix")
If a file handle is left open after experimenting with h5 files, you might need to run H5close() to release it.
Here is an example of how to write an array into an h5 file.
arrayFile = h5createFile("array.h5")
# 20 values from 0.1 to 2.0, one per cell of the 5 x 2 x 2 array
B = array(seq(0.1, 2.0, by=0.1), dim=c(5,2,2))
h5write(B, "array.h5", "an array")
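The examples above write whole objects in one call. When the data is too large to build in memory first, one option is to reserve the dataset on disk and fill it piece by piece. The sketch below assumes a scratch file from tempfile() and a hypothetical dataset name "partial"; it uses h5createDataset to allocate a 5 x 2 integer dataset and the index argument of h5write to fill only the first column.

```r
library(rhdf5)

# Hypothetical scratch file and dataset name, used for illustration only
path <- tempfile(fileext = ".h5")
h5createFile(path)

# Reserve a 5 x 2 integer dataset on disk without supplying any data yet
h5createDataset(path, "partial", dims = c(5, 2), storage.mode = "integer")

# Fill only the first column; index takes one entry per dimension,
# and NULL selects everything along that dimension
h5write(matrix(1:5, nrow = 5, ncol = 1), path, "partial", index = list(NULL, 1))

result <- h5read(path, "partial")
print(result)
```

The second column is never written in this sketch, so it holds whatever fill value the dataset was created with.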
Write a data set
An example of how to write a data.table into an h5 file.
library(data.table)
years = c(2012, 2013)
average = c(250, 275)
table.values <- data.table(year = years, averageBeerConsumption = average)
datasetFile = h5createFile("dataset.h5")
h5write(table.values, "dataset.h5", "data set")
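We can also attach an attribute (metadata) to the table, for example the date on which it was created. Note that attr() changes only the in-memory R object: it has to be set before the object is written, and h5write stores R attributes in the file only when called with write.attributes = TRUE.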
attr(table.values, "created") <- date()
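To check that metadata actually reaches the file, it can be written and read back. The following is a minimal sketch on a plain numeric vector rather than a data.table, assuming a scratch file from tempfile(), a hypothetical dataset name "average", and a fixed hypothetical date string in place of date(). It relies on the write.attributes argument of h5write and on h5readAttributes.

```r
library(rhdf5)

# Hypothetical scratch file and dataset name, used for illustration only
path <- tempfile(fileext = ".h5")
h5createFile(path)

# Set the attribute BEFORE writing; the date string is a made-up example value
values <- c(250, 275)
attr(values, "created") <- "2024-01-01"

# write.attributes = TRUE asks h5write to store the R attributes as well
h5write(values, path, "average", write.attributes = TRUE)

# h5readAttributes returns the stored attributes as a named list
stored <- h5readAttributes(path, "average")
print(stored$created)
```

A plain vector is used here because a data.table carries additional internal attributes, and whether all of them can be written this way is not something this sketch verifies.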
Read data from h5 file
# The file and dataset names must match those passed to h5write
h5read("dataset.h5", "data set")
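h5read does not have to load a whole dataset. The sketch below assumes a scratch file from tempfile() and a hypothetical dataset name "A"; it writes a small matrix, reads it back in full, and then uses the index argument to read only rows 1 to 3 of the first column.

```r
library(rhdf5)

# Hypothetical scratch file and dataset name, used for illustration only
path <- tempfile(fileext = ".h5")
h5createFile(path)

A <- matrix(1:10, nrow = 5, ncol = 2)
h5write(A, path, "A")

# Full read: the matrix comes back with the same shape and values
whole <- h5read(path, "A")

# Partial read: index takes one entry per dimension (rows, then columns)
part <- h5read(path, "A", index = list(1:3, 1))
print(part)
```

Reading by index is one way the format helps with large data: only the requested part of the dataset is brought into memory.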