HDF5

What is HDF5?

  • Used for storing large data sets

  • Supports a large range of data types

  • Can be used to optimize reading from and writing to disk in R

Why HDF5 and not just Hadoop? In short, HDF5 is a smart data container, whereas HDFS (the Hadoop Distributed File System) is a file system. A longer version is here.

Install package

More details about the rhdf5 package. Here are more details about the provided functions.

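# biocLite() has been retired; current Bioconductor releases install through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("rhdf5")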

library(rhdf5)

Create h5 file

To create a file, use the h5createFile function. It returns TRUE when the file was created.

created = h5createFile("example.h5")

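Create a group in a file. If example.h5 already exists from the previous step, h5createFile reports that and returns FALSE; the group is still added to the existing file.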

createdFile = h5createFile("example.h5")

createdGroup = h5createGroup("example.h5", "group1")

List the contents of an h5 file.

h5ls("example.h5")
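
As a minimal sketch of the steps above, the following creates a file, adds a group and a nested subgroup, and lists the result. The temporary path and the name "subgroup" are assumptions made for this illustration and are not part of the original example.

```r
library(rhdf5)

# Temporary path (an assumption for this sketch) so example.h5 is left alone
path <- tempfile(fileext = ".h5")

h5createFile(path)
h5createGroup(path, "group1")
# Nested groups use "/" as the separator; the parent group must already exist
h5createGroup(path, "group1/subgroup")

# h5ls returns a data.frame with one row per object in the file
contents <- h5ls(path)
print(contents)
```

Because h5ls returns an ordinary data.frame, the listing can be filtered like any other table, for example by the name column.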

Write data into h5 file

Here is an example of how to write a matrix into an h5 file.

library(data.table)

matrixFile = h5createFile("matrix.h5")

A = matrix(1:10, nrow=5, ncol=2)
h5write(A, "matrix.h5", "a matrix")

If a file stays locked or a later call fails while you are experimenting with h5 files, run H5close() to release any HDF5 handles that are still open.

Here is an example of how to write an array into an h5 file.

library(data.table)

arrayFile = h5createFile("array.h5")

# 20 values (0.1, 0.2, ..., 2.0) fill the 5 x 2 x 2 array exactly, with no recycling
B = array(seq(0.1, 2.0, by=0.1), dim=c(5,2,2))

h5write(B, "array.h5", "an array")
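
Large arrays do not have to be written in a single call. The sketch below assumes that h5createDataset and the index argument of h5write behave as described in the rhdf5 vignette: it allocates a 5 x 2 x 2 dataset up front and then fills only its first slice. The temporary path and the dataset name "B" are placeholders chosen for this illustration.

```r
library(rhdf5)

path <- tempfile(fileext = ".h5")   # placeholder path for this sketch
h5createFile(path)

# Allocate the full dataset on disk without supplying any data yet
h5createDataset(path, "B", dims = c(5, 2, 2), storage.mode = "double")

# Write one 5 x 2 slice; NULL in index selects every element of that dimension
slice <- array(seq(0.1, 1.0, by = 0.1), dim = c(5, 2, 1))
h5write(slice, path, "B", index = list(NULL, NULL, 1))

# Read the whole dataset back and check its shape
B <- h5read(path, "B")
print(dim(B))
```

Writing slice by slice keeps memory use bounded when the full array would not fit in RAM at once.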

Write a data set

An example of how to write a data.table into an h5 file.

library(data.table)

years = c(2012, 2013)
average = c(250, 275)
table.values <- data.table(year = years, averageBeerConsumption = average)

datasetFile = h5createFile("dataset.h5")
h5write(table.values, "dataset.h5", "data set")

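We can also attach an attribute (in other words, metadata) to the table, for example the date on which it was created. Note that attr() only changes the R object in memory: for the attribute to reach the file, it has to be set before h5write is called, and h5write has to be asked to store attributes (write.attributes = TRUE).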

attr(table.values, "created") <- date()
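
To show how such metadata might actually be persisted, here is a hedged sketch that uses a plain numeric vector rather than the data.table, since attribute handling for compound tables may differ. It assumes the write.attributes argument of h5write and the h5readAttributes function work as documented in rhdf5; the temporary path and the dataset name "average" are placeholders.

```r
library(rhdf5)

path <- tempfile(fileext = ".h5")   # placeholder path for this sketch
h5createFile(path)

average <- c(250, 275)
# Set the attribute BEFORE writing, otherwise it never reaches the file
attr(average, "created") <- date()

# write.attributes = TRUE asks h5write to store R attributes with the dataset
h5write(average, path, "average", write.attributes = TRUE)

# Read the stored attributes back as a named list
stored <- h5readAttributes(path, "average")
print(names(stored))
```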

Read data from h5 file

h5read("dataset.h5", "data set")
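
h5read can also fetch part of a dataset instead of loading all of it, which is where HDF5 helps with data that is larger than memory. The sketch below assumes the index argument documented for h5read (a list with one entry per dimension); the temporary path and the dataset name "A" are placeholders.

```r
library(rhdf5)

path <- tempfile(fileext = ".h5")   # placeholder path for this sketch
h5createFile(path)

A <- matrix(1:10, nrow = 5, ncol = 2)
h5write(A, path, "A")

# One list entry per dimension: rows 1 and 2, and NULL for all columns
top <- h5read(path, "A", index = list(1:2, NULL))
print(top)
```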
