HDF5

What is HDF5?

  • Used for storing large data sets

  • Supports a wide range of data types

  • Can be used to optimize reading from and writing to disk in R

Why HDF5 and not just Hadoop? In short, HDF5 is a smart data container (a file format), whereas HDFS, the Hadoop Distributed File System, is a file system. A longer version is here.

Install package

More details about the rhdf5 package. Here are more details about the functions it provides.

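# biocLite() is deprecated; current Bioconductor releases install packages via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("rhdf5")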

library(rhdf5)

Create h5 file

To create a file, use the h5createFile function.

created = h5createFile("example.h5")

Create a group in the file.

createdFile = h5createFile("example.h5")

createdGroup = h5createGroup("example.h5", "group1")

List the contents of the h5 file.
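As a minimal sketch of listing, h5ls() from rhdf5 returns a data frame with one row per group or data set in the file. The temporary file path below is an assumption made so the sketch is self-contained and does not overwrite example.h5.

```r
library(rhdf5)

# Hypothetical temporary file, used only for this sketch
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)
h5createGroup(h5file, "group1")

# h5ls() describes every group and data set stored in the file
contents <- h5ls(h5file)
print(contents)
```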

Write data into h5 file

Here is an example of how to write a matrix into an h5 file.
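A minimal sketch of writing a matrix with h5write(), assuming a temporary file and a group named group1; the matrix A and its values are illustrative only.

```r
library(rhdf5)

# Hypothetical temporary file, used only for this sketch
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)
h5createGroup(h5file, "group1")

# A 5 x 2 integer matrix stored as the data set "A" inside group1
A <- matrix(1:10, nrow = 5, ncol = 2)
h5write(A, h5file, "group1/A")

# Confirm the data set is now listed in the file
print(h5ls(h5file))
```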

You might need to run H5close() after experimenting with h5 files, to release any HDF5 handles that are still open.

Here is an example of how to write an array into an h5 file.
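A sketch of the same idea for a three-dimensional array, assuming a nested group group1/group2 in a temporary file. The parent group is created first, then the nested one; the array B is illustrative only.

```r
library(rhdf5)

# Hypothetical temporary file, used only for this sketch
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)
h5createGroup(h5file, "group1")
h5createGroup(h5file, "group1/group2")

# A 5 x 2 x 2 numeric array stored as the data set "B" in the nested group
B <- array(seq(0.1, 2.0, by = 0.1), dim = c(5, 2, 2))
h5write(B, h5file, "group1/group2/B")

print(h5ls(h5file))
```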

Write a data set

An example of how to write a data.table into an h5 file.
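A minimal sketch of writing a table. It assumes a plain data.frame stands in for the data.table (a data.table can be converted with as.data.frame() first); the column names and values are made up for illustration. rhdf5 stores a data frame as a compound data set.

```r
library(rhdf5)

# Hypothetical temporary file, used only for this sketch
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)

# Illustrative table; a plain data.frame is assumed here
df <- data.frame(
  id    = 1L:5L,
  value = seq(0, 1, length.out = 5),
  label = c("ab", "cde", "fghi", "a", "s"),
  stringsAsFactors = FALSE
)

# Written at the top level of the file under the name "df"
h5write(df, h5file, "df")

print(h5ls(h5file))
```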

We can also add an attribute (that is, metadata) as follows. Here we add the date on which the table was created.
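One way to attach such an attribute is sketched below, using rhdf5's low-level handles together with h5writeAttribute(). The attribute name "created", the data set name "df" and the temporary file are assumptions for illustration, and this is not necessarily the approach the original example took.

```r
library(rhdf5)

# Hypothetical temporary file holding a small illustrative table
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)
df <- data.frame(id = 1L:3L, value = c(0.5, 1.5, 2.5))
h5write(df, h5file, "df")

# Open the file and the data set, attach the attribute, then close both handles
fid <- H5Fopen(h5file)
did <- H5Dopen(fid, "df")
h5writeAttribute(attr = as.character(Sys.Date()), h5obj = did, name = "created")
H5Dclose(did)
H5Fclose(fid)

# Read the attributes back as a named list
attrs <- h5readAttributes(h5file, "df")
print(attrs)
```

Closing the data set and file handles explicitly avoids leaving the file open, which is the situation in which H5close() would otherwise be needed.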

Read data from h5 file
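A minimal sketch of reading data back with h5read(), assuming a matrix was stored as group1/A in a temporary file. The index argument reads only part of a data set, which is useful when the full data set is large.

```r
library(rhdf5)

# Hypothetical temporary file with one matrix stored in group1
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)
h5createGroup(h5file, "group1")
A <- matrix(1:10, nrow = 5, ncol = 2)
h5write(A, h5file, "group1/A")

# Read the whole data set
readA <- h5read(h5file, "group1/A")

# Read only rows 1 to 3; NULL means "all indices" in that dimension
firstRows <- h5read(h5file, "group1/A", index = list(1:3, NULL))

print(readA)
print(firstRows)
```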
