# Larger Datasets

`read.csv` and similar functions read all the data into RAM. This causes problems when the data is too large to fit into memory.

## Calculating Memory Requirements

For example, assume that each cell in the table takes 8 bytes, which is the size of a numeric (double-precision) value in R:

```r
> rows <- 1500000
> columns <- 120
> bytes <- 8
> megaBytes <- (rows * columns * bytes) / 2^20
> megaBytes
[1] 1373.291

> gigaBytes <- (rows * columns * bytes) / 1024^3
> gigaBytes
[1] 1.341105
```
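You can check the 8-bytes-per-cell assumption by measuring a few vectors with `object.size()` from the `utils` package. This sketch shows that numeric (double) values take about 8 bytes each, while integer and logical values take about 4. Character columns vary with their content, so they are not covered here.

```r
# Size of one million values of different types, in bytes
n <- 1e6
sizes <- c(
  numeric = as.numeric(utils::object.size(numeric(n))),  # about 8 bytes per value
  integer = as.numeric(utils::object.size(integer(n))),  # about 4 bytes per value
  logical = as.numeric(utils::object.size(logical(n)))   # about 4 bytes per value
)
# Bytes per value; the small fixed vector header becomes negligible
round(sizes / n, 2)
```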

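As a rule of thumb, make sure you have about twice as much memory as calculated, because R needs extra working space while it reads and processes the data. So, for 1.3 GB of data you should have about 2.6 GB of RAM.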
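The calculation and the rule of thumb can be combined in a small helper. `estimate_memory_gb()` is a hypothetical name used only for this sketch. It assumes every cell takes the same number of bytes (8 by default, as for numeric columns) and applies a 2x safety factor.

```r
# Hypothetical helper: estimated RAM (in GB) needed to load a table,
# assuming every cell takes `bytes` bytes, multiplied by a safety factor.
estimate_memory_gb <- function(rows, columns, bytes = 8, safety_factor = 2) {
  data_gb <- (rows * columns * bytes) / 1024^3
  data_gb * safety_factor
}

estimate_memory_gb(1500000, 120)                     # about 2.68 GB
estimate_memory_gb(1500000, 120, safety_factor = 1)  # about 1.34 GB, the raw size
```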

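## Speeding Up Loading

Tell `read.table` the type of each column through the `colClasses` argument, so it does not have to guess the types while reading. It also helps to estimate how many rows are in the file and pass that number as `nrows`. One way to get the column types is to read a small sample first: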

```r
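# Read only the first 100 rows to discover the column types
initial <- read.table("data.txt", nrows = 100)
classes <- sapply(initial, class)
# Read the whole file with the column types given up front
data <- read.table("data.txt", colClasses = classes)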
```
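To estimate the number of rows without loading the whole table, you can count lines through a file connection in chunks and pass the result as `nrows`. The `read.table` documentation notes that even a mild over-estimate of `nrows` helps memory usage, so a header line being counted does no harm. `count_lines()` is a hypothetical helper. The demo file is generated only so the sketch runs on its own, and it stands in for `data.txt` from the example above.

```r
# Hypothetical helper: count lines in a text file chunk by chunk,
# so the whole file is never held in memory as strings.
count_lines <- function(path, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0L
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break
    total <- total + length(chunk)
  }
  total
}

# Demo file with made-up data (no header line)
path <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:250, value = runif(250)), path,
            row.names = FALSE, col.names = FALSE)

# Column types from a small sample, row count from the line count
initial <- read.table(path, nrows = 100)
classes <- sapply(initial, class)
data <- read.table(path, colClasses = classes, nrows = count_lines(path))
```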

## Larger than Large Datasets

If your data does not fit into RAM, use a package such as [colbycol](http://colbycol.r-forge.r-project.org) or [bigmemory](http://cran.r-project.org/web/packages/bigmemory/index.html). colbycol splits the table into columns and stores each column in a separate file, so you can load only the columns you need. bigmemory stores large matrices outside R's regular memory, and its file-backed `big.matrix` objects use memory-mapped files on disk. Either way, the whole dataset does not have to be loaded into the RAM of your computer or server at once.
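To make the column-per-file idea concrete, here is a toy sketch in base R. It does not use the colbycol or bigmemory APIs. `split_columns()` and `read_column()` are hypothetical helpers that save each column with `saveRDS()` and later load back only the column you ask for. This toy version starts from a data frame that is already in memory, so it only illustrates the storage layout, not how a package reads a file that does not fit into RAM.

```r
# Toy illustration of column-per-file storage (not the colbycol/bigmemory API).
# Hypothetical helper: write each column of a data frame to its own .rds file.
split_columns <- function(df, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  for (name in names(df)) {
    saveRDS(df[[name]], file.path(dir, paste0(name, ".rds")))
  }
  invisible(names(df))
}

# Hypothetical helper: load a single column back from disk.
read_column <- function(dir, name) {
  readRDS(file.path(dir, paste0(name, ".rds")))
}

store <- file.path(tempdir(), "columns")
df <- data.frame(id = 1:5, price = c(9.5, 3.2, 7.1, 4.4, 6.0), city = letters[1:5])
split_columns(df, store)

# Only the "price" column is read into memory here
prices <- read_column(store, "price")
mean(prices)
```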

Read the [Handling Large Datasets in R](http://www.r-bloggers.com/handling-large-datasets-in-r) blog post to learn more about reading large datasets.


