New Variables

We might want to create new variables because we want to:

  • add an indicator that flags missing values (NA)

  • create categorical values from quantitative ones, for example by cutting numbers into ranges

  • apply transformations
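A missingness indicator is a logical variable that records where a value is absent. The sketch below uses a small hypothetical vector `x` and base R's `is.na()`:

```r
# Hypothetical numeric vector with two missing values
x <- c(3, NA, 7, NA, 1)

# Logical indicator: TRUE where the value is missing
xMissing <- is.na(x)
table(xMissing)
```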

Get some data to summarize

We will use data from the following page: https://data.baltimorecity.gov/Culture-Arts/Restaurants/k5ry-ef3g

Run this code in order to download sample data for this tutorial.

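# Create a local data directory if it does not exist yet
if (!file.exists("./data")) { dir.create("./data") }
url <- "https://data.baltimorecity.gov/api/views/k5ry-ef3g/rows.csv?accessType=DOWNLOAD"

# method="curl" calls the external curl program; if curl is not installed,
# remove the method argument to use R's default download method
download.file(url, destfile="./data/restaurants.csv", method="curl")

# Read the downloaded CSV file into a data frame
data <- read.csv("./data/restaurants.csv")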

How to create a sequence

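Sometimes we need to create a sequence, for example to build an index for a data set. seq() generates one from a start value, an end value and a step size (by):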

> seq(1, 10, by=2)

[1] 1 3 5 7 9
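Two other common ways to build a sequence are to fix its length instead of its step (`length.out`), or to create an index that matches an existing vector (`seq_along()`). The vector `x` here is just an illustrative example:

```r
# 3 evenly spaced values from 1 to 10
s1 <- seq(1, 10, length.out = 3)
s1
# [1]  1.0  5.5 10.0

# Index sequence with one entry per element of x
x <- c(1, 3, 8, 25, 100)
s2 <- seq_along(x)
s2
# [1] 1 2 3 4 5
```

seq_along() is safer than 1:length(x) because it returns an empty sequence when x is empty.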

Create a new column by subsetting
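One way to do this is to test each row against a set of values with `%in%` and store the logical result as a new column. This is a sketch on a small hypothetical data frame `rest`; the neighborhood names are only illustrative:

```r
# Toy stand-in for the restaurant data (hypothetical values)
rest <- data.frame(
  name = c("A", "B", "C", "D"),
  neighborhood = c("Roland Park", "Downtown", "Homeland", "Canton"),
  stringsAsFactors = FALSE
)

# New column: TRUE where the neighborhood is one of the listed values
rest$nearMe <- rest$neighborhood %in% c("Roland Park", "Homeland")
table(rest$nearMe)
```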

Create binary column
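A binary column can come straight from a logical comparison, or from ifelse() when we want custom labels for the two outcomes. The sketch assumes hypothetical zip codes where a negative value marks a data-entry error:

```r
# Hypothetical zip codes; a negative value is clearly an error
zip <- c(21206, -21226, 21231, 21224)

# Logical (TRUE/FALSE) column flagging invalid entries
zipWrong <- zip < 0

# ifelse() maps the condition to any two values we choose
zipFlag <- ifelse(zip < 0, "invalid", "valid")
table(zipFlag)
```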

Create categorical values

We might want to categorize zipCode values into four groups by percentile: 0-25, 25-50, 50-75 and 75-100.
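A minimal base-R sketch, using hypothetical zip codes: quantile() supplies the 0%, 25%, 50%, 75% and 100% cut points, and cut() assigns each value to its interval.

```r
# Hypothetical zip codes standing in for the restaurant data
zip <- c(21201, 21202, 21205, 21206, 21213, 21217, 21224, 21231)

# By default quantile() returns the 0%, 25%, 50%, 75% and 100% points
breaks <- quantile(zip)

# include.lowest = TRUE keeps the minimum in the first group;
# without it, the smallest zip code would become NA
zipGroups <- cut(zip, breaks = breaks, include.lowest = TRUE)
table(zipGroups)
```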

An easier way to do this is to use the Hmisc package.
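Hmisc provides cut2(), where the argument g sets the number of quantile groups, so the breaks do not have to be computed by hand. The sketch below only runs if Hmisc is installed (install.packages("Hmisc")); the zip codes are hypothetical:

```r
zip <- c(21201, 21202, 21205, 21206, 21213, 21217, 21224, 21231)

# Hmisc is an add-on package, so check that it is available first
if (requireNamespace("Hmisc", quietly = TRUE)) {
  # g = 4 asks for 4 quantile groups of roughly equal size
  zipGroups2 <- Hmisc::cut2(zip, g = 4)
  print(table(zipGroups2))
}
```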

Create factor values

zipCode is stored as an integer, but zip codes are labels rather than quantities, so we might want to convert it to a factor.
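factor() does the conversion. The sketch uses a hypothetical integer vector in place of the zipCode column and also shows how to get the original numbers back:

```r
zipCode <- c(21206L, 21231L, 21224L, 21206L)   # hypothetical values
zcf <- factor(zipCode)

class(zcf)
# [1] "factor"
levels(zcf)
# [1] "21206" "21224" "21231"

# as.numeric(zcf) would return level codes (1, 3, 2, 1);
# convert through character to recover the zip codes themselves
zipBack <- as.numeric(as.character(zcf))
```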

Levels of factor variables
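By default, factor() sorts levels alphabetically, which is not always the order we want. This sketch, on a small hypothetical yes/no vector, sets the order explicitly, uses relevel() to pick the first (reference) level, and shows that as.numeric() returns the level codes:

```r
yesno <- c("yes", "no", "yes", "yes", "no")

# Default: levels sorted alphabetically ("no", "yes")
f1 <- factor(yesno)
levels(f1)

# Set the level order explicitly
f2 <- factor(yesno, levels = c("yes", "no"))
levels(f2)

# relevel() moves the chosen level to the front
f3 <- relevel(f1, ref = "yes")
levels(f3)

# as.numeric() gives the integer code of each level, not the label
as.numeric(f2)
# [1] 1 2 1 1 2
```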

Mutate function

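The mutate() function can be used to add a variable to a data frame: it returns a new data frame containing the original columns plus the new one, and leaves the original unchanged.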
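A sketch assuming dplyr's mutate() (the text does not say which package's mutate is meant). If dplyr is not installed, base R's transform() is used instead, which adds the column in the same way. The data frame `rest` is hypothetical:

```r
# Toy data frame (hypothetical values)
rest <- data.frame(name = c("A", "B", "C", "D"),
                   zipCode = c(21206, 21231, 21224, 21201))

if (requireNamespace("dplyr", quietly = TRUE)) {
  # mutate() returns a new data frame with zipGroups appended
  rest2 <- dplyr::mutate(rest,
    zipGroups = cut(zipCode, breaks = quantile(zipCode), include.lowest = TRUE))
} else {
  # Base-R fallback with the same effect
  rest2 <- transform(rest,
    zipGroups = cut(zipCode, breaks = quantile(zipCode), include.lowest = TRUE))
}

names(rest)   # original unchanged: "name" "zipCode"
names(rest2)  # "name" "zipCode" "zipGroups"
```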

Transformations

Many built-in R functions for transforming variables are listed here: http://www.statmethods.net/management/functions.html
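A few common numeric transformations from base R, applied to a small hypothetical vector `x`:

```r
x <- c(-2.567, 0.5, 10.2)

absX  <- abs(x)                 # absolute value
sqrtX <- sqrt(abs(x))           # square root (needs non-negative input)
ceilX <- ceiling(x)             # round up:   -2  1  11
floorX <- floor(x)              # round down: -3  0  10
roundX <- round(x, digits = 1)  # -2.6  0.5  10.2
sigX  <- signif(x, digits = 2)  # -2.6  0.5  10
logX  <- log(abs(x))            # natural log; log2() and log10() also exist
expX  <- exp(x)                 # exponential
```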
