New Variables

We might want to create new variables in order to

flag missing values, e.g. NA
turn quantitative (numeric) values into categories
apply transformations

Get some data to summarize

We will use data from the following page: https://data.baltimorecity.gov/Culture-Arts/Restaurants/k5ry-ef3g

Run this code in order to download sample data for this tutorial.

if (!file.exists("./data")) { dir.create("./data") }
url <- "https://data.baltimorecity.gov/api/views/k5ry-ef3g/rows.csv?accessType=DOWNLOAD"

download.file(url, destfile="./data/restaurants.csv", method="curl")

data <- read.csv("./data/restaurants.csv")

How to create a sequence

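Sometimes we need a sequence, for example to index the rows of a data set. seq() builds one from a start, an end, and either a step (by) or a number of values.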

> seq(1, 10, by=2)

[1] 1 3 5 7 9

> seq(1, 10, length.out=3)

[1]  1.0  5.5 10.0
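
When the goal is to index an existing vector, seq_along() is a convenient shortcut. The sketch below uses a made-up vector x (not part of the restaurant data) purely for illustration.

```r
# Hypothetical vector, used only for illustration
x <- c(1, 3, 8, 25, 100)

# One index per element of x
idx <- seq_along(x)
print(idx)

# Use the index to build a new variable: each value paired with its position
positions <- data.frame(position = idx, value = x)
print(positions)
```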

Create new column by subsetting.

> data$nearMe = data$neighborhood %in% c("Roland Park", "Homeland")

> table(data$nearMe)

FALSE  TRUE
 1314    13
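
To see what %in% does without downloading anything, here is a minimal sketch on a toy data frame. The names and rows are invented for this example; only the neighborhood column name mirrors the tutorial.

```r
# Toy data frame standing in for the restaurant data (values are made up)
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  neighborhood = c("Roland Park", "Downtown", "Homeland", "Fells Point"),
  stringsAsFactors = FALSE
)

# TRUE where the neighborhood is one of the listed values
toy$nearMe <- toy$neighborhood %in% c("Roland Park", "Homeland")

# The logical column can be used directly to subset rows
nearby <- toy[toy$nearMe, ]
print(nearby$name)
```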

Create binary column

> data$wrongZip = ifelse(data$zipCode < 0, TRUE, FALSE)

> table(data$wrongZip)

FALSE  TRUE
 1326     1
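
A comparison such as zipCode < 0 already returns a logical vector, so ifelse() is optional here. The sketch below, on made-up zip codes, shows that both forms agree and that a missing value stays NA, which is why a separate is.na() indicator can be useful.

```r
# Made-up zip codes: one negative (invalid) and one missing
zip <- c(21201, -21226, NA)

# ifelse() version, as in the tutorial
flagIfelse <- ifelse(zip < 0, TRUE, FALSE)

# The comparison alone already yields a logical vector
flagDirect <- zip < 0

# Both keep NA where the zip code is missing
print(flagDirect)

# A separate indicator for missing values
zipMissing <- is.na(zip)
print(zipMissing)
```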

Create categorical values

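The following code bins zipCode values into four groups by quartile: 0-25, 25-50, 50-75 and 75-100 percent. Note that cut() builds intervals that are open on the left by default, so the smallest zipCode falls outside the first interval and becomes NA unless include.lowest=TRUE is passed.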

data$zipGroups = cut(data$zipCode,breaks=quantile(data$zipCode))

table(data$zipGroups)
table(data$zipGroups, data$zipCode)
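
The effect of the interval boundaries is easier to see on a small vector. This sketch uses eight made-up numbers rather than the restaurant data and compares cut() with and without include.lowest.

```r
# Made-up values, used only for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Quartile boundaries: minimum, 25%, 50%, 75%, maximum
breaks <- quantile(x)

# Default intervals are (a, b], so the minimum value is not covered
groupsDefault <- cut(x, breaks = breaks)
print(sum(is.na(groupsDefault)))

# include.lowest = TRUE closes the first interval on the left
groupsAll <- cut(x, breaks = breaks, include.lowest = TRUE)
print(table(groupsAll))
```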

An easier way to do this is cut2() from the Hmisc package.

install.packages("Hmisc")
library(Hmisc)

data$zipGroups = cut2(data$zipCode, g=4)

table(data$zipGroups)

Create factor values

zipCode is stored as an integer, but it is a label rather than a quantity, so we might want to turn it into a factor.

> data$zipCodeFactor <- factor(data$zipCode)

> class(data$zipCodeFactor)
[1] "factor"
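
One thing to watch for after the conversion: as.numeric() on a factor returns the internal level codes, not the original numbers. A minimal sketch with made-up zip codes:

```r
# Made-up zip codes, used only for illustration
zips <- c(21201L, 21230L, 21201L)
zipFactor <- factor(zips)

# Internal codes: position of each value among the sorted levels
codes <- as.numeric(zipFactor)
print(codes)

# Go through character to recover the original numbers
original <- as.numeric(as.character(zipFactor))
print(original)
```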

Levels of factor variables

> options <- sample(c("yes", "no"), size=10, replace=TRUE)
> options
 [1] "yes" "no"  "no"  "yes" "yes" "no"  "yes" "no"  "yes" "no"

> optionsFactor = factor(options, levels=c("yes", "no"))
> optionsFactor
 [1] yes no  no  yes yes no  yes no  yes no
Levels: yes no

relevel(optionsFactor, ref="yes")

as.numeric(optionsFactor)
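
Because the sample above is random, here is a small deterministic sketch of the same ideas. The vector answers is made up; it shows the default (alphabetical) level order, an explicit order, the resulting integer codes, and relevel().

```r
# Fixed, made-up answers so the result is reproducible
answers <- c("yes", "no", "no", "yes")

# Default levels are sorted alphabetically
defaultFactor <- factor(answers)
print(levels(defaultFactor))

# Explicit level order: "yes" first
orderedFactor <- factor(answers, levels = c("yes", "no"))

# Integer codes follow the level order
orderedCodes <- as.integer(orderedFactor)
print(orderedCodes)

# relevel() moves the chosen level to the front
releveled <- relevel(orderedFactor, ref = "no")
print(levels(releveled))
```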

Mutate function

The mutate() function from the plyr package returns a new data frame with the added variable, leaving the original data frame unchanged.

install.packages("Hmisc")
library(Hmisc)
install.packages("plyr")
library(plyr)

newData = mutate(data, zipGroups=cut2(zipCode, g=4))

View(newData)
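
For a simple case like this, base R's transform() behaves in a similar way and needs no extra packages. The sketch below swaps it in for mutate() on a made-up data frame; the isHigh column and the 21210 threshold are invented for illustration.

```r
# Toy data frame (values are made up)
toy <- data.frame(zipCode = c(21201, 21202, 21230, 21231))

# transform() returns a copy with the new column added
newToy <- transform(toy, isHigh = zipCode > 21210)

# The original is untouched; only the copy has the new column
print(names(toy))
print(names(newToy))
```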

Transformations

# absolute value
abs(x)

# square root
sqrt(x)

# ceiling(3.4) is 4
ceiling(x)

# floor(3.4) is 3
floor(x)

# rounding
round(x, digits=n)

# signif(3.475, digits=2) is 3.5
signif(x, digits=n)

# trigonometric functions, etc.
cos(x)
sin(x)

# natural logarithm
log(x)

# other logarithms
log2(x)
log10(x)

# exponentiating x
exp(x)
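
These functions are vectorized, so they can be applied to a whole column to create a new variable. A minimal sketch on a made-up data frame (the column names are assumptions for this example):

```r
# Toy data frame (values are made up)
toy <- data.frame(value = c(1, 10, 100, 1000))

# Each transformation is applied element by element
toy$logValue <- log10(toy$value)
toy$sqrtValue <- sqrt(toy$value)

# Round the square roots to one decimal place
toy$rounded <- round(toy$sqrtValue, digits = 1)

print(toy)
```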

More functions here: http://www.statmethods.net/management/functions.html
