Editing text
Rules for editing column names:
All lower case when possible
Names should be descriptive, not abbreviations
No duplicates
No underscores, dots, or white space
Rules for values:
Categorical values should usually be stored as factors
Should be descriptive, like TRUE/FALSE instead of 1/0.
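A minimal sketch of these rules in base R. The data frame, its column names, the 0/1 coding, and the helper clean_names are all hypothetical, made up only to show the rules being applied.

```r
# Hypothetical example data with untidy names and 0/1 coded values
df <- data.frame(
  Street.Name = c("Main St", "Oak Ave", "Main St"),
  Has_Camera = c(1, 0, 1)
)

# Hypothetical helper: drop underscores, dots and spaces, then lower-case
clean_names <- function(x) {
  tolower(gsub("[._ ]", "", x))
}
names(df) <- clean_names(names(df))

# Stop if cleaning produced duplicate names
stopifnot(anyDuplicated(names(df)) == 0)

# Store values as descriptive factors instead of 1/0
df$hascamera <- factor(df$hascamera == 1)
df$streetname <- factor(df$streetname)
```

Inside the brackets of the regular expression, the dot is a literal character, so no escaping is needed there.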
Get some data to edit
We will use data from the following page: https://data.baltimorecity.gov/Transportation/Baltimore-Fixed-Speed-Cameras/dz54-2aru
Run this code to download the sample data for this tutorial.
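# Create a local data folder if it does not exist yet
if (!file.exists("./data")) { dir.create("./data") }
url <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
# method = "curl" requires the curl command-line tool to be installed
download.file(url, destfile = "./data/cameras.csv", method = "curl")
data <- read.csv("./data/cameras.csv")
# List the column names
names(data)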
Lower or upper case
tolower(names(data))
toupper(names(data))
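tolower() and toupper() only return the converted names; the columns are not renamed until the result is assigned back with names(). A minimal sketch with a hypothetical one-row data frame standing in for the camera data:

```r
# Hypothetical data frame standing in for the camera data
df <- data.frame(Address = "1 Main St", crossStreet = "Oak Ave")

# tolower() returns a new vector; assign it back to rename the columns
names(df) <- tolower(names(df))
names(df)
```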
Split values
For example, split each column name at every dot. The split argument is a regular expression, so the dot has to be escaped as "\\.".
strsplit(names(data), "\\.")
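strsplit() returns a list with one character vector per input string, and an unescaped dot matches any character. The sketch below uses made-up sample strings to show both points, plus fixed = TRUE as an alternative to escaping.

```r
x <- c("Location.1", "crossStreet")

# One list element per input string
parts <- strsplit(x, "\\.")
parts[[1]]  # "Location" "1"
parts[[2]]  # "crossStreet" (no dot, so nothing is split)

# Unescaped, "." matches every character, so each one acts as a separator
strsplit("abc", ".")[[1]]  # "" "" ""

# fixed = TRUE matches the pattern literally, no escaping needed
strsplit(x, ".", fixed = TRUE)[[1]]  # "Location" "1"
```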
Get only first element
In this example, we split each string and keep only its first element.
firstElement <- function (x) { unlist(strsplit(x, ".", fixed=TRUE))[1] }
sapply(as.list(names(data)), firstElement)
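The same result can be obtained without a helper function: strsplit() is already vectorized, so one call splits every string, and sapply() with the `[` extractor picks the first piece of each. A sketch on made-up sample strings:

```r
x <- c("Location.1", "crossStreet", "a.b.c")

# Split all strings at once, then take element 1 of each piece
firstParts <- sapply(strsplit(x, ".", fixed = TRUE), `[`, 1)
firstParts  # "Location" "crossStreet" "a"
```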
Replace characters
The following code replaces the first _ in each column name with a space.
sub("_", " ", names(data))
The sub function replaces only the first match in each string. To replace every match, use gsub.
> test <- "test_value_1"
> sub("_", " ", test)
[1] "test value_1"
> gsub("_", " ", test)
[1] "test value 1"
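Like strsplit(), sub() and gsub() treat the pattern as a regular expression, which matters for characters such as the dot. A sketch on hypothetical strings:

```r
x <- c("Location.1", "time.left.2")

# Escaped dot: remove every literal dot
gsub("\\.", "", x)              # "Location1" "timeleft2"

# fixed = TRUE matches the dot literally
gsub(".", "", x, fixed = TRUE)  # "Location1" "timeleft2"

# Unescaped, "." matches every character, so everything is removed
gsub(".", "", x)                # "" ""
```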
Searching
Find the positions of the values in a column that contain a string.
> grep("Alameda", data$intersection)
[1] 4 5 36
Count how many values in a column contain a string.
> table(grepl("Alameda", data$intersection))
FALSE TRUE
77 3
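grep() returns positions; with value = TRUE it returns the matching strings instead. Because grepl() returns a logical vector, sum() counts the matches directly, and the same vector can subset rows. A sketch with hypothetical intersection names (not taken from the camera data):

```r
# Hypothetical values, not taken from the camera data
intersection <- c("Caton Ave & Benson Ave", "The Alameda & 33rd St",
                  "Harford Rd & Argonne Dr", "E 33rd St & The Alameda")

grep("Alameda", intersection)                # positions: 2 4
grep("Alameda", intersection, value = TRUE)  # the matching strings themselves
sum(grepl("Alameda", intersection))          # number of matches: 2

# Keep only the matching rows of a data frame
df <- data.frame(intersection = intersection)
df[grepl("Alameda", df$intersection), , drop = FALSE]
```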
For more about the grep function, see its help page with ?grep.
More string functions
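The stringr package makes working with strings easier. Note that nchar, substr, paste and paste0 below are base R functions; only str_trim comes from stringr.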
install.packages("stringr")
library(stringr)
Get number of characters.
> nchar("Hello there")
[1] 11
Get a substring.
> substr("Hello there", 1, 5)
[1] "Hello"
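substr() is vectorized, and combining it with nchar() lets you count from the end of each string. A sketch on sample strings:

```r
x <- c("Hello there", "Baltimore")

# First 5 characters of each string
substr(x, 1, 5)                    # "Hello" "Balti"

# Last 3 characters, using nchar() for the end position
substr(x, nchar(x) - 2, nchar(x))  # "ere" "ore"
```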
Join strings into one. paste separates the pieces with a space by default; paste0 uses no separator.
> paste("Hello", "there")
[1] "Hello there"
> paste0("Hello", "there")
[1] "Hellothere"
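paste() also takes a sep argument to change the separator and a collapse argument to join the elements of a single vector, and both functions recycle vectors. A sketch with sample values:

```r
words <- c("Hello", "there")

paste("Hello", "there", sep = "_")  # "Hello_there"
paste(words, collapse = " ")        # "Hello there": joins the elements of one vector
paste0("camera", 1:3)               # "camera1" "camera2" "camera3"
```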
Trim white space from both ends of a string.
> str_trim(" Hello ")
[1] "Hello"
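Base R also has trimws(), which works without installing stringr; its which argument trims only one side, much like the side argument of str_trim(). A sketch on sample strings:

```r
x <- c("  Hello ", "there  ")

trimws(x)                  # "Hello" "there"
trimws(x, which = "left")  # "Hello " "there  "
```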