Editing text
Rules for editing column names:
All lower case when possible
Names should be descriptive, not abbreviated
No duplicates
No underscores, dots or white spaces
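As a rough illustration, the mechanical rules above (lower case, no separators, no duplicates) can be sketched as a small helper. The function name `cleanNames` and the sample column names are hypothetical and used only for this example. Making names descriptive still has to be done by hand.

```r
# Hypothetical helper: tidy a character vector of column names.
cleanNames <- function(x) {
  x <- tolower(x)                    # all lower case
  x <- gsub("[_.[:space:]]", "", x)  # drop underscores, dots and white space
  if (anyDuplicated(x) > 0) {        # warn if cleaning produced duplicates
    warning("duplicate column names after cleaning")
  }
  x
}

# Made-up names, only to show the effect.
cleanNames(c("Location.1", "street_name", "Cross Street"))
# gives "location1", "streetname", "crossstreet"
```

On a real data frame this could be applied as `names(data) <- cleanNames(names(data))`.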
Rules for values:
Should be factors
Should be descriptive, like TRUE/FALSE instead of 1/0.
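A minimal sketch of what these rules might look like in practice. The vector `active` and the labels "inactive"/"active" are made up for illustration and are not part of the camera data.

```r
# Hypothetical 0/1 column, used only for illustration.
active <- c(1, 0, 0, 1)

# Convert to logical values: TRUE/FALSE is more descriptive than 1/0.
activeFlag <- active == 1

# Alternatively, store the values as a factor with descriptive labels.
activeFactor <- factor(active, levels = c(0, 1), labels = c("inactive", "active"))

activeFlag
levels(activeFactor)
```

Whether a logical vector or a labelled factor fits better depends on how the column is used later.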
Get some data to edit
We will use data from the following page: https://data.baltimorecity.gov/Transportation/Baltimore-Fixed-Speed-Cameras/dz54-2aru
Run this code in order to download sample data for this tutorial.
if (!file.exists("./data")) { dir.create("./data") }
url <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
download.file(url, destfile="./data/cameras.csv", method="curl")
data <- read.csv("./data/cameras.csv")
names(data)
Lower or upper case
tolower(names(data))
toupper(names(data))
Split values
For example, split each column name wherever it contains a dot.
strsplit(names(data), "\\.")
Get only first element
In this example, we split each string and keep only the first element.
firstElement <- function (x) { unlist(strsplit(x, ".", fixed=TRUE))[1] }
sapply(as.list(names(data)), firstElement)
Replace characters
The following code replaces the first _ in each column name with a space.
sub("_", " ", names(data))
The sub function replaces only the first match it finds. To replace all matches, use gsub.
> test <- "test_value_1"
> sub("_", " ", test)
[1] "test value_1"
> gsub("_", " ", test)
[1] "test value 1"
Searching
Search for a string in a column.
> grep("Alameda", data$intersection)
[1]  4  5 36
How many times a string appears in a column.
> table(grepl("Alameda", data$intersection))
FALSE  TRUE
   77     3
For more about the grep function, see ?grep.
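The searches above can be extended in a few common ways. The sketch below uses a small made-up vector named `intersection` in place of `data$intersection`, so the values and positions are illustrative only.

```r
# Hypothetical sample values, standing in for data$intersection.
intersection <- c("The Alameda & 33rd St",
                  "E 33rd & The Alameda",
                  "Charles & Lake Ave",
                  "Harford & The Alameda")

# Positions of the matching elements.
matchIndex <- grep("Alameda", intersection)

# The matching values themselves, instead of their positions.
matchValue <- grep("Alameda", intersection, value = TRUE)

# Number of matches: grepl returns TRUE/FALSE, and TRUE counts as 1.
matchCount <- sum(grepl("Alameda", intersection))

# Keep only the elements that do not match.
noMatch <- intersection[!grepl("Alameda", intersection)]
```

With a data frame, the same logical vector can be used to subset rows, for example `data[grepl("Alameda", data$intersection), ]`.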
More string functions
We can use the stringr package to work with strings more easily.
install.packages("stringr")
library(stringr)
Get number of characters.
> nchar("Hello there")
[1] 11
Get a substring.
substr("Hello there", 1, 5)
Connect strings into one.
> paste("Hello", "there")
[1] "Hello there"
> paste0("Hello", "there")
[1] "Hellothere"
Trim string.
str_trim(" Hello ")
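Most of the calls in this section (nchar, substr, paste) are base R. For comparison, here is a short sketch of what appear to be the closest stringr counterparts, assuming the package is installed as shown earlier. The variable names are hypothetical.

```r
library(stringr)

# Number of characters (base R: nchar).
helloLength <- str_length("Hello there")

# Substring by start and end position (base R: substr).
helloStart <- str_sub("Hello there", 1, 5)

# Join strings into one (base R: paste / paste0).
joinedSpace <- str_c("Hello", "there", sep = " ")
joinedPlain <- str_c("Hello", "there")

# Trim white space on both sides, or on one side only.
trimmedBoth <- str_trim("  Hello  ")
trimmedLeft <- str_trim("  Hello  ", side = "left")
```

The stringr functions share the `str_` prefix and take the string as their first argument, which can make them easier to find and combine.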