Editing text

Rules for editing column names:

  • All lower case when possible

  • Names should be descriptive, not abbreviated

  • No duplicates

  • No underscores, dots or white spaces
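As a rough illustration of the rules above, the sketch below checks a vector of column names against them. The names in cols are made up for this example and are not taken from any real dataset.

```r
# Hypothetical column names, for illustration only
cols <- c("Address", "cross_street", "Location.1", "address")

# TRUE where the name is already all lower case
is_lower <- cols == tolower(cols)

# TRUE where the name contains an underscore, a dot or a space
has_separator <- grepl("[_. ]", cols)

# TRUE where the name repeats an earlier one (ignoring case)
is_duplicate <- duplicated(tolower(cols))

data.frame(cols, is_lower, has_separator, is_duplicate)
```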

Rules for values:

  • Character values should usually be factors

  • Should be descriptive, like TRUE/FALSE instead of 1/0.
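A minimal sketch of both value rules, using made-up vectors (answered and colour are hypothetical names chosen for this example):

```r
# Hypothetical values, for illustration only
answered <- c(1, 0, 1, 1)
colour <- c("red", "blue", "red", "green")

# Turn 1/0 codes into descriptive TRUE/FALSE values
answered_lgl <- as.logical(answered)

# Turn character values into a factor
colour_fct <- factor(colour)

answered_lgl
levels(colour_fct)
```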

Get some data to edit

We will use data from the following page: https://data.baltimorecity.gov/Transportation/Baltimore-Fixed-Speed-Cameras/dz54-2aru

Run this code to download the sample data for this tutorial.

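# Create the data directory if it does not exist yet
if (!file.exists("./data")) { dir.create("./data") }
url <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"

# method="curl" requires the curl program to be installed
download.file(url, destfile="./data/cameras.csv", method="curl")

data <- read.csv("./data/cameras.csv")

# Show the column names
names(data)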

Lower or upper case
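A short sketch of changing case with the base R functions tolower and toupper. To keep the example self-contained it uses a vector of column names typed in by hand; the names are assumed to resemble those of the camera data and may not match the downloaded file exactly.

```r
# Assumed column names, typed in by hand for illustration
cols <- c("address", "direction", "street", "crossStreet",
          "intersection", "Location.1")

# Convert every name to lower case
lower <- tolower(cols)

# Convert every name to upper case
upper <- toupper(cols)

lower
upper
```

On the real data the same idea would be applied as names(data) <- tolower(names(data)).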

Split values

For example, split a name wherever it contains a dot.
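A minimal sketch with the base R function strsplit, again on hand-typed names that are only assumed to resemble the camera data. The dot is escaped because the split pattern is a regular expression, where a bare dot matches any character.

```r
# Assumed column names, typed in by hand for illustration
cols <- c("address", "crossStreet", "Location.1")

# Split each name at a literal dot; the result is a list
parts <- strsplit(cols, "\\.")

parts[[1]]  # a name without a dot stays in one piece
parts[[3]]  # "Location.1" becomes two pieces
```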

Get only first element

In this example, we will split a string and keep only the first element.
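One way to do this is sketched below: split the names, then pick the first piece of each with vapply. The input names are hand-typed assumptions, and first_element is a helper name invented for this example.

```r
# Assumed column names, typed in by hand for illustration
cols <- c("address", "crossStreet", "Location.1")

# Split each name at a literal dot
parts <- strsplit(cols, "\\.")

# Hypothetical helper: return the first piece of a split name
first_element <- function(x) x[1]

# Apply the helper to every element of the list
first <- vapply(parts, first_element, character(1))

first
```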

Replace characters

The following code removes _ by replacing it with an empty string.

The sub function replaces only the first match in each string. To replace every match, use gsub.
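The difference between the two functions can be seen on a small made-up string (x is a hypothetical value, not from the dataset):

```r
# Hypothetical string with several underscores
x <- "this_is_a_test"

# sub removes only the first underscore
first_only <- sub("_", "", x)

# gsub removes every underscore
all_removed <- gsub("_", "", x)

first_only   # "thisis_a_test"
all_removed  # "thisisatest"
```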

Searching

Search for a string in a column.
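A minimal sketch of searching with grep and grepl. The intersection vector is made up for this example and stands in for a column such as data$intersection.

```r
# Hypothetical column values, for illustration only
intersection <- c("Caton Ave & Benson Ave",
                  "The Alameda & 33rd St",
                  "E 33rd & The Alameda",
                  "Erdman & Macon St")

# Positions of the values that contain the string
positions <- grep("Alameda", intersection)

# The matching values themselves
matches <- grep("Alameda", intersection, value = TRUE)

# TRUE/FALSE for every value, handy for subsetting rows
is_match <- grepl("Alameda", intersection)

positions
matches
is_match
```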

Count how many values in a column contain a string.
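Counting can be sketched by summing or tabulating the result of grepl. The values below are made up and stand in for a real column. Note that this counts the values that contain the string, not repeated occurrences inside a single value.

```r
# Hypothetical column values, for illustration only
intersection <- c("Caton Ave & Benson Ave",
                  "The Alameda & 33rd St",
                  "E 33rd & The Alameda",
                  "Erdman & Macon St")

# Number of values that contain the string
n_matches <- sum(grepl("Alameda", intersection))

# Matching and non-matching values side by side
counts <- table(grepl("Alameda", intersection))

n_matches
counts
```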

More about the grep function is available on its R help page (run ?grep).

More string functions

The stringr package makes working with strings easier.

Get number of characters.
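A short sketch, assuming the stringr package is installed. It shows str_length next to the base R equivalent nchar on a made-up string.

```r
library(stringr)

# Hypothetical string, for illustration only
city <- "Baltimore"

# Number of characters with stringr
n_stringr <- str_length(city)

# Base R equivalent
n_base <- nchar(city)

n_stringr  # 9
n_base     # 9
```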

Get a substring.
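A minimal sketch, assuming stringr is installed. str_sub takes a start and an end position (both inclusive); the base R function substr works the same way for this case.

```r
library(stringr)

# Hypothetical string, for illustration only
city <- "Baltimore"

# Characters 1 to 4 with stringr
part_stringr <- str_sub(city, 1, 4)

# Base R equivalent
part_base <- substr(city, 1, 4)

part_stringr  # "Balt"
part_base     # "Balt"
```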

Join strings into one.
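A minimal sketch, assuming stringr is installed. str_c joins its arguments with no separator unless sep is given, while the base R function paste uses a single space by default.

```r
library(stringr)

# Join two made-up words with a space between them
joined_stringr <- str_c("speed", "camera", sep = " ")

# Base R equivalent (the default separator is a space)
joined_base <- paste("speed", "camera")

# Without sep, str_c joins the pieces directly
joined_tight <- str_c("speed", "camera")

joined_stringr  # "speed camera"
joined_base     # "speed camera"
joined_tight    # "speedcamera"
```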

Trim whitespace from a string.
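A minimal sketch, assuming stringr is installed. str_trim removes whitespace from both ends by default; the side argument restricts it to one end. The base R function trimws behaves similarly.

```r
library(stringr)

# Hypothetical string with padding on both sides
padded <- "  camera   "

# Remove whitespace from both ends
trimmed <- str_trim(padded)

# Remove whitespace from the left end only
left_trimmed <- str_trim(padded, side = "left")

# Base R equivalent of trimming both ends
trimmed_base <- trimws(padded)

trimmed       # "camera"
left_trimmed  # "camera   "
```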
