Editing text

Rules for editing column names:

  • All lower case when possible

  • Names should be descriptive, not abbreviations

  • No duplicates

  • No underscores, dots, or white space
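Here is a minimal sketch for checking names against these rules. The column names are hypothetical (not read from the camera file), and the regular expression [_. ] is just one way to express the "no underscores, dots, or white space" rule.

```r
# Hypothetical column names, used only to illustrate the rules
colNames <- c("address", "crossStreet", "Location.1", "speed_limit", "address")

# Rule: no underscores, dots or white space ("." is literal inside [ ])
badChars <- grepl("[_. ]", colNames)
print(colNames[badChars])            # "Location.1" "speed_limit"

# Rule: lower case, with the offending characters removed
cleaned <- tolower(gsub("[_. ]", "", colNames))
print(cleaned)

# Rule: no duplicates (duplicated() flags every repeat after the first)
print(cleaned[duplicated(cleaned)])  # "address"
```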

Rules for values:

  • Character values should usually be converted to factors

  • Values should be descriptive, like TRUE/FALSE instead of 1/0.
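Both value rules are sketched below on made-up vectors; active and direction are hypothetical columns, not taken from the camera data. A 0/1 indicator becomes TRUE/FALSE or labelled factor levels, and a character column is converted with factor().

```r
# Hypothetical 0/1 indicator column
active <- c(1, 0, 1, 1)

# Descriptive logical values instead of 1/0
activeLogical <- active == 1
print(activeLogical)                 # TRUE FALSE TRUE TRUE

# Or a factor with readable labels for each code
activeFactor <- factor(active, levels = c(0, 1), labels = c("inactive", "active"))

# Hypothetical character column converted to a factor
direction <- c("N/B", "S/B", "N/B", "E/B")
directionFactor <- factor(direction)
print(levels(directionFactor))       # "E/B" "N/B" "S/B"
```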

Get some data to edit

We will use data from the following page: https://data.baltimorecity.gov/Transportation/Baltimore-Fixed-Speed-Cameras/dz54-2aru

Run this code to download the sample data for this tutorial.

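```r
# Create a local "data" folder if it does not exist yet
if (!file.exists("./data")) { dir.create("./data") }

# CSV export of the Baltimore Fixed Speed Cameras dataset
url <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"

# method = "curl" needs the curl command-line tool; omit it to use R's default method
download.file(url, destfile = "./data/cameras.csv", method = "curl")

# Read the file and list its column names
data <- read.csv("./data/cameras.csv")
names(data)
```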

Lower or upper case
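Base R's tolower() and toupper() change the case of strings; applied to names(), they rename all columns at once. The small data frame here is a hypothetical stand-in for the camera data.

```r
# Hypothetical stand-in for the camera data frame
df <- data.frame(crossStreet = c("a", "b"), Location.1 = c(1, 2))

# Lower-case every column name
names(df) <- tolower(names(df))
print(names(df))                 # "crossstreet" "location.1"

# toupper() works the same way in the other direction
print(toupper("crossStreet"))    # "CROSSSTREET"
```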

Split values

For example, split each string wherever it contains a dot.
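strsplit() treats its split argument as a regular expression, so a literal dot has to be escaped as "\\." or passed with fixed = TRUE. The names below are hypothetical.

```r
colNames <- c("address", "crossStreet", "Location.1")  # hypothetical names

# "\\." matches a literal dot; an unescaped "." would match any character
splitNames <- strsplit(colNames, "\\.")
print(splitNames[[3]])   # "Location" "1"

# Equivalent split without regular expressions
splitFixed <- strsplit(colNames, ".", fixed = TRUE)
```

The result is a list with one character vector per input string; names without a dot come back as a single element.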

Get only the first element

In this example, we split each string and keep only the first element of each result.
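Because strsplit() returns a list, one way to keep only the first piece is to apply a small function to each list entry. This sketch uses vapply() so the result is guaranteed to be a character vector; sapply(splitNames, function(x) x[1]) gives the same result here. The names are hypothetical.

```r
colNames <- c("address", "crossStreet", "Location.1")  # hypothetical names
splitNames <- strsplit(colNames, "\\.")

# Take element 1 of each list entry; character(1) declares the expected type
firstElements <- vapply(splitNames, function(x) x[1], character(1))
print(firstElements)   # "address" "crossStreet" "Location"
```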

Replace characters

To remove underscores, replace each _ with an empty string "".

The sub function replaces only the first match in each string. To replace every match, use gsub.
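The difference between sub() and gsub() shows up as soon as a value contains more than one underscore. The input strings are hypothetical.

```r
x <- c("speed_limit", "camera_id_number")  # hypothetical values

# sub() replaces only the first "_" in each string
firstOnly <- sub("_", "", x)
print(firstOnly)    # "speedlimit" "cameraid_number"

# gsub() replaces every "_"
allRemoved <- gsub("_", "", x)
print(allRemoved)   # "speedlimit" "cameraidnumber"
```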

Searching

Search for a string in a column.
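grep() returns the positions of matching values (or the values themselves with value = TRUE), while grepl() returns TRUE/FALSE for each value, which is convenient for selecting rows. The intersection values below are hypothetical, not taken from the camera data.

```r
# Hypothetical column values
intersection <- c("Main St & 1st Ave", "Oak St & 2nd Ave", "Main St & Elm St")

positions <- grep("Main", intersection)               # 1 3
matching <- grep("Main", intersection, value = TRUE)  # the matching strings
hasMain <- grepl("Main", intersection)                # TRUE FALSE TRUE

# With a data frame, grepl() selects the matching rows
df <- data.frame(intersection = intersection)
mainRows <- df[grepl("Main", df$intersection), , drop = FALSE]
```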

Count how many times a string appears in a column.
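"How many times" can mean the number of values that contain the string, or the total number of occurrences across all values. The sketch below computes both on hypothetical values; matching is case-sensitive, so the "st" in "1st" is not counted.

```r
intersection <- c("Main St & 1st Ave", "Oak St & 2nd Ave", "Main St & Elm St")

# Number of values that contain "St" at least once
valuesWithSt <- sum(grepl("St", intersection))   # 3
print(table(grepl("St", intersection)))

# Total number of "St" occurrences, counting repeats within one value
occurrences <- lengths(regmatches(intersection, gregexpr("St", intersection)))
totalSt <- sum(occurrences)                       # 4
```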

More about the grep function can be found in its R help page (?grep).

More string functions

The stringr package can make working with strings easier.

Get the number of characters.
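In base R, nchar() counts characters; stringr's str_length() does the same job. The example strings are hypothetical.

```r
nameLength <- nchar("Location.1")                 # 10
lengthsVec <- nchar(c("address", "crossStreet"))  # 7 11
print(nameLength)
print(lengthsVec)
```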

Get a substring.
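substr(x, start, stop) extracts the characters from start through stop, both inclusive; stringr's str_sub() is the stringr counterpart. The string is hypothetical.

```r
name <- "Location.1"  # hypothetical string

firstWord <- substr(name, 1, 8)                      # "Location"
lastChar <- substr(name, nchar(name), nchar(name))   # "1"
print(firstWord)
print(lastChar)
```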

Join strings into one.
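paste() joins strings with a separator (a space by default), paste0() joins them with no separator, and the collapse argument merges a whole vector into a single string. stringr's str_c() works similarly.

```r
withSpace <- paste("speed", "camera")                  # "speed camera"
withSep <- paste("speed", "camera", sep = "_")         # "speed_camera"
noSep <- paste0("speed", "camera")                     # "speedcamera"
oneString <- paste(c("a", "b", "c"), collapse = ", ")  # "a, b, c"
print(oneString)
```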

Trim white space from a string.
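Base R's trimws() removes leading and trailing white space, and its which argument limits trimming to one side; stringr's str_trim() does the same job. The padded value is hypothetical.

```r
padded <- "  Main St  "  # hypothetical value with extra spaces

both <- trimws(padded)                        # "Main St"
leftOnly <- trimws(padded, which = "left")    # "Main St  "
rightOnly <- trimws(padded, which = "right")  # "  Main St"
print(both)
```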
