# Editing text

Rules for editing column names:

* All lower case when possible
* Name should be descriptive, not shortcuts
* No duplicates
* No underscores, dots or white spaces

Rules for values:

* Should be factors
* Should be descrpitive, like TRUE/FALSE instead of 1/0.

## Get some data to edit

We will use data from the following page: <https://data.baltimorecity.gov/Transportation/Baltimore-Fixed-Speed-Cameras/dz54-2aru>

Run this code in order to download sample data for this tutorial.

```r
if (!file.exists("./data")) { dir.create("./data") }
url <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"

download.file(url, destfile="./data/cameras.csv", method="curl")

data <- read.csv("./data/cameras.csv")

names(data)
```

## Lower or upper case

```r
tolower(names(data))
toupper(names(data))
```

## Split values

For example, split word when there is a dot.

```r
strsplit(names(data), "\\.")
```

## Get only first element

In this example, we will split a string and get only first element.

```r
firstElement <- function (x) { unlist(strsplit(x, ".", fixed=TRUE))[1] }

sapply(as.list(names(data)), firstElement)
```

## Replace characters

The following code will replace all `_` by empty space.

```r
sub("_", " ", names(reviews))
```

`sub` function will always replace only first found characeter. In order to replace all, use `gsub`.

```r
> test <- "test_value_1"

> sub("_", " ", test)
[1] "test value_1"

> gsub("_", " ", test)
[1] "test value 1"
```

## Searching

Search for a string in a column.

```r
> grep("Alameda", data$intersection)
[1]  4  5 36
```

How many times a string appears in column.

```r
> table(grepl("Alameda", data$intersection))

FALSE  TRUE
   77     3
```

> More about `grep` function is [here](http://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html).

## More string functions

We can use [stringr package](http://cran.r-project.org/web/packages/stringr/index.html) to easier work with strings.

```r
install.packages("stringr")
library(stringr)
```

Get number of characters.

```r
> nchar("Hello there")
[1] 11
```

Get sub string.

```r
substr("Hello there", 1,5)
```

Connect strings into one.

```r
> paste("Hello", "there")
[1] "Hello there"

> paste0("Hello", "there")
[1] "Hellothere"
```

Trim string.

```r
str_trim("   Hello   ")
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://ondrej-kvasnovsky-2.gitbook.io/handbook-of-hidden-data-scientist/cleaning_data/editing_text.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
