Reshaping
The goal is to get tidy data. That means to have:
Each variable forms a column
Each observation forms a row
Each table/file stores data about one kind of observations.
Data to reshape
We will use standard R data set mtcars.
head(mtcars)Install reshape2 package
Nice article about respahe2
install.packages("reshape2")
library(reshape2)Melting data set
We will pass the following variables into melt function:
data set
what columns are IDs
what columns are variables
The following code will add one more column with row names and primarilly, make IDs from "carname", "gear" and "cyl". The remaining columns will be melted down.
mtcars$carname <- rownames(mtcars)
carMelt <- melt(
mtcars,
id=c("carname", "gear", "cyl"),
measure.vars=c("mpg", "hp")
)Let's have a look at the melted values.
> head(carMelt, n=3)
carname gear cyl variable value
1 Mazda RX4 4 6 mpg 21.0
2 Mazda RX4 Wag 4 6 mpg 21.0
3 Datsun 710 4 4 mpg 22.8
...
62 Ferrari Dino 5 6 hp 175
63 Maserati Bora 5 8 hp 335
64 Volvo 142E 4 4 hp 109The better way to see what happend is to display values in table. Watch values in variable column.
View(carMelt)Casting data sets
We can cast the data set into different shapes.
> cylData <- dcast(carMelt, cyl ~ variable)
Aggregation function missing: defaulting to length
> cylData
cyl mpg hp
1 4 11 11
2 6 7 7
3 8 14 14We can specify function to aggregate values.
> cylData <- dcast(carMelt, cyl ~ variable, mean)
> cylData
cyl mpg hp
1 4 26.66364 82.63636
2 6 19.74286 122.28571
3 8 15.10000 209.21429Average Values
We will InsectSprays data set.
> head(InsectSprays)
count spray
1 10 A
2 7 A
3 20 A
4 14 A
5 14 A
6 12 AThe following code is using tapply function to show sum of count column for each spray.
> tapply(InsectSprays$count, InsectSprays$spray, sum)
A B C D E F
174 184 25 59 42 200Split values
> split <- split(InsectSprays$count, InsectSprays$spray)
> split
$A
[1] 10 7 20 14 14 12 10 23 17 20 14 13
$B
[1] 11 17 21 11 16 14 17 17 19 21 7 13
...Split and sum
> splitSum <- lapply(split, sum)
> splitSum
$A
[1] 174
$B
[1] 184
...Combine
We can combine the values craeted above in two ways.
unlist(splitSum)sapply(split, sum)Using plyr package to summarize values
Be careful, there is
summariseand NOTsummarize. If you do that mistake you get argument "by" is missing, with no default error.
ddply(InsectSprays, .(spray), summarise, sum=sum(count))More to lear on http://www.r-bloggers.com/a-fast-intro-to-plyr-for-r
Other functions
acastcasting multi-dimensional arrayarrangefaster reordering without using order() functionmutateadding new variables
Last updated
Was this helpful?