Monday, 23 May 2011

More R

Another day, another round of new stuff learned about R. I've written a function that will calculate the means of a vector (y) aggregated by factor (z).
> avg = function(y,z) {
var1 = aggregate(y, by = list(z), FUN = mean, na.rm=T)
var2 = var1$x
}
The function returns a vector with the means. Likewise, I wrote a function for the calculation of standard error of the means (SEM):

> sem = function(y,z) {
var1 = aggregate(y, by = list(z), FUN = sd, na.rm=T)
var2 = aggregate(y, by = list(z), FUN = length)
var3 = var1$x / sqrt(var2$x) 
}
So with these functions, I can easily calculate the means + SEM for a large dataset. For example:
> data.means = data.frame(avg(y1,z),avg(y2,z),avg(y3,z),avg(y4,z),avg(y5,z),avg(y6,z))
> data.sems = data.frame(sem(y1,z),sem(y2,z),sem(y3,z),sem(y4,z),sem(y5,z),sem(y6,z))
I can give the rows and columns appropriate headers like so:
> rownames(data.means) = c("Hibernation", "Post-hibernation", "Pre-hibernation")
> colnames(data.means) = c("endof", "dsb", "dmcbc", "dfreef", "csb", "cfreef", "cmcbc")
So the formatted data frame will look like so:

> data.means     endof      dsb    dmcbc    dfreef      csb    cfreef    cmcbc
Hibernation      51.87500 49.06250 0.596250 22.851250 82.16667 11.875000 0.950000
Post-hibernation 33.86667 82.37778 1.233333  9.591111 85.81429  3.020000 1.335714

Pre-hibernation  13.60000 45.46250 0.551250  4.858750 84.30000  1.038333 1.058333
I want to rearrange the data so that Pre-hibernation comes first, then Hibernation and Post-hibernation. I'll first assign a variable with my desired order, then do the rearrangement.
> order = c("Pre-hibernation", "Hibernation", "Post-hibernation")
> data.means = data.means[order, , drop=F]
Now I want to plot all these data in a single bar plot. I'll have to convert the data frames into matrices because barplot() can't use data frames.
> data.means = data.matrix(data.means, rownames.force = T)
> data.sems = data.matrix(data.sems, rownames.force = T)
Now to plot the means, SEM, and add a legend to the top right of the plot:
> means.bp = barplot(data.means, main = "FreeF By State", ylab = "FreeF", xlab = "State", ylim = range(0,120), beside = T, names.arg = colnames(data.means), axis.lty = 1,col = gray.colors(3))
> error.bar(means.bp,data.means,data.sems)
> legend("topright", rownames(data.means), fill=gray.colors(3))
I've added the argument beside = T to the bar plot so that R will plot the data grouped into the columns present in the data matrix. So the arguments col and fill  do practically the same thing in barplot() and legend(), respectively: designate the colour scheme of the bars. In this case, I'm using the preset grayscale colour scheme in R.

No comments:

Post a Comment