Monday, 5 September 2011

Procrastination Busting

So with the copious amounts of work I need to do this month in mind, I decided to look into various strategies to increase my focus and decrease my ADD-fueled procrastination.

I'm a big fan of the Pomodoro technique, wherein you study or work for 25 minutes, then take a 5-minute break. I downloaded the Pomodoro app (http://pomodoro.ugolandini.com/), which seems to be a cool utility that times the intervals and keeps statistics on your productivity, and I pledged to myself to actually use it. I used to use Focus Booster (http://www.focusboosterapp.com/) for timing purposes, but it was a little too bare-bones for my liking.

One big time sink was reading my RSS feeds a couple of times an hour. I decided to delete many feeds, especially the ones that were just spamming a lot of random, uninteresting crap (Engadget, I'm looking at you), and kept just a select few. I also unpinned and closed my Google Reader tab.

Finally, I downloaded a program called Isolator (http://willmore.eu/software/isolator/), which simply obscures the windows in the background. Apparently, it helps maintain focus. I'll report back on its effectiveness.

Sunday, 12 June 2011

Rib Rub Recipe

I haven't updated in a while, but that's more a function of my doing little else but hormone saturation assays for the past couple of weeks. I did manage to invent a rub for chicken and ribs:
2 tablespoons garlic powder
1 tablespoon chili powder
1.5 tablespoons Sriracha sauce
1.5 tablespoons sesame oil
1 tablespoon salt
1 tablespoon pepper

It's pretty fantastic, if I say so myself.

Monday, 30 May 2011

Non-linear Regression

The amount of ligand bound by a protein with a single binding site depends on the concentrations of the free and bound receptor and on the concentration of free ligand. The equation describing this relationship is similar to the Michaelis-Menten equation, which describes enzyme kinetics. The binding equation is:

SB = Bmax * [L] / (Kd + [L])

where SB is the specific binding of the ligand to the protein, Bmax is the total protein concentration (free plus bound protein), [L] is the free ligand concentration, and Kd is the equilibrium dissociation constant (the ratio of the backward to the forward reaction rate constants).


Usually, Kd is the one unknown in the equation that I'm interested in. To determine its value, I'll run a saturation binding assay, in which I mix the protein with a dilution series of its ligand and measure the amount of protein-ligand complex formed. The Kd can then be determined by Scatchard plotting or by non-linear regression, the latter of which is considered the gold standard.


I work with a protein called corticosteroid binding globulin (aka transcortin, CBG, and serpin peptidase inhibitor clade A member 6). One of its ligands in most mammals is cortisol, and the interplay between this protein and its ligand plays significant roles in modulating the stress response, the immune system, and energy balance, amongst other systems. I use tritiated cortisol as a tracer in a saturation binding assay to determine the binding kinetics of cortisol to CBG.


After running the assay, I get specific binding data in counts (of beta decays) per minute as a function of the free ligand concentration. An example data set is shown:
Dilution    SB      L
A         1134  46.28
B          798  22.48
C          656  10.95
D          493   5.24
E          305   2.50
F          198   1.25
A plot of the data (plot()) shows a hyperbolic, saturating relationship between SB and [L].
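That plot can be reproduced with something like the following sketch, assuming the data live in a data frame called dfl.518 with columns SB and L (the same names used in the nls() call further down):

# build the example data set and plot specific binding against free ligand
dfl.518 <- data.frame(
  SB = c(1134, 798, 656, 493, 305, 198),
  L  = c(46.28, 22.48, 10.95, 5.24, 2.50, 1.25)
)
plot(SB ~ L, data = dfl.518, xlab = "Free [L]", ylab = "SB (cpm)")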




In a Scatchard plot, I plot specific binding per free ligand concentration on the y-axis against specific binding on the x-axis. This transformation results in a roughly linear plot that is suitable for linear regression analysis.
I'll use the lm() command to run the linear regression and abline() to draw the fit on the Scatchard plot; a sketch of the commands is below, followed by the results.
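Assuming the Scatchard variables are stored in dfl.518 under the names that show up in the lm() output (sbl.518 for SB/[L] and sb.518.3 for SB), and using scat.518 as a placeholder name for the fit, the commands might look like this:

# Scatchard transformation: SB/[L] (y-axis) against SB (x-axis)
dfl.518$sb.518.3 <- dfl.518$SB
dfl.518$sbl.518  <- dfl.518$SB / dfl.518$L
plot(sbl.518 ~ sb.518.3, data = dfl.518, xlab = "SB", ylab = "SB/[L]")
scat.518 <- lm(sbl.518 ~ sb.518.3, data = dfl.518)  # fit the regression
abline(scat.518)                                    # overlay the fitted line
summary(scat.518)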



Call:
lm(formula = sbl.518 ~ sb.518.3, data = dfl.518)

Residuals:
      1       2       3       4       5       6 
 19.697 -17.888 -14.008  -3.400  -2.666  18.264 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 168.76344   15.59183  10.824 0.000413 ***
sb.518.3     -0.14458    0.02312  -6.253 0.003335 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 17.72 on 4 degrees of freedom
Multiple R-squared: 0.9072,     Adjusted R-squared: 0.884 
F-statistic:  39.1 on 1 and 4 DF,  p-value: 0.003335

The slope of the line (the sb.518.3 coefficient) is equal to -1/Kd, so Kd = 1/0.14458 ≈ 6.92. I can also calculate Bmax as the y-intercept (Intercept) divided by the absolute value of the slope, which gives about 1167. The adjusted R-squared value is not very high (0.884), which suggests to me that this Kd is only a ballpark estimate of the real value.
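Pulling those numbers out of the fit directly, as a sketch using the placeholder scat.518 object from above:

slope     <- coef(scat.518)[["sb.518.3"]]     # -0.14458
intercept <- coef(scat.518)[["(Intercept)"]]  # 168.76
Kd.scat   <- -1 / slope                       # about 6.92
Bmax.scat <- intercept / abs(slope)           # about 1167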

The other way to determine Kd is by non-linear regression. This method is considered 'better' than Scatchard analysis because I can define the equation to be fitted to my data directly, rather than transforming the data to fit a straight line. This can be done in R using the nls() command:
nls.518 <- nls(SB ~ Bmax * L / (Kd + L), data = dfl.518, start = list(Bmax = 1167, Kd = 6.92))
So in the nls() command, I define the equation used to fit my data as SB ~ Bmax * L / (Kd + L), and I set the starting values for Bmax and Kd to the values I derived from my Scatchard analysis; R needs these as a starting point for its iterative fitting. My results are:
Formula: SB ~ Bmax * L / (Kd + L)
Parameters:
            Estimate Std. Error t value Pr(>|t|)  
Bmax         982.2157    17.2532   56.93 1.19e-05 ***
Kd             5.3959     0.2556   21.11 0.000233 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.391 on 3 degrees of freedom
Number of iterations to convergence: 3
Achieved convergence tolerance: 5.493e-06 
So by using non-linear regression analysis, I've determined the Kd and Bmax to be 5.40 and 982.2, respectively, which are quite different from the values I obtained by Scatchard plotting. The p-values are highly significant, and the residual standard error (9.39) is about half that of the linear fit.
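To pull the estimates out of the fit object and overlay the fitted curve on the original SB vs [L] plot, something like this sketch works:

est <- coef(nls.518)                                     # named vector with Bmax and Kd
plot(SB ~ L, data = dfl.518, xlab = "Free [L]", ylab = "SB (cpm)")
curve(est[["Bmax"]] * x / (est[["Kd"]] + x), add = TRUE) # fitted binding curve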

So what does it all mean? One practical interpretation of Kd is that at a free ligand concentration equal to the Kd, half of the receptors are bound to ligand. The lower the Kd, the higher the affinity of the receptor for its ligand, and vice versa.
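That interpretation falls right out of the binding equation: setting [L] = Kd gives SB = Bmax * Kd / (Kd + Kd) = Bmax / 2. For this fit, that works out to roughly 982.2 / 2 ≈ 491 cpm of specific binding at a free cortisol concentration of about 5.40.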

Monday, 23 May 2011

More R

Another day, another round of new stuff learned about R. I've written a function that will calculate the means of a vector (y) aggregated by factor (z).
> avg = function(y,z) {
  # mean of y for each level of the factor z, ignoring NAs
  var1 = aggregate(y, by = list(z), FUN = mean, na.rm=T)
  var1$x   # return just the vector of group means
}
The function returns a vector with the means. Likewise, I wrote a function for the calculation of standard error of the means (SEM):

> sem = function(y,z) {
  # SEM of y for each level of the factor z
  var1 = aggregate(y, by = list(z), FUN = sd, na.rm=T)                 # SD per group
  var2 = aggregate(y, by = list(z), FUN = function(x) sum(!is.na(x)))  # n per group, ignoring NAs
  var1$x / sqrt(var2$x)                                                # SEM = SD / sqrt(n)
}
So with these functions, I can easily calculate the means + SEM for a large dataset. For example:
> data.means = data.frame(avg(y1,z),avg(y2,z),avg(y3,z),avg(y4,z),avg(y5,z),avg(y6,z),avg(y7,z))
> data.sems = data.frame(sem(y1,z),sem(y2,z),sem(y3,z),sem(y4,z),sem(y5,z),sem(y6,z),sem(y7,z))
I can give the rows and columns appropriate headers like so:
> rownames(data.means) = c("Hibernation", "Post-hibernation", "Pre-hibernation")
> colnames(data.means) = c("endof", "dsb", "dmcbc", "dfreef", "csb", "cfreef", "cmcbc")
So the formatted data frame will look like so:

> data.means
                     endof      dsb    dmcbc    dfreef      csb    cfreef    cmcbc
Hibernation       51.87500 49.06250 0.596250 22.851250 82.16667 11.875000 0.950000
Post-hibernation  33.86667 82.37778 1.233333  9.591111 85.81429  3.020000 1.335714
Pre-hibernation   13.60000 45.46250 0.551250  4.858750 84.30000  1.038333 1.058333
I want to rearrange the data so that Pre-hibernation comes first, then Hibernation and Post-hibernation. I'll first assign a variable with my desired order, give data.sems the same row names as data.means (so the means and SEMs stay aligned), then do the rearrangement on both.
> order = c("Pre-hibernation", "Hibernation", "Post-hibernation")
> rownames(data.sems) = rownames(data.means)
> data.means = data.means[order, , drop=F]
> data.sems = data.sems[order, , drop=F]
Now I want to plot all these data in a single bar plot. I'll have to convert the data frames into matrices because barplot() can't use data frames.
> data.means = data.matrix(data.means, rownames.force = T)
> data.sems = data.matrix(data.sems, rownames.force = T)
Now to plot the means, SEM, and add a legend to the top right of the plot:
> means.bp = barplot(data.means, main = "FreeF By State", ylab = "FreeF", xlab = "State", ylim = range(0,120), beside = T, names.arg = colnames(data.means), axis.lty = 1, col = gray.colors(3))
> error.bar(means.bp,data.means,data.sems)
> legend("topright", rownames(data.means), fill=gray.colors(3))
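The error.bar() call above isn't part of base R, so it has to be defined somewhere. A minimal sketch of one way to write such a helper, using arrows() to draw the whiskers:

error.bar = function(x, y, upper, lower = upper, length = 0.05, ...) {
  # draw whiskers from y - lower to y + upper at each bar midpoint in x
  arrows(x, y + upper, x, y - lower, angle = 90, code = 3, length = length, ...)
}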
I've added the argument beside = T so that R plots the data grouped by the columns of the data matrix. The arguments col and fill do practically the same thing in barplot() and legend(), respectively: they designate the colour scheme of the bars. In this case, I'm using R's preset grayscale palette, gray.colors().

Annotating Bar Plots with Statistical Significance

It only took me a few hours to figure this out. If I'm using 1-way ANOVAs as a statistical test to determine differences in the means of endogenous cortisol concentrations between animals of different sex classes, I can annotate the resulting bar plot with letters denoting significant differences. For example, I'll be using this data set (stored in the variable examp):
   Sample.ID       Sex.Class     EndoF
1          1            Male 134.70700
2          2            Male 112.79506
3          3            Male  83.42866
4          4            Male  66.25100
5          5            Male 111.49403
6          6            Male 105.33600
7          7            Male  68.47000
8          8            Male  68.32500
9          9            Male 100.12880
10        10            Male  90.18200
11        11 Solitary Female  62.52132
12        12 Solitary Female  66.99200
13        13 Solitary Female  77.37900
14        14 Solitary Female  40.61669
15        15 Solitary Female  74.61191
16        16 Solitary Female  29.36257
17        17 Solitary Female  93.74400
18        18 Solitary Female  52.07800
19        19 Solitary Female  50.51210
20        20 Solitary Female  56.12120
21        21    Adult Female  77.55365
22        22    Adult Female  86.24100
23        23    Adult Female 110.45902
24        24    Adult Female  97.06417
25        25    Adult Female 142.45900
26        26    Adult Female 169.85475
27        27    Adult Female 175.39755
28        28    Adult Female 145.52400
29        29    Adult Female  99.12721
30        30    Adult Female 110.12300
I'll calculate and store the means and SEM as per my earlier post. To determine if there are significant differences in the means of EndoF between Sex.Class, I'll punch in the command:
examp.anova = aov(EndoF ~ Sex.Class, data = examp)
This will store the ANOVA data in examp.anova. The summary of the analysis shows:
            Df Sum Sq Mean Sq F value    Pr(>F)  
Sex.Class    2  18666  9333.0  13.502 8.621e-05 ***
Residuals   27  18663   691.2                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
This suggests that there are significant differences! To determine which sex classes differ in endogenous F, I'll run a Tukey pairwise comparison using glht() from the multcomp package.
library(multcomp)
examp.pairs <- glht(examp.anova, linfct = mcp(Sex.Class = "Tukey"))
The summary of which shows:
> summary(examp.pairs)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts

Fit: aov(formula = EndoF ~ Sex.Class, data = examp)
Linear Hypotheses:
                                    Estimate Std. Error t value Pr(>|t|)  
Male - Adult Female == 0              -27.27      11.76  -2.319    0.070 .
Solitary Female - Adult Female == 0   -60.99      11.76  -5.187   <0.001 ***
Solitary Female - Male == 0           -33.72      11.76  -2.868    0.021 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
So endogenous F in Solitary Females is significantly different from that in the other two classes. I'll plot these results as a bar plot with SEM.
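The examp.means and examp.sems objects haven't been shown explicitly; here's a minimal sketch of how they could be built with the avg() and sem() functions from the earlier post. The sex-class ordering below is just an assumption, chosen to match the significance letters used further down.

order = c("Adult Female", "Male", "Solitary Female")   # assumed bar order
examp.means = avg(examp$EndoF, examp$Sex.Class)
examp.sems  = sem(examp$EndoF, examp$Sex.Class)
names(examp.means) = sort(unique(examp$Sex.Class))     # aggregate() sorts groups alphabetically
names(examp.sems)  = names(examp.means)
examp.means = examp.means[order]                       # rearrange to the desired bar order
examp.sems  = examp.sems[order]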
> examp.bp = barplot(examp.means, main = "Endogenous Cortisol by Sex Class", ylab = "Serum Total Cortisol (ng Cortisol/mL Serum)", xlab = "Sex Class", ylim = range(0,200), names.arg = order, axis.lty = 1)
> error.bar(examp.bp,examp.means,examp.sems)
To add letters displaying the statistical significance between groups, I'll just type in the commands:
letters = c("a","a","b") 
text(x=examp.bp,y=examp.means+examp.sems+10,label=letters)
This will add the letters manually to the bar plot 10 units above the SEM whiskers. There's a way to extract the pairwise comparison significance letters automatically using the cld() function.
examp.cld <- cld(examp.pairs)
This will store the appropriate letters in examp.cld$mcletters$Letters. However, they're stored in a different order than in examp.means and examp.sems, so one of them needs to be reordered, as sketched below; passing the reordered letters to text() produces the final annotated plot.
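A sketch of that automatic route, again assuming examp.means and examp.sems are named and ordered as in the bar plot above:

sig.letters = examp.cld$mcletters$Letters       # named by Sex.Class level
sig.letters = sig.letters[names(examp.means)]   # reorder to match the bar order
text(x = examp.bp, y = examp.means + examp.sems + 10, label = sig.letters)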