Monday, 5 September 2011

Procrastination Busting

So with the copious amounts of work I need to do this month in mind, I decided to look into various strategies to increase my focus and decrease my ADD-fueled procrastination.

I'm a big fan of the Pomodoro technique, wherein you study or work for 25 minutes, then take a 5-minute break. I downloaded the Pomodoro app (http://pomodoro.ugolandini.com/), which seems to be a cool utility that times the intervals and keeps statistics on your productivity, and I pledged to myself to actually use it. I used to use Focus Booster (http://www.focusboosterapp.com/) for timing purposes, but it was a little too bare-bones for my liking.

One big time sink was reading my RSS feeds a couple of times an hour. I decided to delete many feeds, especially the ones that were just spamming a lot of random, uninteresting crap (Engadget, I'm looking at you), and kept just a select few. I also unpinned and closed my Google Reader tab.

Finally, I downloaded a program called Isolator (http://willmore.eu/software/isolator/), which simply obscures the windows in the background. Apparently, it helps maintain focus. I'll report back on its effectiveness.

Sunday, 12 June 2011

Rib Rub Recipe

I haven't updated in a while, but that's more a function of my doing little else but hormone saturation assays for the past couple of weeks. I did manage to invent a rub for chicken and ribs:
2 tablespoons garlic powder
1 tablespoon chili powder
1.5 tablespoons Sriracha sauce
1.5 tablespoons sesame oil
1 tablespoon salt
1 tablespoon pepper

It's pretty fantastic, if I say so myself.

Monday, 30 May 2011

Non-linear Regression

The amount of ligand bound by a protein with a single binding site depends on the concentrations of the free and bound receptor and on the concentration of free ligand. The equation describing this relationship is similar to the Michaelis-Menten equation, which describes enzyme kinetics. The binding equation is:

SB = Bmax * [L] / (Kd + [L])

where SB is the specific binding of the ligand to the protein, Bmax is the total protein concentration (free plus bound protein), [L] is the free ligand concentration, and Kd is the equilibrium dissociation constant (the ratio of the backward to the forward reaction rate constants).


Usually, Kd is the one unknown in the equation that I'm interested in. To determine its value, I'll run a saturation binding assay, in which I mix the protein with a dilution series of its ligand and measure the amount of protein-ligand complex formed. The Kd can then be determined by Scatchard plotting or by non-linear regression, the latter of which is considered the gold standard.


I work with a protein called corticosteroid binding globulin (aka transcortin, CBG, and serpin peptidase inhibitor clade A member 6). One of its ligands in most mammals is cortisol, and the interplay between this protein and its ligand plays significant roles in modulating the stress response, the immune system, and energy balance, amongst other systems. I use tritiated cortisol as a tracer in a saturation binding assay to determine the binding kinetics of cortisol to CBG.


After running the assay, I get specific binding data in counts (of beta decays) per minute as a function of the free ligand concentration. An example data set is shown:
Dilution    SB      L
A         1134  46.28
B          798  22.48
C          656  10.95
D          493   5.24
E          305   2.50
F          198   1.25
A plot of the data (plot()) shows a hyperbolic, saturating relationship between SB and [L].
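That plot can be reproduced with something like the following sketch, assuming the data live in a data frame called dfl.518 with columns SB and L (the same names used in the nls() call further down):

# build the example data set and plot specific binding against free ligand
dfl.518 <- data.frame(
  SB = c(1134, 798, 656, 493, 305, 198),
  L  = c(46.28, 22.48, 10.95, 5.24, 2.50, 1.25)
)
plot(SB ~ L, data = dfl.518, xlab = "Free [L]", ylab = "SB (cpm)")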




In a Scatchard plot, I plot specific binding per free ligand concentration on the y-axis against specific binding on the x-axis. This transformation results in a roughly linear plot that is suitable for linear regression analysis.
I'll use the lm() command to run the linear regression and abline() to draw the fit on the Scatchard plot; a sketch of the commands is below, followed by the results.
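Assuming the Scatchard variables are stored in dfl.518 under the names that show up in the lm() output (sbl.518 for SB/[L] and sb.518.3 for SB), and using scat.518 as a placeholder name for the fit, the commands might look like this:

# Scatchard transformation: SB/[L] (y-axis) against SB (x-axis)
dfl.518$sb.518.3 <- dfl.518$SB
dfl.518$sbl.518  <- dfl.518$SB / dfl.518$L
plot(sbl.518 ~ sb.518.3, data = dfl.518, xlab = "SB", ylab = "SB/[L]")
scat.518 <- lm(sbl.518 ~ sb.518.3, data = dfl.518)  # fit the regression
abline(scat.518)                                    # overlay the fitted line
summary(scat.518)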



Call:
lm(formula = sbl.518 ~ sb.518.3, data = dfl.518)

Residuals:
      1       2       3       4       5       6 
 19.697 -17.888 -14.008  -3.400  -2.666  18.264 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 168.76344   15.59183  10.824 0.000413 ***
sb.518.3     -0.14458    0.02312  -6.253 0.003335 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 17.72 on 4 degrees of freedom
Multiple R-squared: 0.9072,     Adjusted R-squared: 0.884 
F-statistic:  39.1 on 1 and 4 DF,  p-value: 0.003335

The slope of the line (the sb.518.3 coefficient) is equal to -1/Kd, so Kd = 1/0.14458 ≈ 6.92. I can also calculate Bmax as the y-intercept (Intercept) divided by the absolute value of the slope, which gives about 1167. The adjusted R-squared value is not very high (0.884), which suggests to me that this Kd is only a ballpark estimate of the real value.
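Pulling those numbers out of the fit directly, as a sketch using the placeholder scat.518 object from above:

slope     <- coef(scat.518)[["sb.518.3"]]     # -0.14458
intercept <- coef(scat.518)[["(Intercept)"]]  # 168.76
Kd.scat   <- -1 / slope                       # about 6.92
Bmax.scat <- intercept / abs(slope)           # about 1167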

The other way to determine Kd is by non-linear regression. This method is considered 'better' than Scatchard analysis because I can define the equation to be fitted to my data directly, rather than transforming the data to fit a straight line. This can be done in R using the nls() command:
nls.518 <- nls(SB ~ Bmax * L / (Kd + L), data = dfl.518, start = list(Bmax = 1167, Kd = 6.92))
So in the nls() command, I define the equation used to fit my data as SB ~ Bmax * L / (Kd + L), and I set the starting values for Bmax and Kd to the values I derived from my Scatchard analysis; R needs these as a starting point for its iterative fitting. My results are:
Formula: SB ~ Bmax * L / (Kd + L)
Parameters:
            Estimate Std. Error t value Pr(>|t|)  
Bmax         982.2157    17.2532   56.93 1.19e-05 ***
Kd             5.3959     0.2556   21.11 0.000233 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.391 on 3 degrees of freedom
Number of iterations to convergence: 3
Achieved convergence tolerance: 5.493e-06 
So by using non-linear regression analysis, I've determined the Kd and Bmax to be 5.40 and 982.2, respectively, which are quite different from the values I obtained by Scatchard plotting. The p-values are highly significant, and the residual standard error (9.39) is about half that of the linear fit.
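To pull the estimates out of the fit object and overlay the fitted curve on the original SB vs [L] plot, something like this sketch works:

est <- coef(nls.518)                                     # named vector with Bmax and Kd
plot(SB ~ L, data = dfl.518, xlab = "Free [L]", ylab = "SB (cpm)")
curve(est[["Bmax"]] * x / (est[["Kd"]] + x), add = TRUE) # fitted binding curve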

So what does it all mean? One practical interpretation of Kd is that at a free ligand concentration equal to the Kd, half of the receptors are bound to ligand. The lower the Kd, the higher the affinity of the receptor for its ligand, and vice versa.
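That interpretation falls right out of the binding equation: setting [L] = Kd gives SB = Bmax * Kd / (Kd + Kd) = Bmax / 2. For this fit, that works out to roughly 982.2 / 2 ≈ 491 cpm of specific binding at a free cortisol concentration of about 5.40.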

Monday, 23 May 2011

More R

Another day, another round of new stuff learned about R. I've written a function that will calculate the means of a vector (y) aggregated by factor (z).
> avg = function(y,z) {
  # mean of y for each level of the factor z, ignoring NAs
  var1 = aggregate(y, by = list(z), FUN = mean, na.rm=T)
  var1$x   # return just the vector of group means
}
The function returns a vector with the means. Likewise, I wrote a function for the calculation of standard error of the means (SEM):

> sem = function(y,z) {
  # SEM of y for each level of the factor z
  var1 = aggregate(y, by = list(z), FUN = sd, na.rm=T)                 # SD per group
  var2 = aggregate(y, by = list(z), FUN = function(x) sum(!is.na(x)))  # n per group, ignoring NAs
  var1$x / sqrt(var2$x)                                                # SEM = SD / sqrt(n)
}
So with these functions, I can easily calculate the means + SEM for a large dataset. For example:
> data.means = data.frame(avg(y1,z),avg(y2,z),avg(y3,z),avg(y4,z),avg(y5,z),avg(y6,z),avg(y7,z))
> data.sems = data.frame(sem(y1,z),sem(y2,z),sem(y3,z),sem(y4,z),sem(y5,z),sem(y6,z),sem(y7,z))
I can give the rows and columns appropriate headers like so:
> rownames(data.means) = c("Hibernation", "Post-hibernation", "Pre-hibernation")
> colnames(data.means) = c("endof", "dsb", "dmcbc", "dfreef", "csb", "cfreef", "cmcbc")
So the formatted data frame will look like so:

> data.means
                     endof      dsb    dmcbc    dfreef      csb    cfreef    cmcbc
Hibernation       51.87500 49.06250 0.596250 22.851250 82.16667 11.875000 0.950000
Post-hibernation  33.86667 82.37778 1.233333  9.591111 85.81429  3.020000 1.335714
Pre-hibernation   13.60000 45.46250 0.551250  4.858750 84.30000  1.038333 1.058333
I want to rearrange the data so that Pre-hibernation comes first, then Hibernation and Post-hibernation. I'll first assign a variable with my desired order, give data.sems the same row names as data.means (so the means and SEMs stay aligned), then do the rearrangement on both.
> order = c("Pre-hibernation", "Hibernation", "Post-hibernation")
> rownames(data.sems) = rownames(data.means)
> data.means = data.means[order, , drop=F]
> data.sems = data.sems[order, , drop=F]
Now I want to plot all these data in a single bar plot. I'll have to convert the data frames into matrices because barplot() can't use data frames.
> data.means = data.matrix(data.means, rownames.force = T)
> data.sems = data.matrix(data.sems, rownames.force = T)
Now to plot the means, SEM, and add a legend to the top right of the plot:
> means.bp = barplot(data.means, main = "FreeF By State", ylab = "FreeF", xlab = "State", ylim = range(0,120), beside = T, names.arg = colnames(data.means), axis.lty = 1, col = gray.colors(3))
> error.bar(means.bp,data.means,data.sems)
> legend("topright", rownames(data.means), fill=gray.colors(3))
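The error.bar() call above isn't part of base R, so it has to be defined somewhere. A minimal sketch of one way to write such a helper, using arrows() to draw the whiskers:

error.bar = function(x, y, upper, lower = upper, length = 0.05, ...) {
  # draw whiskers from y - lower to y + upper at each bar midpoint in x
  arrows(x, y + upper, x, y - lower, angle = 90, code = 3, length = length, ...)
}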
I've added the argument beside = T so that R plots the data grouped by the columns of the data matrix. The arguments col and fill do practically the same thing in barplot() and legend(), respectively: they designate the colour scheme of the bars. In this case, I'm using R's preset grayscale palette, gray.colors().

Annotating Bar Plots with Statistical Significance

It only took me a few hours to figure this out. If I'm using 1-way ANOVAs as a statistical test to determine differences in the means of endogenous cortisol concentrations between animals of different sex classes, I can annotate the resulting bar plot with letters denoting significant differences. For example, I'll be using this data set (stored in the variable examp):
   Sample.ID       Sex.Class     EndoF
1          1            Male 134.70700
2          2            Male 112.79506
3          3            Male  83.42866
4          4            Male  66.25100
5          5            Male 111.49403
6          6            Male 105.33600
7          7            Male  68.47000
8          8            Male  68.32500
9          9            Male 100.12880
10        10            Male  90.18200
11        11 Solitary Female  62.52132
12        12 Solitary Female  66.99200
13        13 Solitary Female  77.37900
14        14 Solitary Female  40.61669
15        15 Solitary Female  74.61191
16        16 Solitary Female  29.36257
17        17 Solitary Female  93.74400
18        18 Solitary Female  52.07800
19        19 Solitary Female  50.51210
20        20 Solitary Female  56.12120
21        21    Adult Female  77.55365
22        22    Adult Female  86.24100
23        23    Adult Female 110.45902
24        24    Adult Female  97.06417
25        25    Adult Female 142.45900
26        26    Adult Female 169.85475
27        27    Adult Female 175.39755
28        28    Adult Female 145.52400
29        29    Adult Female  99.12721
30        30    Adult Female 110.12300
I'll calculate and store the means and SEM as per my earlier post. To determine if there are significant differences in the means of EndoF between Sex.Class, I'll punch in the command:
examp.anova = aov(EndoF ~ Sex.Class, data = examp)
This will store the ANOVA data in examp.anova. The summary of the analysis shows:
            Df Sum Sq Mean Sq F value    Pr(>F)  
Sex.Class    2  18666  9333.0  13.502 8.621e-05 ***
Residuals   27  18663   691.2                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
This suggests that there are significant differences! To determine which sex classes differ in endogenous F, I'll run a Tukey pairwise comparison using glht() from the multcomp package.
library(multcomp)
examp.pairs <- glht(examp.anova, linfct = mcp(Sex.Class = "Tukey"))
The summary of which shows:
> summary(examp.pairs)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts

Fit: aov(formula = EndoF ~ Sex.Class, data = examp)
Linear Hypotheses:
                                    Estimate Std. Error t value Pr(>|t|)  
Male - Adult Female == 0              -27.27      11.76  -2.319    0.070 .
Solitary Female - Adult Female == 0   -60.99      11.76  -5.187   <0.001 ***
Solitary Female - Male == 0           -33.72      11.76  -2.868    0.021 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
So endogenous F in Solitary Females is significantly different from that in the other two classes. I'll plot these results as a bar plot with SEM.
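The examp.means and examp.sems objects haven't been shown explicitly; here's a minimal sketch of how they could be built with the avg() and sem() functions from the earlier post. The sex-class ordering below is just an assumption, chosen to match the significance letters used further down.

order = c("Adult Female", "Male", "Solitary Female")   # assumed bar order
examp.means = avg(examp$EndoF, examp$Sex.Class)
examp.sems  = sem(examp$EndoF, examp$Sex.Class)
names(examp.means) = sort(unique(examp$Sex.Class))     # aggregate() sorts groups alphabetically
names(examp.sems)  = names(examp.means)
examp.means = examp.means[order]                       # rearrange to the desired bar order
examp.sems  = examp.sems[order]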
> examp.bp = barplot(examp.means, main = "Endogenous Cortisol by Sex Class", ylab = "Serum Total Cortisol (ng Cortisol/mL Serum)", xlab = "Sex Class", ylim = range(0,200), names.arg = order, axis.lty = 1)
> error.bar(examp.bp,examp.means,examp.sems)
To add letters displaying the statistical significance between groups, I'll just type in the commands:
letters = c("a","a","b") 
text(x=examp.bp,y=examp.means+examp.sems+10,label=letters)
This will add the letters manually to the bar plot 10 units above the SEM whiskers. There's a way to extract the pairwise comparison significance letters automatically using the cld() function.
examp.cld <- cld(examp.pairs)
This will store the appropriate letters in examp.cld$mcletters$Letters. However, they're stored in a different order than in examp.means and examp.sems, so one of them needs to be reordered, as sketched below; passing the reordered letters to text() produces the final annotated plot.
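A sketch of that automatic route, again assuming examp.means and examp.sems are named and ordered as in the bar plot above:

sig.letters = examp.cld$mcletters$Letters       # named by Sex.Class level
sig.letters = sig.letters[names(examp.means)]   # reorder to match the bar order
text(x = examp.bp, y = examp.means + examp.sems + 10, label = sig.letters)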