first prototype html generation, using unix hackery

cooplab · Oct 21, 2015 · df5dc0d
1 parent 58cbf67

Showing 11 changed files with 1,028 additions and 3 deletions.
diff --git a/Makefile b/Makefile
@@ -1,15 +1,21 @@
+chapter_files := $(wildcard chapter*.tex)
+html_files := $(patsubst %.tex,html/%.html,$(chapter_files))
+
 all: popgen_notes.pdf
 
 %.png : %.eps
 	convert -density 300 $< -flatten $@
 
-PHONY: clean all
+.PHONY: clean all site
 
 clean:
 	rm -f popgen_notes.pdf
+	rm -f html/*html
+
+site: $(html_files)
 
 popgen_notes.pdf: popgen_notes.tex
 	latexmk $<
 
-%.tex: html/%.html
-	pandoc -s --mathjax --smart --to html5 $< > $@
+html/%.html: %.tex
+	(cat template.tex; cat $<; printf '%s\n' '\end{document}') | pandoc -s --mathjax --smart --to html5 --from latex > $@
diff --git a/html/chapter-01.html b/html/chapter-01.html
diff --git a/html/chapter-02.html b/html/chapter-02.html
diff --git a/html/chapter-03.html b/html/chapter-03.html
@@ -0,0 +1,62 @@
+<!DOCTYPE html>
+<html>
+<head>
+  <meta charset="utf-8">
+  <meta name="generator" content="pandoc">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
+  <title></title>
+  <style type="text/css">code{white-space: pre;}</style>
+  <!--[if lt IE 9]>
+    <script src="https://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+  <![endif]-->
+  <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>
+</head>
+<body>
+<h1 id="correlations-between-loci-linkage-disequilibrium-and-recombination">Correlations between loci, linkage disequilibrium, and recombination</h1>
+<p>Up to now we have been interested in correlations between alleles at the same locus, e.g. correlations within individuals (inbreeding) or between individuals (relatedness). We have seen how relatedness between parents affects the extent to which their offspring are inbred. We now turn to correlations between alleles at different loci. To understand correlations between loci we need to understand recombination.<br />
+</p>
+<h4 id="recombination">Recombination</h4>
+<p>Let’s consider an individual heterozygous for an <span class="math inline">\(AB\)</span> and an <span class="math inline">\(ab\)</span> haplotype. If no recombination occurs between our two loci in this individual, then these two haplotypes will be transmitted intact to the next generation. If instead a recombination (or more generally an odd number of recombinations) occurs between our two loci on the haplotype transmitted to the child, then <span class="math inline">\(\tfrac{1}{2}\)</span> of the time the child receives an <span class="math inline">\(Ab\)</span> haplotype and <span class="math inline">\(\tfrac{1}{2}\)</span> of the time an <span class="math inline">\(aB\)</span> haplotype. Recombination thus breaks up the association between alleles at different loci. We’ll define the recombination fraction (<span class="math inline">\(r\)</span>) to be the probability of an odd number of recombinations between our loci. In practice we’ll often be interested in relatively short regions where recombination is rare, so we might take <span class="math inline">\(r=r_{BP}L \ll 1\)</span>, where <span class="math inline">\(r_{BP}\)</span> is the average recombination rate per base pair (typically <span class="math inline">\(\sim 10^{-8}\)</span>) and <span class="math inline">\(L\)</span> is the number of base pairs separating our two loci.<br />
+</p>
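As a quick illustration of the transmission probabilities just described, here is a minimal Python sketch giving the haplotypes an AB/ab individual passes on for a given recombination fraction r, together with the short-distance approximation r ≈ r_BP L. The function names are hypothetical, chosen for this example, and it assumes Mendelian transmission (each parental haplotype passed on with probability 1/2 when no recombination occurs). The 100 kb distance in the usage lines is an arbitrary example value.

```python
# Sketch: haplotypes transmitted by an AB/ab double heterozygote.
# Assumes Mendelian transmission: a non-recombinant gamete carries either
# parental haplotype with probability 1/2, and a recombinant gamete carries
# Ab or aB with probability 1/2 each. Function names are illustrative only.

def transmitted_haplotypes(r):
    """Probability of each haplotype transmitted by an AB/ab parent, given r."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r is a probability and must lie in [0, 1]")
    return {
        "AB": (1 - r) / 2,  # parental: no (or an even number of) crossovers
        "ab": (1 - r) / 2,
        "Ab": r / 2,        # recombinant: an odd number of crossovers
        "aB": r / 2,
    }


def approx_recombination_fraction(r_bp, n_bp):
    """Short-region approximation r = r_BP * L; only sensible when the result is << 1."""
    return r_bp * n_bp


if __name__ == "__main__":
    # Hypothetical example: loci 100 kb apart with r_BP = 1e-8 per base pair.
    r = approx_recombination_fraction(1e-8, 100_000)
    print(r)                          # roughly 0.001
    print(transmitted_haplotypes(r))  # recombinant haplotypes are rare
```

The four probabilities always sum to one; only the split between parental and recombinant haplotypes depends on r.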
+<h4 id="linkage-disequilibrium">Linkage disequilibrium</h4>
+<p>The (horrible) phrase linkage disequilibrium (LD) refers to the statistical non-independence (i.e. a correlation) of alleles at different loci. Our two loci, which segregate alleles <span class="math inline">\(A/a\)</span> and <span class="math inline">\(B/b\)</span>, have allele frequencies of <span class="math inline">\(p_A\)</span> and <span class="math inline">\(p_B\)</span> respectively. The frequency of the two-locus haplotype <span class="math inline">\(AB\)</span> is <span class="math inline">\(p_{AB}\)</span>, and likewise for our other three combinations. If our loci were statistically independent then <span class="math inline">\(p_{AB} = p_Ap_B\)</span>; otherwise <span class="math inline">\(p_{AB} \neq p_Ap_B\)</span>. We can define a covariance between the <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> alleles at our two loci as <span class="math display">\[D_{AB} = p_{AB} - p_Ap_B\]</span> and likewise for our other combinations at our two loci (<span class="math inline">\(D_{Ab},~D_{aB},~D_{ab}\)</span>). These <span class="math inline">\(D\)</span> statistics are all closely related to each other, as <span class="math inline">\(D_{AB} = - D_{Ab}\)</span> and so on. Thus we only need to specify <span class="math inline">\(D_{AB}\)</span> to know them all, so we’ll drop the subscript and just refer to <span class="math inline">\(D\)</span>. A handy consequence is that we can rewrite our haplotype frequency <span class="math inline">\(p_{AB}\)</span> as <span class="math display">\[p_{AB} = p_Ap_B+D. \label{eqn:ABviaD}\]</span> If <span class="math inline">\(D=0\)</span> we’ll say the two loci are in linkage equilibrium, while if <span class="math inline">\(D&gt;0\)</span> or <span class="math inline">\(D&lt;0\)</span> we’ll say that the loci are in linkage disequilibrium (we’ll perhaps want to test whether <span class="math inline">\(D\)</span> is statistically different from <span class="math inline">\(0\)</span> before making this choice).<br />
+You should be careful to keep the concepts of linkage and linkage disequilibrium separate in your mind. Genetic linkage refers to the linkage of multiple loci due to the fact that they are transmitted through meiosis together (most often because the loci are on the same chromosome). Linkage disequilibrium merely refers to the correlation between the alleles at different loci; this may in part be due to the genetic linkage of these loci but does not necessarily imply it (e.g. genetically unlinked loci can be in LD due to population structure).<br />
+Another common statistic for summarizing LD is <span class="math inline">\(r^2\)</span>, which we write as <span class="math display">\[r^2 = \frac{D^2}{p_A(1-p_A) p_B(1-p_B) }\]</span> Since <span class="math inline">\(D\)</span> is a covariance, and <span class="math inline">\(p_A(1-p_A)\)</span> is the variance at locus <span class="math inline">\(A\)</span> of an allele drawn at random (scoring <span class="math inline">\(A\)</span> as 1 and <span class="math inline">\(a\)</span> as 0), with <span class="math inline">\(p_B(1-p_B)\)</span> the corresponding variance at locus <span class="math inline">\(B\)</span>, <span class="math inline">\(r^2\)</span> is the squared correlation coefficient.<br />
+<span><strong>Question.</strong></span> You genotype 2 bi-allelic loci (A &amp; B) segregating in two mouse subspecies (1 &amp; 2), which mate randomly within each subspecies but have not interbred since they speciated. On the basis of previous work you estimate that the two loci are separated by a recombination fraction of 0.1. The frequencies of haplotypes in each population are:</p>
+<table>
+<thead>
+<tr class="header">
+<th style="text-align: center;">Pop</th>
+<th style="text-align: center;"><span class="math inline">\(p_{AB}\)</span></th>
+<th style="text-align: center;"><span class="math inline">\(p_{Ab}\)</span></th>
+<th style="text-align: center;"><span class="math inline">\(p_{aB}\)</span></th>
+<th style="text-align: center;"><span class="math inline">\(p_{ab}\)</span></th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td style="text-align: center;">1</td>
+<td style="text-align: center;">.02</td>
+<td style="text-align: center;">.18</td>
+<td style="text-align: center;">.08</td>
+<td style="text-align: center;">.72</td>
+</tr>
+<tr class="even">
+<td style="text-align: center;">2</td>
+<td style="text-align: center;">.72</td>
+<td style="text-align: center;">.18</td>
+<td style="text-align: center;">.08</td>
+<td style="text-align: center;">.02</td>
+</tr>
+</tbody>
+</table>
+<p><span><strong>A)</strong></span> How much LD is there within populations, i.e. estimate D?<br />
+<span><strong>B)</strong></span> If we mixed the two populations together in equal proportions what value would D take before any mating has had the chance to occur?<br />
+</p>
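The definitions of D and r^2 above translate directly into a few lines of code. The sketch below computes p_A, p_B, D and r^2 from the four haplotype frequencies, and lists D for every haplotype so the sign relationships (D_Ab = D_aB = -D_AB, D_ab = D_AB) can be seen. The function names are hypothetical, and the frequencies in the usage lines are made up for illustration (they are not the mouse data in the question).

```python
# Sketch: LD statistics from the four two-locus haplotype frequencies.
# Assumes the frequencies are population values keyed "AB", "Ab", "aB", "ab"
# that sum to 1. Function names are illustrative only.

def allele_freqs(hap):
    """Marginal allele frequencies p_A and p_B from haplotype frequencies."""
    p_A = hap["AB"] + hap["Ab"]
    p_B = hap["AB"] + hap["aB"]
    return p_A, p_B


def ld_D(hap):
    """D = p_AB - p_A * p_B, the covariance between the A and B alleles."""
    p_A, p_B = allele_freqs(hap)
    return hap["AB"] - p_A * p_B


def ld_r2(hap):
    """r^2 = D^2 / (p_A(1-p_A) p_B(1-p_B)), the squared correlation coefficient."""
    p_A, p_B = allele_freqs(hap)
    return ld_D(hap) ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))


def all_D(hap):
    """D_xy = p_xy - p_x * p_y for each of the four haplotypes xy."""
    p_A, p_B = allele_freqs(hap)
    p = {"A": p_A, "a": 1 - p_A, "B": p_B, "b": 1 - p_B}
    return {h: hap[h] - p[h[0]] * p[h[1]] for h in hap}


if __name__ == "__main__":
    # Made-up haplotype frequencies for illustration only.
    hap = {"AB": 0.4, "Ab": 0.1, "aB": 0.2, "ab": 0.3}
    print(ld_D(hap))   # close to 0.1, up to floating-point rounding
    print(ld_r2(hap))  # close to 0.01 / 0.06, i.e. about 0.167
    print(all_D(hap))  # D_AB and D_ab near +0.1, D_Ab and D_aB near -0.1
```

Because p_A and p_B are derived from the haplotype frequencies, the identity p_AB = p_A p_B + D holds by construction.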
+<h4 id="the-decay-of-ld-due-to-recombination">The decay of LD due to recombination</h4>
+<p>We will now examine what happens to LD over the generations if we only allow recombination to occur in a very large population (i.e. no genetic drift, so the frequencies of our haplotypes follow their expectations). To do so, consider the frequency of our <span class="math inline">\(AB\)</span> haplotype in the next generation, <span class="math inline">\(p_{AB}^{\prime}\)</span>. We lose a fraction <span class="math inline">\(r\)</span> of our <span class="math inline">\(AB\)</span> haplotypes to recombination ripping our alleles apart, but gain a fraction <span class="math inline">\(rp_A p_B\)</span> per generation from other haplotypes recombining together to form <span class="math inline">\(AB\)</span> haplotypes. Thus in the next generation <span class="math display">\[p_{AB}^{\prime} = (1-r)p_{AB} + rp_Ap_B\]</span> This last term is <span class="math inline">\(r(p_{AB}+p_{Ab})(p_{AB}+p_{aB})\)</span>; multiplying it out gives the probability of recombination in each of the different diploid genotypes that could generate an <span class="math inline">\(AB\)</span> haplotype.<br />
+We can then write the change in the frequency of the <span class="math inline">\(AB\)</span> haplotype as <span class="math display">\[\Delta p_{AB} = p_{AB}^{\prime} -p_{AB} = -r p_{AB} + rp_Ap_B = - r D\]</span> so recombination will cause a decrease in the frequency of <span class="math inline">\(AB\)</span> haplotypes if there is an excess of them within the population (<span class="math inline">\(D&gt;0\)</span>), and an increase if there is a deficit of them (<span class="math inline">\(D&lt;0\)</span>). Recombination does not change the allele frequencies <span class="math inline">\(p_A\)</span> and <span class="math inline">\(p_B\)</span>, so our LD in the next generation is <span class="math inline">\(D^{\prime} = p_{AB}^{\prime} - p_Ap_B\)</span>, and we can rewrite the above equation in terms of <span class="math inline">\(D^{\prime}\)</span> as <span class="math display">\[D^{\prime}= (1-r) D\]</span> So if the level of LD in generation <span class="math inline">\(0\)</span> is <span class="math inline">\(D_0\)</span>, the level <span class="math inline">\(t\)</span> generations later (<span class="math inline">\(D_t\)</span>) is <span class="math display">\[D_t=  (1-r)^t D_0\]</span> so recombination acts to decrease LD, and does so geometrically at a rate given by <span class="math inline">\((1-r)\)</span>. If <span class="math inline">\(r \ll 1\)</span> then we can approximate this by an exponential and say that <span class="math display">\[D_t \approx  D_0 e^{-rt}\]</span><br />
+<span><strong>Q C)</strong></span> You find a hybrid population between the two mouse subspecies described in the question above, which appears to be comprised of equal proportions of ancestry from the two subspecies. You estimate LD between the two markers to be 0.0723. Assuming that this hybrid population is large and was formed by a single mixture event, can you estimate how long ago this population formed?<br />
+</p>
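To see the recursion at work, the following sketch applies p'_xy = (1-r) p_xy + r p_x p_y to all four haplotypes each generation, then tabulates the iterated D_t alongside the closed form (1-r)^t D_0 and the exponential approximation D_0 e^{-rt}. It assumes an effectively infinite, randomly mating population with recombination as the only force acting. The function names are hypothetical, and the starting frequencies and r = 0.05 are arbitrary example values.

```python
import math

# Sketch: deterministic decay of LD under recombination alone (no drift,
# selection, or mutation; random mating). Function names are illustrative only.

def next_generation(hap, r):
    """Apply p'_xy = (1 - r) p_xy + r p_x p_y to each haplotype xy."""
    p_A = hap["AB"] + hap["Ab"]
    p_B = hap["AB"] + hap["aB"]
    p = {"A": p_A, "a": 1 - p_A, "B": p_B, "b": 1 - p_B}
    return {h: (1 - r) * f + r * p[h[0]] * p[h[1]] for h, f in hap.items()}


def ld_D(hap):
    """D = p_AB - p_A * p_B."""
    p_A = hap["AB"] + hap["Ab"]
    p_B = hap["AB"] + hap["aB"]
    return hap["AB"] - p_A * p_B


def decay_table(hap, r, generations):
    """Rows of (t, iterated D_t, (1-r)^t D_0, D_0 e^{-rt}) for t = 0..generations."""
    d0 = ld_D(hap)
    rows = []
    for t in range(generations + 1):
        rows.append((t, ld_D(hap), (1 - r) ** t * d0, d0 * math.exp(-r * t)))
        hap = next_generation(hap, r)
    return rows


if __name__ == "__main__":
    # Hypothetical start: complete association (only AB and ab haplotypes).
    start = {"AB": 0.5, "Ab": 0.0, "aB": 0.0, "ab": 0.5}
    for t, d_iter, d_geom, d_exp in decay_table(start, r=0.05, generations=5):
        print(f"t={t}  D_t={d_iter:.5f}  (1-r)^t D_0={d_geom:.5f}  D_0 e^(-rt)={d_exp:.5f}")
```

The iterated and closed-form columns should agree to floating-point precision. For positive D_0 the exponential column sits slightly above them, because e^{-r} ≥ 1 - r, and the gap stays small while r is small.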
+</body>
+</html>