Statistical Genetics
Probability and statistics play an important role in
genetics. The mechanism of “meiosis”, the forming of sperm or egg
cells, is thought to be probabilistic in nature, as is the process of
mating in large populations. The relationship between “genotypes”
(DNA sequences) and “phenotypes” (observable traits or diseases) can be
modelled by probability distributions. The analysis of genetic
determinants is based on random samples from a population, often
biased, and various statistical methods are necessary to analyse such
data. This course provides an introduction to stochastic models and
methods used in genetics, directed at students in mathematics. We do
assume a good working knowledge of probability and statistics (e.g.
likelihood and Bayesian inference, asymptotics, testing), but do not
assume prior knowledge of genetics. In particular, the jargon in this
description will be explained.
Statistical genetics is a classical branch of applied
probability and statistics, which has recently gained much new
interest, due to the significant breakthroughs in genetics, both
experimentally and theoretically. With modern techniques and
significantly increased data it is hoped to link diseases and other
traits to genes (pieces of DNA) that can be precisely located on the
genome in an unprecedented manner. One could safely say that
this area is among the hottest in applied stochastics, and
in science in general. There are plenty of opportunities
for mathematicians interested in life sciences.
This course incorporates parts of many different areas of statistics.
Of course we start with Mendel’s laws of “segregation”, which stipulate
that each parent passes a randomly chosen gene on to his/her offspring
from each pair of genes, independently across genes. The latter
independence was later found to be untrue, and replaced by “linkage
models”, which stipulate
positive dependence between genes sitting close together on the genome.
The most popular model is based on a Poisson process model for
“crossovers” during meiosis. The resulting models combined with
“penetrance models” (conditional distributions for phenotypes given
genotypes) allow one to write likelihoods for the observed phenotypes in
families (or “pedigrees”), and thus to estimate the dependence of
phenotypic traits on genetic factors. Because a full likelihood
analysis requires the specification of many probability densities and
is computationally intensive, other methods with the same aim are based
on reduced data, in particular “IBD” (identity by descent) status,
and/or on clever sampling plans.
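As a small illustration of the Poisson crossover model mentioned above, the following Python sketch assumes that crossovers along a gamete form a Poisson process with rate 1 per Morgan, and that two loci recombine when an odd number of crossovers falls between them. Under these assumptions the recombination fraction at map distance d is (1 - exp(-2d))/2, commonly known as Haldane's map function. The function names and the simulation setup are illustrative only and are not taken from the lecture notes.

```python
import math
import random


def haldane_theta(d):
    """Recombination fraction at map distance d (in Morgans),
    assuming Poisson crossovers (Haldane's map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def simulate_theta(d, n_meioses=100_000, seed=0):
    """Monte Carlo estimate of the recombination fraction.

    Crossover positions are generated as a rate-1 Poisson process
    (exponential gaps); the two loci recombine when the number of
    crossovers in the interval [0, d) is odd.
    """
    rng = random.Random(seed)
    recombinants = 0
    for _ in range(n_meioses):
        position = rng.expovariate(1.0)
        crossovers = 0
        while position < d:
            crossovers += 1
            position += rng.expovariate(1.0)
        recombinants += crossovers % 2
    return recombinants / n_meioses


if __name__ == "__main__":
    for d in (0.01, 0.1, 0.5, 2.0):
        print(d, haldane_theta(d), simulate_theta(d))
```

For small d the recombination fraction is close to d itself, and for large d it approaches 1/2, which corresponds to the independent segregation in Mendel's original law.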
“Association” studies are based on the idea that, under a random mating
assumption, a population should tend to equilibrium, so that deviations
from equilibrium for pairs of genes in a random sample of individuals
(possibly) indicate that these genes are close together on the genome. Finally, “biometric
analysis” is directed at decomposing phenotypic variation into genetic
and environmental parts.
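The equilibrium idea behind association studies can be made concrete with a short Python sketch. It assumes two loci with two alleles each (A/a and B/b), measures the deviation from equilibrium by the standard coefficient D = p_AB - p_A p_B, and uses the textbook result that under random mating D shrinks by a factor (1 - theta) per generation, where theta is the recombination fraction. The function names and the example frequencies are made up for illustration.

```python
def linkage_disequilibrium(hap_freqs):
    """Coefficient D = p_AB - p_A * p_B.

    hap_freqs maps the haplotypes 'AB', 'Ab', 'aB', 'ab' to their
    population frequencies (assumed to sum to 1).
    """
    p_A = hap_freqs["AB"] + hap_freqs["Ab"]
    p_B = hap_freqs["AB"] + hap_freqs["aB"]
    return hap_freqs["AB"] - p_A * p_B


def decay(d0, theta, generations):
    """Disequilibrium after the given number of generations of random
    mating, starting from d0, with recombination fraction theta."""
    return d0 * (1.0 - theta) ** generations


if __name__ == "__main__":
    # Hypothetical haplotype frequencies, chosen only for illustration.
    freqs = {"AB": 0.4, "Ab": 0.1, "aB": 0.1, "ab": 0.4}
    d0 = linkage_disequilibrium(freqs)
    for theta in (0.5, 0.01):
        print(theta, [round(decay(d0, theta, t), 4) for t in (0, 10, 100)])
```

Loci that are far apart (theta near 1/2) lose their disequilibrium within a few generations, while closely linked loci (small theta) retain it for a long time, which is why an observed deviation hints at proximity on the genome.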
Lecturer:
Marianne Jonker (www.few.vu.nl/~majonker, m.a.jonker@vu.nl)
Required Knowledge:
Courses in probability and statistics from a Bachelor mathematics
program, or equivalent. No knowledge of genetics required.
Meetings:
Friday afternoon: 13.30-16.15
Room: WN-P624 (week 36-42)
Room: WN-S607 (week 44-50)
Remark:
If you cannot make it to the first meeting, send an e-mail to the
lecturer (m.a.jonker@vu.nl), as the course will become a reading
course if enrollment is insufficient.
Credits: 6.
VU Vakcode: 400296.
Literature:
Lecture Notes (our main text).
Peter Almgren, Par-Ola Bendahl, Henrik Bengtsson, Ola Hossjer, Roland Perfekt: Statistics in Genetics.
Downloadable from: Lund University
Exercises:
There is no formal problem class. However, some exercises will be
discussed during the meetings on Friday.
Lectures:
September 9:
Chapter 1, biology, Mather
Homework: exercises 1, 2, 3 (week 1)
September 16:
Remainder of Chapter 1, Chapter 2 up to page 25
Homework: exercises 1, 2, 3, 4 (week 2)
September 23:
Chapter 2: HWE and LE (no sections with an asterisk)
Section 14.7: EM-algorithm
Homework: Try to understand section 2.2.2. Compute the maximum likelihood estimators at slide 24. Do the calculations at slide 29.
September 30:
Chapter 3: Pedigree Likelihoods
Homework: exercises 1, 2 (chapter 3, week 4)
October 7:
Chapter 4: Identity by Descent
Inheritance vectors (sections 1.4, 3.6)
Homework: exercise 2 of last week; exercise 4.1 (at page 83); check the values in Tables 4.1, 4.2 and 4.3
October 14:
No lecture
October 21:
Chapter 5
Breast Cancer research (not in lecture notes)
October 28:
Autumn break, no lecture
November 4: (Note: from now on the lecture is in room WN-S607)
Chapter 6 (lecturer: Aad van der Vaart): Section 14.6; Section 6.1 up to page 111
November 11: (Note: lecture in room WN-S607)
Chapter 6 (lecturer: Aad van der Vaart)
November 18:
Chapter 7
Homework: exercise 2 (not 2f) of exam December 2006; exercise week 9
November 25:
Chapter 8
Homework: see lecture notes
December 2:
Chapter 9, section 9.1
Homework: see lecture notes
December 9:
Chapter 9, section 9.2
Homework: June 2005, exercises 3, 4; December 2006, exercise 1; Lund 1: 6, 8, 9, 10; Lund 2: 6
December 16:
Homework
Exam:
Written.
Date: December 21, time: 15.15-18.00 (always check the schedule
for changes)
Old Exams:
June 2005
December 2006
July 2007
To test yourself you might try the exams of Lund University: Lund 1 and Lund 2 (with picture).