Empirical Processes and Statistical Learning

lecturer A.W. van der Vaart (at VU)
credits 8
period Spring 2011
schedule Wednesday 14.00-16.30, in weeks 6-21. First meeting: Wednesday 9 February. No meeting on Wednesday 23 March or Wednesday 20 April.
location Science Building (WN), VU University (Vrije Universiteit); weeks 6-11 and 13-20: room WN-M639; weeks 12 and 21: room WN-121.
exam Written, 1 June, 14-17 hours, M 623.
Retake, Wednesday 17 August, 14-17 hours.
See here for example questions.
registration Registration for the course via mastermath.nl is necessary. Grades will be registered via mastermath.
contents The empirical measure of a set of random variables is the discrete random measure that puts a point mass of size one divided by the sample size at each of the random variables. The expectation of a function under this measure is just an average, and under appropriate integrability this average satisfies a law of large numbers (LLN) and, after centering and scaling, a central limit theorem (CLT). Empirical process theory studies these objects for many functions jointly, and is concerned with the LLN or CLT uniformly in classes of functions, as well as inequalities that measure the size of suprema of these objects over classes of functions. The empirical distribution function and the classical empirical process on the line are very special examples, for which the uniform LLN and CLT were obtained by Glivenko and Cantelli in the 1930s and by Donsker in the 1940s and 1950s, respectively. The Kolmogorov-Smirnov statistic for goodness-of-fit and its approximation by the maximum of a Brownian bridge process is one important application of these classical results. The general theory of empirical processes is more recent, and is based on Vapnik-Chervonenkis combinatorial theory and Kolmogorov entropy.
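As an informal illustration of the classical case on the line (not part of the course material), the empirical distribution function and the Kolmogorov-Smirnov statistic can be sketched in a few lines of Python. The function names, the uniform null distribution and the sample sizes are assumptions chosen only for this example.

```python
import bisect
import math
import random


def empirical_cdf(sample):
    """Return F_n, where F_n(t) is the fraction of observations <= t."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(t):
        # bisect_right counts the observations that are <= t
        return bisect.bisect_right(xs, t) / n

    return F_n


def ks_statistic(sample, cdf):
    """Compute sup_t |F_n(t) - F(t)| for a continuous distribution function F.

    F_n is a step function, so the supremum is attained at a jump point,
    either just before or at the jump.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(t):
    """Distribution function of the uniform law on [0, 1] (assumed null model)."""
    return min(1.0, max(0.0, t))


random.seed(0)
for n in (100, 10000):
    sample = [random.random() for _ in range(n)]
    d_n = ks_statistic(sample, uniform_cdf)
    # d_n shrinks with n (Glivenko-Cantelli); sqrt(n) * d_n stays of order one (Donsker)
    print(n, d_n, math.sqrt(n) * d_n)
```

The printed values are only meant to show the two scalings side by side: the unscaled supremum tends to zero, while the rescaled version remains stochastically bounded.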
This theory has many applications in statistics. In this course we shall focus on its use to derive rates of estimation of nonparametric statistical procedures. In the terminology of computer science this is called statistical learning theory. For instance, one obtains a sample of instances (realizations of variables in some measurable space), each being classified as a 0 or 1, and one wants to build a procedure that can classify a future instance as a 0 or 1. Empirical risk minimization, support vector machines, and kernel learners are all methods to solve this problem, and can be studied using empirical process theory.
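To make the classification problem concrete, here is a minimal sketch of empirical risk minimization over a very small class of classifiers, the threshold rules x -> 1{x > t} on the real line. The class, the function names and the simulated data (true threshold 0.3, label noise 10%) are assumptions made for this example; they are not taken from the course notes.

```python
import random


def empirical_risk(threshold, data):
    """Fraction of pairs (x, y) misclassified by the rule x -> 1{x > threshold}."""
    errors = sum(1 for x, y in data if (1 if x > threshold else 0) != y)
    return errors / len(data)


def erm_threshold(data):
    """Minimize the empirical risk over the class of threshold rules.

    The empirical risk changes only at observed points, so it suffices to
    check each observation plus one value below the smallest observation.
    """
    xs = sorted(x for x, _ in data)
    candidates = [xs[0] - 1.0] + xs
    return min(candidates, key=lambda t: empirical_risk(t, data))


def simulate(n, true_threshold=0.3, noise=0.1, seed=0):
    """Hypothetical data: x uniform on [0, 1], label 1{x > 0.3} flipped w.p. 0.1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > true_threshold else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


train = simulate(500)
t_hat = erm_threshold(train)
print("estimated threshold:", t_hat)
print("empirical risk:", empirical_risk(t_hat, train))
```

How far the empirical risk of the selected rule can lie from its true risk, uniformly over the class, is exactly the kind of question that maximal inequalities and entropy bounds address.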
literature Chapters from Weak Convergence and Empirical Processes by Van der Vaart and Wellner (2nd edition). Excerpts will appear on this site.
required knowledge Measure-theoretic probability.

Preliminary course schedule

Week          Subject                                   Material
1             Introduction                              None
2             Maximal inequalities, entropy             chap1.pdf, pages 1-9
3             Maximal inequalities, entropy             dictaat.pdf, pages 10-19
4             Glivenko-Cantelli                         dictaat.pdf, pages 20-25
5             Convergence in distribution in ℓ^∞        dictaat.pdf, pages 26-34
6             Donsker                                   dictaat.pdf, pages 34-40
7 (March 30)  Vapnik-Chervonenkis                       dictaat.pdf, pages 41-50
8             Bracketing entropy                        dictaat.pdf, pages 51-59
9             Rates of convergence                      dictaat.pdf, pages 60-69 except 63, 64
10            Rates of convergence                      dictaat.pdf, Chapter 8, until page 76
11            Rates of convergence and concentration    dictaat.pdf, remainder of Chapter 8, Chapter 9
12            Model selection                           dictaat.pdf, Chapter 10
13            Model selection, support vector machines  dictaat.pdf, Chapters 10, 11