| Credits |
3 ECTS |
| Code |
405060 |
| Audience |
Master AI, CS, Information Science |
| Lecturer |
Prof.dr. Aad van der Vaart
(Faculty of Sciences, VU). |
| E-mail: aad at few.vu.nl |
| Period |
7, 14, 21, 28 September, 5, 12 October 2010. |
| Hours |
Tuesdays 11.00-12.45, room WN-P647 |
| Aim |
An introduction to the design and statistical
analysis of experiments. |
| Form |
Lectures, computer assignments, final project. |
| Description |
An outcome may depend on one or more factors that can be
manipulated. For instance, the time spent on a website depends on several
aspects of its design; the proximity to a target value is a function of
the parameters of an evolutionary algorithm; and (a classic in this area!)
agricultural yield depends on the type of fertilizer and the crop variety. In an
experiment one may measure the outcome for several settings of the
factors. If the measured outcomes are subject to chance variation,
statistical techniques must be used to analyse the results.
This course discusses commonly used techniques by way of examples,
using a minimum of mathematical formulas.
Data sets are analysed with the statistical package R, with
emphasis on correctly implementing statistical tests and
interpreting the computer output. Among the topics addressed
are: |
| Recap of basic statistical concepts.
Population distribution, histogram,
QQ-plot, sample, statistical test, p-value. |
| Introduction to R. Basics of the open
source computer package R, and its application to ANOVA. |
| Analysis of variance.
One-way and two-way completely randomized designs, randomized block designs,
regression, analysis of covariance (ANCOVA). |
| Literature |
- Slides: One,
Two (with data files
melon.txt and pvc.txt),
Three (with data files
sat.txt and fiber.txt),
Four (with data files
ashina.txt, ashinalong.txt
and penicillin.txt),
Five (with data files
wheat.txt and eshoph.txt).
- Lecture notes by Geurt Jongbloed.
- R manuals: a shorter one (in Dutch)
and the more extensive official one
(see the R website for more).
- Further reading (not required):
- Linear Models with R (chapters 13-16)
and Extending the Linear Model with R (chapters 2-3), by Julian Faraway
- Introductory Statistics with R (chapters 3-6 and 9-11),
by Peter Dalgaard.
|
| Assessment |
The average mark for the five assignments counts for
60% of the final mark, and the final project for the remaining
40%. The marks for both parts must be sufficient.
Assignments and the final project
are typically carried out in pairs of students. A final individual
oral discussion may be part of the examination. |
| Requirements |
An introductory course in statistics at the bachelor's level
for computer scientists. |
| Computer language |
The statistical package R can be downloaded from the R Project
site www.r-project.org.
It is free! It is also installed on the FEW computers. |
| Assignments |
Assignments 1-5 are due before the beginning of
lectures 2-6, respectively; each of these lectures includes a discussion of the solution.
The assignments cover the material presented in lectures 1-5.
|
| Final project |
The course concludes with designing a small
experiment, analysing its results, and writing a short report
(around 5 pages). Students must formulate their own research question
and gather their own data, either by carrying out a (small) experiment
or by using data from another course or from a third source. Think
carefully about whether the data can answer the question, and about which
analysis method applies. You may wish to send a short proposal
to the lecturer for comments before starting the experiment.
The deadline is the beginning of the next lecture block. |