EXAM STATISTICAL LEARNING spring 2008

Write a short report on the solutions to the two assignments below. The report should explain the final solutions and motivate why these are good solutions. Also give the computer code in an appendix. The data and supporting files are in the zip file on the website.

ASSIGNMENT 1

Data in: States.txt
Description data in: StatesData.pdf
Background papers: gam.pdf, bootstrap.pdf

Download the States data in States.txt, e.g. in folder C:\\MasterMathProject, and import them in R:

    states<-scan("C:\\MasterMathProject\\states.txt")
    states<-matrix(states,byrow=TRUE,ncol=8)

a) Fit a linear model (lm) predicting FAIL (Var8) from POPUL (Var1), INC (Var2), LIFE (Var4), HOM (Var5), SCHOOL (Var6) and FREEZE (Var7).
   N.B. The description of the states data in StatesData.pdf applies only to the first seven variables. The outcome variable FAIL (the failure on an aptitude test) is the 8th variable in the States.txt file.
b) Report the regression coefficients.
c) Report the squared correlation between FAIL and the fitted values.
d) Compute the squared correlation between the predictors and the fitted values.
e) Plot the partial residuals for each of the predictors and fit smooth spline functions through the scatter.
f) Fit a generalized additive model (gam). Fix the number of parameters k to 3 for POPUL and INC and to 4 for the other predictors. Report the fit.
g) Use the bootstrap to test the stability.

ASSIGNMENT 2

Data in: examdata.txt
Background paper: svm-in-R.pdf

The file examdata.txt contains 100 rows of 9 columns, with the first column a binary class label and columns 2 to 9 the input data. The data are simulated.

a) Apply principal component analysis and build a classifier based on the most important components.
b) Build a support vector machine classifier. Use (at least) two kernels and determine the parameter by cross-validation.
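
As an illustration of the import step in Assignment 1, the sketch below applies the same scan and matrix calls to a small stand-in file and then attaches the variable names used in question a). It is only a sketch under stated assumptions: the numbers are made up, the temporary file replaces states.txt, and the name VAR3 is a placeholder because the assignment text does not name the third variable.

```r
# Stand-in for states.txt: 5 rows x 8 columns of made-up numbers
# (assumption: the real file holds 8 whitespace-separated numbers per row).
set.seed(1)
demo <- matrix(round(runif(40, 1, 100), 1), nrow = 5, ncol = 8)
tmp <- tempfile(fileext = ".txt")
# write() walks through t(demo) column by column, so each output line is one row of demo
write(t(demo), file = tmp, ncolumns = 8)

# The same two steps as in the assignment, applied to the stand-in file
states <- scan(tmp)
states <- matrix(states, byrow = TRUE, ncol = 8)

# Names follow the variable numbering in the assignment text;
# "VAR3" is a placeholder, not a name taken from StatesData.pdf
colnames(states) <- c("POPUL", "INC", "VAR3", "LIFE",
                      "HOM", "SCHOOL", "FREEZE", "FAIL")
states <- as.data.frame(states)
str(states)
```

With named columns in a data frame, the variables can be referred to by name in model formulas rather than by column index.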