Data Mining and Statistical Learning

2015, Trinity College, Dublin, Department of Political Science

Instructor: Prof Kenneth Benoit, LSE

Details: Class meets on Mondays in February and March, 14:00-16:30, except for Day 2 (see below).

Rooms: See specific dates below.

Note: As the class proceeds, I will add resources (slides, R code, text datasets, problem sets) to each session below.

Main Texts:

  • James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. New York: Springer.
  • Lantz, Brett. 2013. Machine Learning with R. Birmingham: Packt Publishing.
  • Zumel, Nina, and John Mount. 2014. Practical Data Science with R. Shelter Island, NY: Manning.

Detailed Schedule

Day 1 Working with data and data structures
(Mon 9 Feb, 14:00-16:30, Room 201 Phoenix House)

Day 2 Rethinking regression as a predictive tool
(Wed 25 Mar, 10:00-12:30, Arts Block 3025)

  • Revisiting prediction for the classical regression model, including logistic regression. Prediction vs. association and causation.
  • Required readings:
    • James et al., Chs. 3-4
    • Lantz, Ch. 6
  • Recommended readings:
    • Conway, Drew, and John White. 2012. Machine Learning for Hackers. O’Reilly. Chapter 5, “Regression: Predicting Page Views”.
    • Zumel and Mount, Ch. 7
  • Exercises: none, because of the short week; prediction methods will instead be rolled into the exercise for week 3.
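
The shift from association to prediction can be illustrated with a minimal sketch. It fits a logistic regression on a random training portion of the data and judges the model only by its misclassification rate on the held-out cases. This is a hypothetical Python/NumPy illustration on simulated data, not course material (the course resources are in R). The helper names `fit_logistic` and `predict_proba` and the true coefficient values are assumptions. Fitting uses Newton-Raphson, i.e. iteratively reweighted least squares, which is also how R's `glm()` fits a logistic model.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Logistic regression by Newton-Raphson (iteratively reweighted least squares).

    X must already contain a column of ones for the intercept.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # current fitted probabilities
        w = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)                   # score (gradient of log-likelihood)
        info = X.T @ (X * w[:, None])          # information matrix X'WX
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_proba(beta, X):
    """Predicted probability that y = 1."""
    return 1.0 / (1.0 + np.exp(-X @ beta))


# Simulated data (hypothetical): one predictor, true intercept -0.5, slope 1.5.
rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x)))
y = (rng.random(n) < true_p).astype(float)

# Random 80/20 split: fit on the training rows, evaluate on the rest.
idx = rng.permutation(n)
train, test = idx[:400], idx[400:]
beta = fit_logistic(X[train], y[train])
p_test = predict_proba(beta, X[test])
y_hat = (p_test > 0.5).astype(float)        # classify at the 0.5 cut-off
error_rate = float(np.mean(y_hat != y[test]))

print("coefficients:", np.round(beta, 3))
print("held-out error rate:", round(error_rate, 3))
```

The 0.5 cut-off is a conventional default. The fitted coefficients describe association in the training data, while only the held-out error rate speaks to predictive performance.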

Day 3 Introduction to machine learning
(Mon 2 Mar, 14:00-16:30, Room 206 Phoenix House)

Day 4 Shrinkage methods
(Mon 9 Mar, 14:00-16:30, Room 206 Phoenix House)

  • Ridge regression and the lasso.
  • Readings:
    • James et al., Ch. 6
  • Recommended readings:
    • Conway, Drew, and John White. 2012. Machine Learning for Hackers. O’Reilly. Chapter 6, “Regularization: Text Regression”.
  • Exercise 3:
    1. Using the dail2002.dta dataset, select a random 80% of the candidates as a training set, and then use stepwise methods to find the model, or at least a model, that maximizes the variation explained in this training set. Then use that model to predict the outcome for the 20% you held out, and report the RMSE.
    2. Following the worked examples in James et al., Ch. 6, do Problem 9 on p. 263 using the College dataset, which is available in the “ISLR” package.
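
Part 1 of Exercise 3 uses holdout logic: fit on a random 80% and report RMSE on the remaining 20%. Combined with this session's ridge penalty, that logic can be sketched in a few lines of Python/NumPy. This is a minimal illustration under stated assumptions. The data are simulated rather than taken from dail2002.dta, no stepwise search is performed, and `holdout_split`, `fit_ridge`, `predict` and `rmse` are hypothetical helper names. Ridge is computed from its closed form, which reduces to ordinary least squares when the penalty is zero. The lasso has no closed form in general and is usually fit iteratively (for example by coordinate descent), so it is not shown.

```python
import numpy as np


def holdout_split(X, y, train_frac=0.8, seed=0):
    """Randomly assign train_frac of the rows to training and the rest to test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(round(train_frac * len(y)))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], X[test], y[train], y[test]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_ridge(X, y, lam=0.0):
    """Ridge regression via the closed form (Xc'Xc + lam*I)^{-1} Xc'yc.

    Data are centred on their training means so the intercept is not
    penalised; lam = 0 gives ordinary least squares.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def predict(intercept, beta, X):
    """Linear predictions from a fitted intercept and coefficient vector."""
    return intercept + X @ beta


# Simulated data (hypothetical): 200 cases, 5 predictors, only the first two matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)

X_tr, X_te, y_tr, y_te = holdout_split(X, y, train_frac=0.8, seed=1)
for lam in (0.0, 10.0):
    b0, b = fit_ridge(X_tr, y_tr, lam)
    print(f"lambda = {lam}: held-out RMSE = {rmse(y_te, predict(b0, b, X_te)):.3f}")
```

In practice the penalty would be chosen by cross-validation rather than fixed at 10. Predictors on different scales should be standardised first; the simulated predictors here already share a common scale.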

Day 5 Unsupervised learning
(Mon 16 Mar, 14:00-16:30, Áras an Phiarsaigh Room 2.04)

  • Principal components and clustering methods.
  • Readings:
    • Review the last part of James et al., Ch. 6, on principal components regression
    • James et al., Ch. 10
  • Recommended readings:
    • Bond, Robert, and Solomon Messing. 2015. “Quantifying Social Media’s Political Space: Estimating Ideology From Publicly Revealed Preferences on Facebook.” American Political Science Review 109(01): 62–78.
    • Weller, Susan C, and A Kimball Romney. 1990. Metric Scaling: Correspondence Analysis. Sage.
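
Both techniques in this session can be sketched directly in Python/NumPy. Principal components come from the singular value decomposition of the column-centred data. K-means clustering uses Lloyd's algorithm, which alternates nearest-centroid assignment with centroid updates. The data are simulated with two well-separated groups, and `pca`, `kmeans` and the other helpers are hypothetical names. The k-means start uses a deterministic farthest-point rule so the sketch gives the same answer every run; R's `kmeans()` instead uses random starts, controlled by its `nstart` argument.

```python
import numpy as np


def pca(X, n_components=2):
    """Principal components via the SVD of the column-centred data matrix.

    Returns scores (each row projected onto the components), loadings
    (one column per component) and the proportion of variance explained.
    """
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T
    scores = Xc @ loadings
    var_explained = s**2 / np.sum(s**2)
    return scores, loadings, var_explained[:n_components]


def _assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def farthest_point_init(X, k, seed=0):
    """Pick one random row, then repeatedly add the row farthest from those chosen."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - X[chosen][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(d2.argmax()))
    return X[chosen].astype(float)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_point_init(X, k, seed)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(X, centroids), centroids


# Simulated data (hypothetical): 200 rows, 4 variables, two groups offset by 8 on every variable.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 4)) + 8.0 * group[:, None]

scores, loadings, var_explained = pca(X, n_components=2)
labels, centroids = kmeans(X, k=2)
print("variance explained by PC1, PC2:", np.round(var_explained, 3))
print("cluster sizes:", np.bincount(labels))
```

The two simulated groups differ along a single direction. The first principal component should therefore carry most of the variance, and k-means should recover the groups up to an arbitrary relabelling.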

Day 6 Working with text
(Mon 23 Mar, 14:00-16:30, Room 201 Phoenix House)

