Introduction to Multilevel Models

This page is for the short course An Introduction to Multilevel Models Using Stata, taught by Professor Kenneth Benoit, Trinity College Dublin. The course will be held at the European University Institute from June 4-9, 2009.

Version: June 9, 2009.

European University Institute
June 4-9, 2009
Meets 11:00-13:00, with a lab on Days 2-5 from 10:00-11:00

Objectives and Learning Outcomes

Multilevel models are a class of models used for data that are clustered into hierarchically organized groups, which violates the classical linear regression model's assumption that the error terms are conditionally independent. For example, political data are frequently observed for individuals grouped by region or country, and the effects may differ slightly for individuals according to the region or country they are from. Multilevel models not only provide a means to correctly estimate causal models when data are hierarchically clustered, but also provide more direct ways to investigate how the different levels themselves affect the causal process.

Prerequisites

Students in this course should already understand the linear regression model and the basic concepts of statistical inference. Day 2 will, however, provide a review of these topics.

Logistics

Meetings. Classes will meet on five days, for one two-hour session each day from 11:00-13:00, from Thursday June 4, 2009 through Tuesday June 9, 2009, with a break on Sunday. In addition, on Days 2 through 5 a lab session will be held from 10:00-11:00, before the main class.

Computer Software. Stata 10 will be used for this course. It would be an excellent idea to begin familiarizing yourselves with the Stata XT manual (Longitudinal/Panel Data).

Grading. The course is not graded, although your participation will be recorded and I will mark your problem sets so that you know how well you are doing on the exercises. Homework will be assigned on Days 2-4, to be completed by the next day. A slightly larger final homework will be assigned on Day 5 (to be returned to me by email).

Datasets, code, and slides will be available from this web page.

Texts

A very deliberate choice has been made in this course to keep the reading material light in both quantity and level of difficulty. We have also chosen to keep the methodological discussion of the models very close to their Stata implementation. Consequently, most readings will be from a single text:

  • S. Rabe-Hesketh and A. Skrondal. Multilevel and Longitudinal Modeling Using Stata. Stata Press, 2nd edition, 2008.

It is also highly recommended that you spend some time Reading The Fine Manual (RTFM!) for the Stata commands we will use in this course. These are found in the Stata 10 XT manual and cover the following commands:

  • xtmixed Multilevel, mixed effects linear regression; see also the entry for xtmixed postestimation
  • xtreg Fixed-, between-, and random-effects linear models; see also the entry for xtreg postestimation
  • xtmelogit Multilevel, mixed effects logistic regression
  • xtmepoisson Multilevel, mixed effects Poisson regression

Although it will not be required in this course, you may also wish to download and read about the gllamm library (Generalized Linear Latent And Mixed Models) available from http://www.gllamm.org.

Additional recommended readings are listed at the end of this handout and linked to each day of the course. One excellent additional text, useful both as a general regression reference and as a text on multilevel models, is:

  • Gelman, Andrew and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.

although this text is not required for the course. (All of its examples are implemented in R, so it helps to be familiar with R when reading Gelman and Hill.)

Schedule

Day 1: Introduction to multi-level data problems

Day 1 will start with a discussion of the problem of multi-level units and introduce the Rabe-Hesketh and Skrondal notation, which uses i and j to subscript levels. We will continue with a discussion of how to organize multilevel data, especially in the “long” and “wide” formats, using examples. I will demonstrate simple OLS models to show that problems with multilevel data structures can nonetheless be estimated using standard methods, and we will interpret and discuss these results to suggest why multilevel models (MLMs) might be needed. I will also introduce the variance-components model and preview fixed- versus random-effects models.
Click here for the Stata code from Day 1.

Required Reading:
Rabe-Hesketh & Skrondal (2008, Chs. 1–2); Stata 10 manual for reshape.
Recommended Reading:
Franzese (2005); Gelman (2006); Austin, Goel & van Walraven (2001).
Homework:
Day 1 Homework here.

Day 2: Estimating models with multi-level data

Day 2 will start by revisiting the assumptions of the classical linear regression model, focusing on those that apply to the error term. The notation from Rabe-Hesketh & Skrondal (R&S) will be covered in full, and the full variance-components model will be used to decompose the variation in regression models where different hierarchical levels provide separate sources of variation. The intraclass correlation will be introduced and explained.
Click here for the Stata code from Day 2 – includes answers to Homework 1.
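
As a compact reference, the variance-components model and the intraclass correlation can be written as below. This is a sketch that assumes the symbols used in Rabe-Hesketh & Skrondal (ψ for the variance of the cluster-level random intercept ζ_j, θ for the variance of the unit-level error ε_ij); please check them against Chapter 2. The values in the last line are hypothetical and only illustrate the arithmetic.

```latex
% Variance-components model: unit i nested in cluster j
y_{ij} = \beta + \zeta_j + \epsilon_{ij}, \qquad
\zeta_j \sim N(0, \psi), \quad \epsilon_{ij} \sim N(0, \theta),
\quad \zeta_j \perp \epsilon_{ij}

% Total variance splits into a between-cluster and a within-cluster part
\operatorname{Var}(y_{ij}) = \psi + \theta

% Intraclass correlation: correlation of two units i \neq i' in the same cluster j
\rho = \operatorname{Cor}(y_{ij}, y_{i'j}) = \frac{\psi}{\psi + \theta}

% Hypothetical values: \psi = 2, \theta = 6
\rho = \frac{2}{2 + 6} = 0.25
```

Read this way, ρ is the share of the total variance that lies between clusters.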

Required Reading:
Continue with Rabe-Hesketh & Skrondal (2008, Chs. 1–2); Stata XT manual, relevant commands (xtreg, xtmixed).
Recommended Reading:
Steenbergen & Jones (2002); Austin, Goel & van Walraven (2001); Snijders & Bosker (1999); Goldstein (2003).
Homework:

  1. Review Chapter 2 of Rabe-Hesketh & Skrondal carefully, especially the notation for variance-components models.
  2. Bring to class on Saturday an example of a dataset you would like help reshaping (or collapsing, etc.), and we will go through it.
  3. Be prepared to give a brief verbal synopsis of your thesis project, since we will spend some time discussing these in class.

Day 3: Random-intercept models

Day 3 will continue the presentation of the general multilevel model structure from Day 2, focusing specifically on the random-intercept model with covariates as described in Chapter 3 of Rabe-Hesketh & Skrondal (2008). The discussion centres on the distinction between within-cluster and between-cluster covariate effects, and on the problem of omitted cluster-level covariates and endogeneity.
Stata code from Day 3 – including lab session code.
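
One way to see the within/between distinction is to compare the standard random-intercept model with a version that enters the cluster mean of the covariate separately. The sketch below uses R&S-style notation; the superscripts W and B are labels introduced here for illustration only. If an omitted cluster-level covariate makes ζ_j correlated with x_ij, the two coefficients will generally differ, and forcing them to be equal can bias the estimate of β_2.

```latex
% Random-intercept model with one covariate x_{ij}
y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_j + \epsilon_{ij}

% Same model with x_{ij} split into its cluster mean \bar{x}_{\cdot j}
% and the deviation from that mean
y_{ij} = \beta_1 + \beta_2^{W} (x_{ij} - \bar{x}_{\cdot j})
       + \beta_2^{B} \bar{x}_{\cdot j} + \zeta_j + \epsilon_{ij}

% The first model is the special case \beta_2^{W} = \beta_2^{B} = \beta_2
```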

Required Reading:
Rabe-Hesketh & Skrondal (2008, Ch. 3).
Recommended Reading:
Snijders & Bosker (1999).
Homework:
Rabe-Hesketh & Skrondal Exercise 3.1, parts 1 and 2; Rabe-Hesketh & Skrondal Exercise 3.2, all parts. Please email me your answers, including Stata commands and output, before 10:00 on Monday.

Day 4: Random-coefficient models

Day 4 adds the possibility of random slopes to the previous models of random intercepts, so that the effects of covariates may differ across clusters.
Stata code from Day 4.
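
A sketch of the random-coefficient model in the same notation as the random-intercept model. The ψ labels for the elements of the covariance matrix follow what we take to be the R&S convention and should be checked against Chapter 4. The last line shows why, once slopes vary, the between-cluster variance depends on the value of the covariate.

```latex
% Random-coefficient model: intercept and slope both vary across clusters j
y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_{1j} + \zeta_{2j} x_{ij} + \epsilon_{ij}

% Cluster-level random effects with covariance matrix \Psi
\begin{pmatrix} \zeta_{1j} \\ \zeta_{2j} \end{pmatrix}
\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \psi_{11} & \psi_{21} \\ \psi_{21} & \psi_{22} \end{pmatrix} \right)

% Slope of x in cluster j
\beta_2 + \zeta_{2j}

% Implied between-cluster variance at a given value of x_{ij}
\operatorname{Var}(\zeta_{1j} + \zeta_{2j} x_{ij})
  = \psi_{11} + 2\psi_{21} x_{ij} + \psi_{22} x_{ij}^2
```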

Required Reading:
Rabe-Hesketh & Skrondal (2008, Ch. 4).
Recommended Reading:
Snijders & Bosker (1999).
Homework:
Review random coefficients models using the code from Day 4.

Day 5: Extensions of the multi-level model

In this session we will wrap up unfinished business with random-coefficient models. The session will also cover mixed models for binary and count data, using xtmelogit and xtmepoisson. Finally, it will introduce, but not cover comprehensively, multilevel models for longitudinal and panel-structured data.
Stata code from Day 5.
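
The binary and count models keep the random-intercept structure but connect it to the outcome through a link function. A sketch, assuming a single covariate x_ij and a normally distributed random intercept ζ_j; these are the kinds of models xtmelogit and xtmepoisson fit, but please check the exact parameterization in the Stata XT manual.

```latex
% Random-intercept logistic model for a binary outcome y_{ij}
\operatorname{logit}\{\Pr(y_{ij} = 1 \mid x_{ij}, \zeta_j)\}
  = \beta_1 + \beta_2 x_{ij} + \zeta_j

% Random-intercept Poisson model for a count y_{ij} with conditional mean \mu_{ij}
\ln \mu_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_j

% In both models \zeta_j \sim N(0, \psi)
```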

Required Reading:
Rabe-Hesketh & Skrondal (2008, Chs. 6, 9).
Recommended Reading:
McMahon & Heath (1992).

Bibliography

Austin, Peter C., Vivek Goel & Carl van Walraven. 2001. “An introduction to multilevel regression models.” Canadian Journal of Public Health 92(2, March-April):150-154.

Franzese, Robert. 2005. “Empirical Strategies for Various Manifestations of Multilevel Data.” Political Analysis 13(4):430-436.

Gelman, Andrew. 2006. “Multilevel modeling: what it can and can’t do.” Technometrics 48:241-251.

Gelman, Andrew & Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge: Cambridge University Press.

Goldstein, Harvey. 2003. Multilevel Statistical Models. 3rd ed. Oxford: Oxford University Press. A PDF of the 2nd edition is available from Professor Goldstein’s web page.

McMahon, Dorren & Anthony Heath. 1992. “Class and Party in Britain: Preliminary Results with a Multilevel Logit Model.” Multilevel Modeling Newsletter 4(3, November):5-8.

Rabe-Hesketh, Sophia & Anders Skrondal. 2008. Multilevel and Longitudinal Modeling Using Stata. 2nd ed. Stata Press.

Snijders, T.A.B. & R.J. Bosker. 1999. Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.

Stata Corp. 2007. [XT] Longitudinal/Panel Data: Release 10. Stata Press.

Steenbergen, Marco R. & Bradford S. Jones. 2002. “Modeling Multilevel Data Structures.” American Journal of Political Science 46(1):218-237.