


Missing Data Analysis

Dr. Geert Molenberghs

Aim of Course:

Conventional methods for handling missing data, such as complete case analysis, single imputation, and last observation carried forward, waste data, sacrifice power, and can yield biased estimates and unreliable inferences. Much better results can be obtained with newer, yet well-established, methods: direct maximum likelihood, direct Bayesian analysis, inverse probability weighting, and multiple imputation. These methods have become practical in recent years with the introduction of widely available, user-friendly software. They are broadly valid under the so-called 'missing at random' (MAR) assumption, and they apply to continuous, binary, categorical, and count data, among others. They are also applicable across fields, including the biomedical sciences, economics, psychology, the social and behavioral sciences, agriculture, and biology. The course will address the problems that arise with the conventional methods and provide a basis for the more promising methods, focusing on maximum likelihood, inverse probability weighting, and multiple imputation. A formal basis will be provided without being overly mathematical. Case studies and software implementation will also be discussed. Finally, the issues that arise when the MAR assumption is not met are sketched, together with the need for sensitivity analysis.
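The contrast between complete case analysis and inverse probability weighting under MAR can be illustrated with a small simulation. The sketch below is not course material; the data-generating model (x standard normal, y = 1 + 2x + noise, so the true mean of y is 1) and the logistic observation probability expit(0.5 + 1.5x) are hypothetical choices. Because y is more likely to be recorded when x is large, the complete cases over-represent large y, while weighting each observed y by the inverse of its observation probability restores the balance.

```python
import math
import random


def expit(z: float) -> float:
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n: int = 100_000, seed: int = 2010):
    """Simulate (x, y, p_obs) rows where y is missing at random given x.

    Hypothetical model: x ~ N(0, 1), y = 1 + 2x + e with e ~ N(0, 1),
    so the true mean of y is 1. y is observed with probability
    expit(0.5 + 1.5x), which depends only on the always-observed x (MAR).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        p_obs = expit(0.5 + 1.5 * x)
        observed = rng.random() < p_obs
        data.append((x, y if observed else None, p_obs))
    return data


def complete_case_mean(data):
    """Average y over the rows where it was observed, discarding the rest."""
    ys = [y for _, y, _ in data if y is not None]
    return sum(ys) / len(ys)


def ipw_mean(data):
    """Inverse probability weighted (Hajek) mean: each observed y gets weight 1/p_obs.

    The true observation probabilities are used here for simplicity; in
    practice they would be estimated, e.g. by a logistic regression of the
    response indicator on x.
    """
    num = sum(y / p for _, y, p in data if y is not None)
    den = sum(1.0 / p for _, y, p in data if y is not None)
    return num / den


data = simulate()
print(f"complete-case mean: {complete_case_mean(data):.3f}")
print(f"IPW mean:           {ipw_mean(data):.3f}")
print("true mean:          1.000")
```

With this setup the complete-case mean lands well above 1, while the weighted mean stays close to the true value.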

Who Should Take This Course:

Any statistical analyst who works with multivariate, longitudinal, or otherwise hierarchical data is likely to encounter missing observations and will benefit from this course.

For those enrolled in a Program of Advanced Statistical Studies, this is a required or elective course in the following Programs:

  • Biostatistics (epidemiology) - elective
  • Biostatistics (controlled trials) - elective
  • Data Mining - elective
  • Statistics for Social Sciences - elective
  • Statistics for Environmental Science - elective

Course Program:

The course is structured as follows:

SESSION 1: Setting the Scene
  • Review of models for continuous hierarchical data
  • Missing-data patterns (monotone, non-monotone)
  • Modeling frameworks (selection models, pattern-mixture models, shared-parameter models)
  • Missing-data mechanisms (missing completely at random, missing at random, missing not at random)
  • The failure of simple methods
SESSION 2: Direct Likelihood Methods
  • Inferential paradigms (likelihood, Bayesian, frequentist)
  • Ignorability
  • The principle of direct likelihood
  • Case studies
  • Software implementation
SESSION 3: Multiple Imputation
  • Rationale for multiple imputation
  • Principles underlying multiple imputation
  • Proper imputation
  • Case studies
  • Software implementation
SESSION 4: Inverse Probability Weighting
  • Review of models for non-continuous hierarchical data
  • Rationale for inverse probability weighting
  • Weighted generalized estimating equations
  • Case studies
  • Software implementation
  • Comments on methods for missing not at random
  • Comments on sensitivity analysis
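Multiple imputation analyzes each of m completed data sets separately and then combines the results. A common way to combine them is Rubin's rules: the pooled estimate is the average of the m estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The sketch below is a minimal illustration of those rules; the function name `pool_rubin` and the example numbers are hypothetical, and the course or its software may present the combination step differently (for example, with degrees-of-freedom corrections).

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates: point estimates Q_1..Q_m from the m imputed data sets
    variances: their squared standard errors U_1..U_m
    Returns (pooled estimate, total variance, within variance, between variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = u_bar + (1.0 + 1.0 / m) * b  # total variance of the pooled estimate
    return q_bar, t, u_bar, b


# Hypothetical results from m = 5 imputed data sets.
q, t, u, b = pool_rubin([1.2, 1.5, 1.3, 1.4, 1.6], [0.04] * 5)
print(f"pooled estimate {q:.3f}, pooled SE {math.sqrt(t):.3f}")
```

The between-imputation term is what single imputation ignores: treating one filled-in data set as if it were complete understates the uncertainty due to the missing values.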

The Instructor:

Dr. Geert Molenberghs is Professor of Biostatistics at Universiteit Hasselt and Katholieke Universiteit Leuven in Belgium. He received a B.S. in mathematics (1988) and a Ph.D. in biostatistics (1993) from Universiteit Antwerpen. He has published on surrogate markers in clinical trials and on categorical, longitudinal, and incomplete data. He was Joint Editor of Applied Statistics (2001-2004) and Co-Editor of Biometrics (2007-2009). He was President of the International Biometric Society (2004-2005) and has received the Guy Medal in Bronze from the Royal Statistical Society and the Myrto Lefkopoulou award from the Harvard School of Public Health. He is the founding director of the Center for Statistics and the director of the Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BioStat). Jointly with Geert Verbeke, Mike Kenward, Tomasz Burzykowski, Marc Buyse, and Marc Aerts, he has authored books on longitudinal and incomplete data and on surrogate marker evaluation.

Organization of the Course:

The course takes place over the internet, at statistics.com. During each course week, you participate at times of your own choosing - there are no set times when you must be online. Course participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor. The course is scheduled to take place over 4 weeks, and typically requires 15 hours per week. At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials and work through exercises. Discussion among participants is encouraged. The instructor will provide answers and comments.

Certificates and Grades:

You may be interested only in learning the material presented, and not be concerned with grades or certificates. Or you may be enrolled in a statistics.com Program in Advanced Statistical Studies that requires demonstration of proficiency in the subject, in which case your work will be assessed for purposes of issuing a grade. Or you may require only a "Certificate of Course Completion," along with professional development credit in the form of Continuing Education Units (CEUs). As you begin the class, you will be asked to specify your category.

Credit:

This course offers continuing education units (CEUs). For those successfully completing the course (generally this means marks of 50% or better on the homework), 5.0 CEUs and a certificate will be issued by statistics.com upon request.

Dates:

Nov. 19 - Dec. 17, 2010

Participants gain access to the online materials on the first day of the course, and typically spend about 15 hours per week (at their convenience). You retain full access to course materials, including discussion board, for two weeks after the course closing date.

Level:

intermediate/advanced

Prerequisite:

To take this course, you should have a good working knowledge of the principles and practice of multiple regression, as well as elementary statistical inference. But you do not need to know matrix algebra, calculus, or likelihood theory.

Course Text:

The required text for this course is Missing Data in Clinical Studies by Geert Molenberghs and Michael Kenward (2007), published by John Wiley & Sons. It can be ordered directly from the publisher. Wiley typically offers statistics.com customers up to a 15% discount on this book (and all other statistics titles): enter the code aff15 in the Promotion Code field when prompted during checkout and click the Apply Discount button. (If you are located in Asia, the web procedure for your location may not accept this discount; try calling your regional Wiley representative.)

Software:

Hands-on computer assignments are part of the course. SAS, Stata, and R are suitable programs for these assignments. The instructor is familiar with SAS and can offer advice; more limited help is available from the TAs for SPSS and R.

Registration:

Register Online - $499
Register Online (academic) - $399 (you must be affiliated with a college, university or high school)

Add a $50 service fee if you require a prior invoice, or if you need to submit a purchase order or voucher, pay by wire transfer or EFT, or refund and reprocess a prior payment. Please use the printed registration form for these and other special orders.

Note: Courses may fill up at any time and registrations are processed in the order in which they are received. Your registration will be confirmed for the first available course date, unless you specify otherwise.