# Modeling Count Data

This course teaches regression models for count data, that is, models whose response (dependent) variable is a count or rate. It covers Poisson regression, the foundation for modeling counts, along with extensions and modifications to the basic model.

## Overview

This course deals with regression models for count data, i.e., models in which the response (dependent) variable takes the form of a count or rate. A count is the number of times an event occurs; a rate is the number of events occurring within a specific area or time interval. The course covers the nature of various count models, problems of over- and underdispersion, fit and residual tests, and graphics for count models. It also looks at advanced count models and gives an overview of Bayesian count models.

## Learning Outcomes

Students start with the fundamentals of modeling counts and move on to assessment of fit, alternative count models, and more advanced count models. They study a broad range of topics designed to help them understand key model assumptions, select appropriate models, and interpret model outcomes. Students who complete this course will be able to:

• Fit Poisson models to count data
• Interpret coefficients and rates
• Test for and deal with overdispersion
• Fit alternate models for count data – negative binomial and variants
• Model underdispersion

## Who Should Take This Course

Analysts and researchers in a wide variety of fields who are concerned with modeling counts and rates.

## Course Syllabus

### Week 1

Fundamentals of Modeling Counts; Poisson Regression

• What are counts
• Understanding a statistical count model
• Variety of count models
• Estimation – the modeling process
• Poisson model assumptions
• Apparent overdispersion
• The basic Poisson model
• Interpreting coefficients and rate ratios
• Exposure; modeling time, area, and space
• Prediction
• Poisson marginal effects
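
As a rough illustration of the rate-ratio and exposure topics above, the following Python sketch computes an incidence rate ratio (IRR) for two groups observed for different amounts of exposure. The function name `rate_ratio` and all numbers are hypothetical and are not taken from the course text; the course itself works in Stata, R and SAS. For a Poisson model with one binary predictor and a log-exposure offset, the exponentiated coefficient equals this ratio of observed rates.

```python
import math

def rate_ratio(events_a, exposure_a, events_b, exposure_b, z=1.96):
    """Rate ratio of group A versus group B with an approximate 95% Wald interval."""
    rate_a = events_a / exposure_a   # events per unit of exposure in group A
    rate_b = events_b / exposure_b   # events per unit of exposure in group B
    irr = rate_a / rate_b
    # Standard error of log(IRR) under the Poisson assumption
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper

# Hypothetical data: 30 events in 1,000 person-years vs 20 events in 2,000 person-years
irr, lower, upper = rate_ratio(30, 1000, 20, 2000)
print(f"IRR = {irr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Dividing by exposure is what turns a count into a rate: the raw counts here differ by a factor of 1.5, while the rates differ by a factor of 3.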

### Week 2

Overdispersion, Assessment of Fit, and Negative Binomial Regression

• Count model fit statistics
• Overdispersion: what, why, and how
• Testing overdispersion
• Methods for handling overdispersion – adjusting SEs
• Analysis of residuals
• Likelihood ratio tests
• Model selection criteria
• Validation sample
• Varieties of negative binomial models
• Negative binomial model assumptions
• Examples using real data
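
One common check for overdispersion is the Pearson dispersion statistic: the Pearson chi-square divided by the residual degrees of freedom. The Python sketch below applies it to an intercept-only Poisson model, where every fitted value is the sample mean. The data and the function name `pearson_dispersion` are hypothetical, chosen only to illustrate the idea.

```python
from statistics import mean

def pearson_dispersion(counts, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom (n - p)."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
    return chi2 / (len(counts) - n_params)

# Hypothetical counts; for an intercept-only Poisson model
# the fitted mean of every observation is the sample mean.
counts = [0, 1, 0, 2, 9, 0, 1, 12, 0, 5]
mu = mean(counts)
fitted = [mu] * len(counts)

dispersion = pearson_dispersion(counts, fitted, n_params=1)
print(f"mean = {mu:.2f}, Pearson dispersion = {dispersion:.2f}")
```

Under a well-fitting Poisson model this statistic should be close to 1. Values well above 1, as with these made-up counts, suggest overdispersion.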

### Week 3

Alternative Count Models: NB Fit Tests, PIG, Problem with Zeros

• General negative binomial fit tests
• Generalized NB-P regression (NBP)
• Heterogeneous negative binomial (NBH)
• Generalized Poisson – modeling underdispersion (GP)
• Poisson inverse Gaussian (PIG)
• Zero-truncated count models
• Two-part hurdle models
• Zero-inflated count models
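
To make the "problem with zeros" concrete, the Python sketch below compares the probability of a zero under a plain Poisson distribution with that under a zero-inflated Poisson (ZIP) distribution. It assumes the usual ZIP mixture form: with probability `pi_zero` an observation is a structural zero, and otherwise it is drawn from a Poisson with mean `mu`. The parameter values and function names are hypothetical.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability of observing exactly k events with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def zip_pmf(k, mu, pi_zero):
    """Zero-inflated Poisson: structural zero with probability pi_zero, else Poisson(mu)."""
    base = (1 - pi_zero) * poisson_pmf(k, mu)
    return pi_zero + base if k == 0 else base

# Hypothetical parameters
mu, pi_zero = 2.0, 0.3
p0_poisson = poisson_pmf(0, mu)
p0_zip = zip_pmf(0, mu, pi_zero)
print(f"P(Y = 0): Poisson = {p0_poisson:.3f}, ZIP = {p0_zip:.3f}")
```

The mixture adds probability mass at zero while shrinking every positive count by the factor `1 - pi_zero`, which is why a plain Poisson fit tends to underpredict zeros in such data.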

### Week 4

Underdispersed Count Data, Advanced Count Models

• Generalized Poisson – modeling underdispersion
• Exact Poisson regression
• Truncation and censored count models
• Finite mixture models
• Non-parametric and quantile count models
• Overview of longitudinal and clustered count models
• 3-parameter count models
• Overview of Bayesian count models
• Project preparation

#### Organization of Course

This course takes place online at The Institute for 4 weeks. During each course week, you participate at times of your own choosing – there are no set times when you must be online. Course participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor.

At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials, work through exercises, and submit answers. Discussion among participants is encouraged. The instructor will provide answers and comments, and at the end of the week, you will receive individual feedback on your homework answers.

#### Time Requirements

This is a 4-week course requiring 10-15 hours per week of review and study, at times of your choosing.

#### Homework

Homework in this course consists of short-answer questions to test concepts, guided data analysis problems using software, guided data modeling problems using software, and an end-of-course data modeling project. In addition to assigned readings, supplemental readings are available online in the course.

#### Course Text

The required text is Modeling Count Data by Joseph M. Hilbe (2014), Cambridge University Press. The paperback edition includes R, Stata, SAS and Excel/CSV code, which can be downloaded from the author’s website. R data and functions are located in the COUNT package on CRAN. An electronic version of the book is also available from the publisher or on Amazon.

#### Software

The methods covered in this course are handled well by Stata, R and, for the most part, SAS. Data sets used in the text are available in Stata, R, SAS and Excel formats. With respect to code and output:

Stata
Code and output are provided for all examples for which known Stata commands exist.

R
Functions and scripts are available in the COUNT and msme packages.

SAS
Some code and output is provided, e.g., chapter 15 on Bayesian count models.

The instructor and TA are familiar with Stata and R. The instructor is familiar with most SAS procedures related to the modeling of count data, but no instructional support is available for SAS. If you plan on using R and are not already familiar with it, please consider taking one of our courses where R is introduced from the ground up: “R-Programming: Introduction,” “Introduction to R: Data Handling,” or “Introduction to R: Statistical Analysis.” R has a steeper learning curve than most commercial statistical software.

