There are many moving parts in modeling the spread of an epidemic, a subject that has lately attracted the attention of great numbers of statistically-oriented non-epidemiologists (like me). I’ve put together a “lay statistician’s guide” to some of the important parameters and factors (and I welcome corrections/additions!).
Case fatality rate or CFR: Deaths as a percentage of confirmed cases.
Crude fatality rate: Deaths as a percentage of the population. (I have also seen the term “crude CFR” used to mean the case fatality rate before any adjustment.)
Ascertainment bias: Upward bias in estimating the death rate because the denominator, the number of cases, does not include all actual disease cases, but only those confirmed by testing.
Right-censoring bias: Downward bias in estimating the death rate during the course of an epidemic, because not all deaths (the numerator) among the disease cases (the denominator) have occurred at the time of estimation. This is an issue particularly during the early growth phases of an epidemic.
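To make the direction of these two biases concrete, here is a minimal Python sketch. All the numbers are made up for illustration, and the two adjustments (dividing by an assumed ascertainment fraction, and restricting the denominator to resolved cases) are crude simplifications of my own, not methods taken from any particular study.

```python
def case_fatality_rate(deaths, cases):
    """Deaths as a percentage of cases."""
    return 100.0 * deaths / cases


def crude_fatality_rate(deaths, population):
    """Deaths as a percentage of the whole population."""
    return 100.0 * deaths / population


# Hypothetical figures, chosen only to illustrate the arithmetic.
deaths = 50
confirmed_cases = 1000
population = 1_000_000

naive_cfr = case_fatality_rate(deaths, confirmed_cases)
crude_rate = crude_fatality_rate(deaths, population)

# Ascertainment bias: assume (hypothetically) that testing confirms only
# 1 in 4 actual cases. The true denominator is larger, so the rate drops.
assumed_ascertainment_fraction = 0.25
estimated_actual_cases = confirmed_cases / assumed_ascertainment_fraction
cfr_all_cases = case_fatality_rate(deaths, estimated_actual_cases)

# Right-censoring bias: assume only 600 confirmed cases have resolved
# (died or recovered) so far. Counting only resolved cases raises the rate.
assumed_resolved_cases = 600
cfr_resolved_only = case_fatality_rate(deaths, assumed_resolved_cases)

print(f"Naive CFR:                {naive_cfr:.2f}%")
print(f"Crude fatality rate:      {crude_rate:.4f}%")
print(f"CFR, all estimated cases: {cfr_all_cases:.2f}%")
print(f"CFR, resolved cases only: {cfr_resolved_only:.2f}%")
```

The two adjustments pull in opposite directions, which is part of why early fatality estimates are so uncertain.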
Viral load: The extent to which the virus has infected a person (i.e. the concentration of virus particles). Tests differ in their sensitivity to viral load; some need a higher viral load to register a positive result.
Factors to be taken into account in simulation models include:
Asymptomatic ratio: The proportion of actual cases that exhibit no symptoms. On the cruise ship Diamond Princess, roughly half of the coronavirus cases were asymptomatic. The disease is contagious even when no symptoms are present, so the higher this ratio, the more readily the disease spreads, since people without symptoms are less likely to self-isolate.
Clustering: The disease does not spread evenly, but rather tends to cluster in spots where cases expand rapidly in a local area (e.g. the Wuhan seafood market, Washington State nursing home, Shincheonji church in South Korea, or the town of Bergamo in Italy). The rapid concentrated spread can overwhelm nearby medical facilities, leading to a higher fatality rate than would be the case in a slower or less concentrated spread.
Reproduction number, or R: The average number of new cases generated by each existing case. The estimated R for coronavirus is between 2 and 2.5. By comparison, the R for seasonal influenza is about 1.25, and for pandemic influenza it is about 1.75 (see this study).
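Small differences in R compound quickly. The Python sketch below assumes the simplest possible picture: every case in one "generation" produces R cases in the next, with no immunity, no interventions and no shortage of susceptible people. The function name is mine, and this is an illustration of the arithmetic rather than a forecasting model.

```python
def cases_in_generation(r, generation):
    """New cases in a given generation, starting from one case in
    generation 0 and assuming unchecked geometric growth."""
    return r ** generation


# R values quoted above; 2.5 is the upper end of the coronavirus range.
r_values = {
    "seasonal influenza": 1.25,
    "pandemic influenza": 1.75,
    "coronavirus (upper estimate)": 2.5,
}

generations = 10
for label, r in r_values.items():
    new_cases = cases_in_generation(r, generations)
    print(f"{label}: about {new_cases:,.0f} new cases in generation {generations}")
```

In reality growth slows as people recover, isolate or become immune, so these figures only show why an R of 2.5 is so much more worrying than an R of 1.25.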
Testing: Testing availability can differ from country to country, and from one region to another, and this can significantly affect confirmed infection rates. There are different types of tests, with different timelines. Most tests currently in use require samples to be sent back to a lab. The “45 minute test” approved by the US Food and Drug Administration (FDA) last week is projected to be available later this month; it requires a special on-site machine (which could cost over $30,000) to read samples. This preprint claims that the processes for test development and testing itself can both be dramatically sped up by skipping the time-consuming step of RNA extraction.
Population age distribution: When information from one area is used to project outcomes in another, differences in the age distribution of the two populations can require that models adjust infection and death rates. For example, one study took the age information from the Diamond Princess cruise ship (CFR 1% overall) and applied it to China (CFR over 3.5% overall) to arrive at an age-adjusted CFR of 0.5% for the general Chinese population, a 7-fold reduction.
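The general idea behind an age adjustment can be sketched as a weighted average: take age-specific fatality rates and weight them by the age shares of the target population. The Python example below uses invented age bands, rates and shares purely for illustration; they are not the figures from the study mentioned above, and the study's actual method may differ.

```python
def age_adjusted_cfr(cfr_by_age, age_shares):
    """Weighted average of age-specific CFRs (in percent), using the
    target population's age shares as weights."""
    total_share = sum(age_shares.values())
    weighted = sum(cfr_by_age[group] * share for group, share in age_shares.items())
    return weighted / total_share


# Hypothetical age-specific CFRs, in percent.
cfr_by_age = {"0-39": 0.1, "40-69": 1.0, "70+": 8.0}

# Hypothetical age distributions: an older population and a younger one.
older_population = {"0-39": 0.2, "40-69": 0.4, "70+": 0.4}
younger_population = {"0-39": 0.6, "40-69": 0.3, "70+": 0.1}

cfr_older = age_adjusted_cfr(cfr_by_age, older_population)
cfr_younger = age_adjusted_cfr(cfr_by_age, younger_population)

print(f"Older population:   {cfr_older:.2f}%")
print(f"Younger population: {cfr_younger:.2f}%")
```

The same age-specific rates give very different overall CFRs depending only on how old the population is.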
This web-based simulator defines a number of states/outcomes that an individual might pass through:
- Susceptible (not immune)
- Removed (quarantined, immune)
It allows the user to modify parameters like those reviewed above, and see the effect on outcomes like infection and death rates over time via dynamic visualizations.
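For readers who want to experiment offline, here is a minimal Python sketch of a compartmental model in the same spirit. It is not the simulator's code. It is a textbook discrete-time SIR model, and the intermediate "infected" state, the parameter names and the default values are all my own assumptions.

```python
def simulate_sir(population, initial_infected, r0, infectious_days, days):
    """Daily-step SIR model. Returns a list of (susceptible, infected,
    removed) tuples, one per day, starting with day 0."""
    removal_rate = 1.0 / infectious_days       # share of infected removed per day
    transmission_rate = r0 * removal_rate      # chosen so that R equals r0 at the start
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = transmission_rate * s * i / population
        new_removals = removal_rate * i
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history


# Hypothetical scenario: one million people, ten initial cases, one year.
for r0 in (1.25, 2.25):
    run = simulate_sir(population=1_000_000, initial_infected=10,
                       r0=r0, infectious_days=5, days=365)
    peak_infected = max(infected for _, infected, _ in run)
    ever_infected = run[-1][1] + run[-1][2]
    print(f"R = {r0}: peak infected about {peak_infected:,.0f}, "
          f"ever infected about {ever_infected:,.0f}")
```

Changing `r0` or `infectious_days` shows how sensitive the height and timing of the peak are to these parameters, which is the kind of exploration the web simulator offers with a friendlier interface.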