Statistically Significant – But Not True

In 2015, at an Alzheimer’s conference, Biogen researchers presented dramatic brain scans showing that the antibody aducanumab effectively cleared from the brain the amyloid plaque associated with Alzheimer’s disease. Their study was a randomized, controlled experiment involving 166 patients. Moreover, patients in the treatment group showed slower cognitive decline than those in the placebo group. The study was big news:

“This is the first time an investigational drug for Alzheimer’s disease has demonstrated a statistically significant reduction on amyloid plaque as well as a statistically significant slowing of clinical impairment…”

Thus read the announcement from the senior vice president and chief medical officer at Biogen.

In partnership with Eisai, a Japanese pharmaceutical company, Biogen moved to a Phase III trial with 3200 patients to confirm these results.

Study Fails!

But several weeks ago, in a stunning development, Biogen and Eisai announced that the trial’s Independent Data Monitoring Committee was recommending that the trial be stopped early: interim results showed that aducanumab was ineffective. (Statistics.com has a course on the roles of Data Monitoring Committees, and another on the flexible design of trials that allows early stopping like this.)

The effect on Biogen and Eisai was immediate: Biogen’s shares dropped 29% and Eisai’s 35%.

Researchers in the field were surprised and disheartened. The evidence had seemed clear that amyloid plaque, a protein, was associated with Alzheimer’s, and the biological mechanism was also evident. From the NIH website:

“In the Alzheimer’s brain, abnormal levels of this naturally occurring protein clump together to form plaques that collect between neurons and disrupt cell function.”

And yet, despite the accumulated evidence of an association between Alzheimer’s disease and amyloid plaque, a widely shared understanding of the biological mechanism behind that relationship, the favorable results of a prior study, and the demonstrated effectiveness of aducanumab in clearing away plaque, in a large-scale study aducanumab failed to slow the progress of Alzheimer’s. The study confirmed the drug’s effectiveness in clearing away plaque, but, nonetheless, no improvement in cognitive function ensued.

The research community is now speculating that the harm caused by plaque buildup has already been done by the time cognitive symptoms appear. The sequence of events is a sobering reminder that the building and initial confirmation of a compelling scientific case are not the end of the story. Empirical evidence gets the last word.

How Did the First Study Go Wrong?

The first study, a Phase 1B trial, correctly found that aducanumab cleared away plaque, but also reported favorable cognitive results that the much larger Phase III study did not find. What happened? It is useful to review the critical stages in the journey that a drug makes on its way to market.

After initial scientific discovery work and tests with animals, small Phase I studies are undertaken with humans to establish safety, and test different dosage levels.

After the Phase I results are in, and the investigators understand something about the dose-response relationship, a Phase II trial is undertaken to establish whether the drug is effective, and perhaps to continue research into side effects and dosing. Both Phase I and Phase II studies are conducted with patient numbers in the low hundreds (or even fewer for some Phase I trials).

More expensive, multi-center Phase III studies are undertaken only once the Phase I and II hurdles are passed. Phase III’s purpose is to confirm clinical effectiveness.

Phase I studies are not intended to demonstrate effectiveness, and are not conducted with the same rigor on that point as the later Phase II and Phase III studies. Even at the time of the initial Phase 1B release, criticism was voiced that no claims regarding effectiveness should have been made, and that the study should not have been published in the respected mass-circulation journal Nature. Why is this?

For one thing, the Phase I study was tiny ("underpowered," in statistical terms), less than 1/10 the size of the eventual Phase III study. It was an appropriate size for testing safety and dosing, but not effectiveness. But, you might think, the result was statistically significant. Doesn’t that prove that the favorable cognitive outcome could not have arisen by chance?

It’s no guarantee. Recall that, by the conventional 5% significance standard, 5% of all studies of ineffective compounds will come out with a false positive.
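That 5% false-positive rate is easy to see in a quick simulation. The sketch below uses hypothetical numbers (two groups of 80 patients, not the actual trial data): it runs many two-arm "studies" of a drug with no true effect and counts how often the group difference clears the conventional 1.96 significance threshold.

```python
# Simulate repeated studies of an ineffective drug: both arms are drawn
# from the same distribution, so any "significant" result is a fluke.
import random
import statistics

random.seed(1)

def fake_study(n_per_group=80):
    """One hypothetical trial of an ineffective drug, analyzed with a
    simple two-sample z-style test at the 5% significance level."""
    treat = [random.gauss(0, 1) for _ in range(n_per_group)]
    placebo = [random.gauss(0, 1) for _ in range(n_per_group)]
    diff = statistics.mean(treat) - statistics.mean(placebo)
    se = statistics.pstdev(treat + placebo) * (2 / n_per_group) ** 0.5
    return abs(diff / se) > 1.96  # "statistically significant"?

trials = 10_000
false_positives = sum(fake_study() for _ in range(trials))
print(false_positives / trials)  # close to 0.05
```

Roughly 1 study in 20 of a useless compound comes out "significant," which is exactly what the 5% standard means.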

But not only was the Phase I study underpowered for establishing efficacy, it was also divided into subgroups by dose. Two different measures of cognition were used. And patients were recruited at different times for the different dosage and placebo groups. All this widened the scope for extraneous variation from one subgroup to another, and multiplied the number of potential comparison points while shrinking the sample size for each, substantially increasing the scope for chance to play a role.
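The effect of multiplying comparison points can be quantified with a back-of-the-envelope calculation. Assuming, for illustration, k independent tests each run at the 5% level (real subgroup comparisons are correlated, so this is only a rough guide), the chance of at least one false positive grows quickly:

```python
# With k independent tests at significance level alpha, the probability
# that at least one comes out "significant" by chance is 1 - (1 - alpha)^k.
alpha = 0.05
for k in (1, 5, 10, 20):  # e.g. dose subgroups x cognitive measures
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:2d} comparisons -> P(at least one false positive) = {p_any:.2f}")
# prints roughly 0.05, 0.23, 0.40, 0.64
```

With ten comparison points, the chance of a spurious "significant" finding somewhere is about 40%, a far cry from the nominal 5%.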

Journalists and the public, however, are ill-suited to deal with these statistical subtleties, which are partly technical but also partly philosophical, and which merit thoughtful consideration. Widespread publication of a study claiming "statistically significant" cognitive effects from an Alzheimer’s drug was bound to raise hopes and set expectations at an unwarranted level.