Prospective vs. Retrospective

A prospective study identifies a scientific (usually medical) problem to be studied, specifies a study design protocol (what you’re measuring, who you’re measuring, how many subjects, and so on), and then gathers data going forward in accordance with that design. The definition of the problem under study does not change once data collection starts.

A retrospective study is one in which you look backwards at data that have already been collected or generated in order to answer a scientific (usually medical) question.

Prospective studies are generally regarded as cleaner and more reliable, due to several shortcomings of retrospective studies:

Since the data are already available, the question to be answered can be influenced by the data.
Similarly, the exact data (or subset of data) used to answer the question can be chosen in a way that produces the desired answer, or a more noteworthy or interesting one.
There is a temptation to modify the question being studied as the data are examined.
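The second shortcoming can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names, the 20 subsets, the group sizes, and the 30% event rate are all invented for illustration and do not come from any real study. Two groups with identical true event rates are compared within each subset, so every "significant" difference is a false positive.

```python
import math
import random


def two_proportion_z(events_a, events_b, n_per_group):
    """Absolute z statistic for a difference between two equal-sized groups."""
    pooled = (events_a + events_b) / (2 * n_per_group)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    if se == 0:
        return 0.0
    return abs(events_a - events_b) / n_per_group / se


def false_positive_rates(n_studies=500, n_subsets=20, n_per_group=100,
                         event_rate=0.3, seed=1):
    """Return (rate for a pre-specified subset, rate for the best-looking subset).

    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    prespecified = 0
    cherry_picked = 0
    for _ in range(n_studies):
        z_values = []
        for _ in range(n_subsets):
            # Both groups share the same true event rate: there is no real effect.
            a = sum(rng.random() < event_rate for _ in range(n_per_group))
            b = sum(rng.random() < event_rate for _ in range(n_per_group))
            z_values.append(two_proportion_z(a, b, n_per_group))
        prespecified += z_values[0] > 1.96     # subset fixed before seeing the data
        cherry_picked += max(z_values) > 1.96  # subset picked after seeing the data
    return prespecified / n_studies, cherry_picked / n_studies


if __name__ == "__main__":
    fixed, picked = false_positive_rates()
    print(f"pre-specified subset flagged: {fixed:.1%}")
    print(f"best-looking subset flagged:  {picked:.1%}")
```

With a subset fixed in advance, roughly one study in twenty should show a spurious "effect" at the usual 5% level. When the most striking of 20 subsets is chosen after the fact, most simulated studies should.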

Some prospective studies have retrospective aspects. Cohort and panel studies, which follow a group over time, often also collect information about the past. For example, participants may be asked about their childhood medical history, such as episodes of severe sunburn. Even in such cases, however, the study still tracks events into the future.

A good example is a study following cigarette smokers over time to see what medical conditions they develop during the study. This information would typically be compared with similar information from non-smokers. Although prior information from the participants might be used, the primary focus is what happens to each group during the period of the study. Studies of this kind, sometimes called “prospective cohort studies,” often fall short of the “gold standard” of a randomized trial because assignment to the treatment or control group is usually not random: one could not randomly assign a person to be a lifetime cigarette smoker or non-smoker. However, the outcome of interest (e.g. lung cancer) develops or becomes apparent during the course of the study. Nobody is selected on that basis, so the opportunity for selection bias is limited.
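As a rough sketch of how such a cohort comparison might be summarized, the snippet below computes an incidence rate for each group and their ratio. The counts and person-years are hypothetical, chosen only to make the arithmetic easy to follow, and the function names are illustrative.

```python
def incidence_rate(events, person_years, per=1000):
    """New cases per `per` person-years of follow-up."""
    return events * per / person_years


def rate_ratio(events_exposed, py_exposed, events_unexposed, py_unexposed):
    """Incidence rate in the exposed group divided by the rate in the unexposed group."""
    return (incidence_rate(events_exposed, py_exposed)
            / incidence_rate(events_unexposed, py_unexposed))


if __name__ == "__main__":
    # Hypothetical follow-up data, not taken from any real study.
    smokers = {"events": 30, "person_years": 10_000}
    non_smokers = {"events": 6, "person_years": 20_000}

    print("smokers, per 1,000 person-years:",
          incidence_rate(smokers["events"], smokers["person_years"]))
    print("non-smokers, per 1,000 person-years:",
          incidence_rate(non_smokers["events"], non_smokers["person_years"]))
    print("rate ratio:",
          rate_ratio(smokers["events"], smokers["person_years"],
                     non_smokers["events"], non_smokers["person_years"]))
```

Because both groups are followed forward from a point at which no one has the outcome, the rates themselves are meaningful. A large ratio still does not prove causation, though, because the groups were not randomly assigned.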

Another type of study is fully retrospective: the groups are selected on the basis of the outcome of interest. For example, one might compare a group of lung cancer patients with a group of patients without lung cancer to examine the prevalence of smoking in each group. The conclusions from such a study may not be as solid as those from a prospective study. You can demonstrate correlation, but factors outside the scope of the study have a greater opportunity to creep in and bias the results (selection bias, for example) than in a prospective study, where you identify the groups beforehand and then observe what happens to them. In addition to bias, there can be problems with data quality: some records may be missing, and subjects’ recall may be faulty. You also lack the opportunity a prospective study gives you to collect data tailored to the needs of the study. As a result, it may be more difficult to make the leap from correlation to causation with a retrospective study than with a prospective one.
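A comparison of this kind is commonly summarized with an odds ratio. The sketch below uses hypothetical counts and illustrative function names, and the interval is the standard log-based (Woolf) approximation. It also shows why the share of smokers who are cases is not a risk estimate in this design: that share depends on how many controls the researcher chose to sample, while the odds ratio does not.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def woolf_interval(exposed_cases, unexposed_cases, exposed_controls,
                   unexposed_controls, z=1.96):
    """Approximate 95% confidence interval computed on the log-odds scale."""
    log_or = math.log(odds_ratio(exposed_cases, unexposed_cases,
                                 exposed_controls, unexposed_controls))
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical counts: 100 cases and 100 controls.
    print("odds ratio:", odds_ratio(90, 10, 60, 40))
    print("approx. 95% interval:", woolf_interval(90, 10, 60, 40))

    # Sampling twice as many controls with the same smoking prevalence changes
    # the share of smokers who are cases, but leaves the odds ratio unchanged.
    print("share of smokers who are cases, 100 controls:", 90 / (90 + 60))
    print("share of smokers who are cases, 200 controls:", 90 / (90 + 120))
    print("odds ratio with 200 controls:", odds_ratio(90, 10, 120, 80))
```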

The long process of proving a relationship between cigarette smoking and lung cancer is a good case study. Rates of smoking increased dramatically in the U.S. in the 1940s and 1950s, as cigarettes became more widely available and advertising glamorized their use. The incidence of lung cancer (which had been relatively low prior to the wars) also increased. However, strong evidence of a link between the two was lacking.

This began to change in the 1950s. Five large retrospective studies published early in the decade showed a link between cigarette smoking and lung cancer. Though important, these studies still didn’t make a convincing enough case, because they relied on the self-reported smoking habits of people who already had lung cancer and compared them with those of people who didn’t. One potential problem with this type of study is recall bias: people with lung cancer are more likely to overestimate how much they smoked, while those who don’t have lung cancer are more likely to underestimate it.
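The effect of this kind of misreporting can be sketched with a simulation. Everything here is an assumption made for illustration: the function names, the 40% true rate of heavy smoking, and the 15% misreporting probabilities are invented. Cases and controls are given the same true rate of heavy smoking, so any association in the reported data comes from recall alone.

```python
import random


def reported_heavy_count(rng, n, true_rate, p_overstate, p_understate):
    """Count subjects who report heavy smoking, given imperfect recall."""
    count = 0
    for _ in range(n):
        heavy = rng.random() < true_rate
        if heavy:
            if rng.random() < p_understate:
                heavy = False  # a heavy smoker who reports smoking less
        elif rng.random() < p_overstate:
            heavy = True       # a lighter smoker who reports smoking heavily
        count += heavy
    return count


def reported_odds_ratio(n_cases, n_controls, true_rate,
                        case_overstate, control_understate, seed=0):
    """Odds ratio computed from self-reports when the true odds ratio is 1."""
    rng = random.Random(seed)
    a = reported_heavy_count(rng, n_cases, true_rate, case_overstate, 0.0)
    c = reported_heavy_count(rng, n_controls, true_rate, 0.0, control_understate)
    b = n_cases - a
    d = n_controls - c
    return (a * d) / (b * c)


if __name__ == "__main__":
    print("accurate recall:", reported_odds_ratio(5000, 5000, 0.4, 0.0, 0.0))
    print("biased recall:  ", reported_odds_ratio(5000, 5000, 0.4, 0.15, 0.15))
```

With accurate recall the odds ratio should land close to 1. With the assumed misreporting it rises well above 1, even though smoking and disease are unrelated in the simulated data.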

To address this issue, a prospective (cohort) study was needed – recruiting healthy people and following them over time to see who develops or dies from lung cancer and who does not. Without such evidence, the tobacco industry was able to cast doubt on the link between smoking and death from lung cancer and other diseases, says Eric Jacobs, Ph.D., an epidemiologist at the American Cancer Society.

Solid evidence of the link between cancer and smoking was not established until Cuyler Hammond and Daniel Horn organized a prospective study with 22,000 volunteers, which began in 1952. Ironically, a photo of the two researchers presenting their results at the 1954 meeting of the American Medical Association shows both of them smoking pipes!

You can learn more about different study designs in these Statistics.com courses:

Designing Valid Statistical Studies: Learn about different study designs (randomized trial, observational study, case-control), and sources of bias
Epidemiologic Statistics: Learn about disease metrics (risk, morbidity, prevalence, …) and statistical approaches to measurement
Introduction to Statistical Issues in Clinical Trials: Learn about randomized clinical trials (RCTs), principles of design, including power, and statistical approaches for different study metrics (also called endpoints)
Introduction to Statistics: Learn the basic concepts of inference (p-values, confidence intervals)

Read more about the smoking studies in Elizabeth Mendes’ report at the American Cancer Society (https://www.cancer.org/latest-news/the-study-that-helped-spur-the-us-stop-smoking-movement.html).