Statistics at War

World War 2 gave the statistics profession its big growth spurt. Statistical methods such as correlation, regression, ANOVA, and significance testing had all been worked out earlier, but it was the war that brought large numbers of people into the field as a profession. They didn't necessarily start as statisticians. George Box,

“All models are wrong, some are useful”

was working as a chemist in a sewer plant when war broke out.

Mustard Gas and ANOVA

After joining the army, Sergeant Box was assigned to the unit testing the effects of mustard gas. The investigators dropped small quantities of mustard gas on six locations on volunteers' arms, then used various ointments to treat the spots. Box suggested that they needed a statistician to distinguish treatment variation from between-subject variation. He had learned about this from a book he'd read: R.A. Fisher's Design of Experiments. Sergeant Box was told that, since he had read the book, he could do the statistical work himself.

Fisher himself was not called upon for war work; it was thought that he had fascist leanings, according to this article in the Economist. (You may recall from this blog that Fisher was an enthusiastic promoter of eugenics and of the need for the upper classes to out-breed the lower classes.)

And the best skin treatment for mustard gas exposure?  Nothing – leave it alone.

Industrial production and acceptance sampling

World War 2 saw a rapid expansion in industrial production, involving a massive infusion of labor that was, if not unskilled, at least unfamiliar with production processes. Production rose, and so did defects. The standard quality control system before the war was "inspect everything or nothing." It was never optimal, and the war proved it infeasible: the resources to inspect everything were simply not available, and in some cases testing an item meant destroying it. Clearly, an inspection protocol based on sampling was needed.

Harold Dodge elaborated and codified industrial quality control procedures based on sampling. Rather than inspect entire production lots, a much smaller sample would be selected from each lot, and an accept/reject decision for the entire lot made on the basis of that sample. Acceptance sampling was an important component of the early quality control movement, but not the only one. Acceptance sampling improved decision-making with respect to the disposition of individual lots, but it was left to other statistical procedures — such as Statistical Process Control (SPC) and Design of Experiments (DOE) — to deal with improving the production process itself.
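The logic of a single-sampling plan can be sketched in a few lines. The plan parameters below (sample size n = 50, acceptance number c = 2) are illustrative assumptions for the sketch, not values from Dodge's actual tables:

```python
from math import comb

def accept_probability(defect_rate, n, c):
    """Probability a lot is accepted under a single-sampling plan:
    draw n items, accept the lot if at most c are defective.
    Uses the binomial model (sampling with replacement approximation)."""
    return sum(
        comb(n, k) * defect_rate**k * (1 - defect_rate)**(n - k)
        for k in range(c + 1)
    )

# Operating characteristic of the plan: a good lot (1% defective) is
# almost always accepted; a bad lot (10% defective) is usually rejected.
print(round(accept_probability(0.01, 50, 2), 3))  # ~0.986
print(round(accept_probability(0.10, 50, 2), 3))  # ~0.112
```

Plotting `accept_probability` against the defect rate gives the plan's operating characteristic (OC) curve, the standard tool for judging how well a sampling plan separates good lots from bad ones.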

Two Short Probability Problems – Convoys and Bombers

Two consequential applications of statistics in wartime were actually solutions to fairly simple (or at least quick) probability problems.

Convoys began in World War 1 as a way to concentrate scarce military escort vessels in sufficient strength to protect cargo ships. Interestingly, though, the rationale for convoys rests even more heavily on simple probability calculations. A convoy is more likely to be detected than a single ship, and a 50-ship convoy, if attacked, has a larger expected loss than a single ship. Both factors, however, are outweighed by the greater total losses that 50 ships face sailing independently, compared to sailing together in one convoy. See this blog for the interesting probability calculations.
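The arithmetic of the argument can be sketched with toy numbers. The detection probabilities and loss figures below are illustrative assumptions, not historical estimates:

```python
# Toy numbers to illustrate the convoy argument: a convoy is easier to
# detect and loses more ships per attack, but presents only ONE
# detection opportunity instead of fifty.

def expected_losses(p_detect, ships_lost_if_detected):
    """Expected ships lost per voyage for one sailing unit."""
    return p_detect * ships_lost_if_detected

# 50 ships sailing independently: 50 separate chances of detection.
independent = 50 * expected_losses(0.10, 1)

# The same 50 ships in one convoy: detection is somewhat more likely,
# and a detected convoy loses several ships at once.
convoy = expected_losses(0.15, 4)

print(independent, convoy)  # the convoy's expected loss is far smaller
```

Under these assumptions the independent sailings lose an expected 5 ships per crossing versus 0.6 for the convoy; the qualitative conclusion survives wide variation in the assumed numbers.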


Bomber pilots had the most hazardous duty in World War 2: the UK command in charge of night missions saw fewer than 50% of its pilots return by war's end. Bombers, relatively slow and unmaneuverable because of their heavy payloads, had a hard time escaping attacks by fighter aircraft. U.S. authorities, who had focused on reinforcing bomber areas that showed heavy damage (thus making the aircraft even heavier), brought in an analytics team for advice. Abraham Wald, who went on to considerable fame in statistics and operations research, convinced the Air Force to focus instead on reinforcing the areas that consistently showed no damage. He reasoned that damage in those areas must be fatal: the only bombers returning were those undamaged there.
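Wald's survivorship-bias reasoning is easy to reproduce in a toy simulation. The four aircraft areas and the rule that only engine hits are fatal are invented for illustration:

```python
import random

random.seed(0)

# Toy survivorship-bias simulation: hits land uniformly on four areas,
# but only hits to the engine are fatal. Damage observed on RETURNING
# aircraft therefore concentrates everywhere except the one area it is
# most important to armor.
AREAS = ["fuselage", "wings", "tail", "engine"]
observed = {area: 0 for area in AREAS}

for _ in range(10_000):
    hit = random.choice(AREAS)
    if hit != "engine":          # engine hits never make it home
        observed[hit] += 1

print(observed)                  # engine count is zero among returners
```

Reinforcing the areas with the most observed damage would armor exactly the wrong places; the zero in the "engine" column is the signature of the missing, downed aircraft.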

Bits and Pixels

In the current day, the main contribution of statisticians and data scientists is on the intelligence and electronic warfare fronts. This involves not just data gathering and analysis, but also offensive operations: the New York Times reported in 2012 that the Obama administration was expanding the successful Stuxnet cyber attacks against the Iranian nuclear program. The scale of data-dependent surveillance, intelligence, analysis, and cyber-warfare is massive: The U.S. National Security Agency (NSA) is one of the biggest employers (if not the biggest) of mathematicians and data scientists.        

The origins of the NSA date from World War 1, when the U.S. Army set up the Cipher Bureau to intercept and decode enemy diplomatic communications. A fair amount is known about these early days of code-breaking: when funding for the Bureau was withdrawn in the 1920s, its ex-director, irate over the loss of his agency, wrote a tell-all book, The American Black Chamber. You can read about it here.


The NSA received its name in 1952 and has grown by leaps and bounds.  Satellites greatly expanded the scale of communications as well as the ability to intercept them.  The advent of computers boosted processing speed and volume and facilitated brute-force code-breaking. The coming of personal computers and the internet triggered a huge expansion of the scale of electronic communications, and the mobile phone reinforced this growth and broadened its character with the widespread transmission of images.  The NSA, of course, had already developed image-processing capability to deal with the photos generated by reconnaissance.  Social media further widened the scope and scale of what the NSA had to deal with.

When you realize that any one of these sources – Twitter, Facebook, AT&T, Verizon, satellite imagery, Google Earth, etc. – faces its own Big Data challenge, and that the NSA's possible writ includes them all, you grasp the massive scale of the data problem facing the agency.

Deep Learning at the NSA

There are at least three important dimensions to the data problem facing the NSA:

  • Scale: the potential size of the data that the NSA uses dwarfs that of any other single company or organization
  • Format and structure:  the data might be in the form of text, numeric data, images, video, or audio
  • Context:  most of the data lacks labels

Deep learning (DL) offers a way to approach all of these, particularly the latter two.  Deep neural networks are able to learn structural features of the data (e.g. image versus text) as well as context-related features (labels).  The requirements are twofold:

  • lots of data
  • lots of computing power

Getting enough data is not a problem; if anything, the NSA has too much. The computing resources available to the NSA are classified, but its big data center in Utah alone requires more electric power than the entire city of Charlottesville, VA. And that was in 2013.

The NSA, despite its secretive nature, publishes a technical journal, The Next Wave. A recent special issue highlighted deep learning technologies, including a discussion of how DL helped the NSA deal with what one article termed multi-modal data (text, images, audio, video). The articles in The Next Wave are necessarily at a high level; you are not going to see cool practical case studies of exactly what was accomplished.

A better practical sense of how deep learning handles multi-modal, complex data comes from a much smaller case: medical records. Like intelligence data, medical records come in multiple forms – notes, images, measurements. Even the numeric data can be problematic if, say, participants do not all use the same units and standards when measuring an attribute. This blog describes how a Google team took over 200,000 patient records, each with numerous and varied data points, and fed them into a deep learning network. The take-away from the study was that the traditional machine-learning approach, with its many hours of feature engineering and data streamlining, could be dispensed with in a deep learning approach. The same lesson applies at a much larger scale to the multi-modal data the NSA deals with.

In Summary

The entry of statistics into war can be conveniently dated to Winston Churchill's creation of "S Branch" in 1940. The focus initially was on accurate counts and tallies of things (and on resolving conflicting numbers from different sources), but it quickly developed into full-fledged statistical analysis.

As a profession, particularly in the early days in the UK and US, statistics flourished because it did not follow the thrust of another Churchill comment during the war:

“I only believe in statistics that I doctored myself.”