Data science and 9/11

data science
The importance of starting with a simple, naïve model in data science before considering a more complex machine learning approach, using the CIA’s photo analysis of “The Pacer” as an illustrative example.

Lucas A. Meyer


September 11, 2021

After the 9/11 attacks, the CIA was looking for Bin Laden all over the world.

One promising lead was a figure that lived in a compound in Pakistan that always shielded his face when walking outside.

The spies named this mysterious figure “The Pacer”, and got several aerial pictures of him.

In order to try to find out whether this was really Bin Laden, they tried to figure out the the person’s height from the pictures, since Bin Laden was quite tall at 195 cm (6’5”).

The National Geospatial Intelligence Agency examined the images, and their photo analysts proudly reported that the man was “somewhere between five feet (150 cm) and seven feet (210 cm) tall”.

A lot of #datascience is like that: technically correct, but not super helpful for decision-making. What we are really looking for is the “lift”: how does some piece of analysis bring us closer to a decision? Does your model provide better result than a naïve baseline that someone can quickly put together?

Many practitioners (including myself) think that before you start thinking hard, and WAY before you start thinking about #machinelearning, you should start with a naïve model. Sometimes that’s all you need. Sometimes that’s all you have data for. Your model should at least beat that.

(The Pacer was actually Bin Laden)