It was 150 years ago when Darwin first used the term “evolution” in his writing (in his book The Descent of Man). Two months ago, in The Normal Share of Paupers, I briefly discussed the unfortunate eugenics baggage that the discipline of statistics inherited from Charles Darwin, baggage that left a taint (at least to modern eyes) on the founding fathers of statistics. A later, less toxic, statistical inheritance from Darwin is the evolutionary algorithm, used to solve optimization problems where there are too many potential solutions to examine them all exhaustively.
Traveling Salesman
Consider the traveling salesman problem (TSP), a classic in operations research. A salesman must visit n clients, all in different locations. How should the salesman travel to minimize travel time? To guarantee an optimal solution, all the routes must be examined. With a small number of routes, the problem is tractable.
For example, with 4 clients, starting with the client in location 1, there are (n-1)!, or six ways to complete the journey:
1 > 2 > 3 > 4
1 > 2 > 4 > 3
1 > 3 > 2 > 4
1 > 3 > 4 > 2
1 > 4 > 2 > 3
1 > 4 > 3 > 2
With more destinations, though, the task of examining all solutions balloons out of control. With 17 locations, there are nearly 21 trillion routes. An evolutionary algorithm takes the following approach:
- Start with an arbitrary (often random) set of possible solutions, and score each for their “fitness” (i.e. the time to complete the route).
- Modify a few of the solutions by “crossover” – exchange route elements between solutions (for example, if one route solution goes … 7 > 3 > 4 … and another goes … 3 > 4 > 7 then swap those segments).
- Modify a few other solutions by randomly changing elements (“mutation”).
- Assess this new “population” of solutions, drop the poor performers and replace them with more crossovers and mutations (“survival of the fittest”).
- Keep repeating.
This process, considering only a small subset of solutions at a time, gradually discovers better and better solutions, just as the genetic processes of crossover and mutation improve the fitness and survivability of species. The random component helps avoid a common problem in non-exhaustive optimization – getting stuck in local optima.
Other Applications
The Wolverine Manufacturing Company operates machines that can be programmed to automatically perform precise drilling and machining operations. The current job requirement is for nine holes to be drilled in precise locations on a flat fiberglass panel that is used in the production of a popular automobile. After each hole is drilled, the machine retracts the drill bit and moves it (somewhat slowly) to the next location until all nine holes have been completed. Because this process must be repeated for millions of panels, Wolverine wants to make sure it programs the machine to move from hole to hole in the most efficient manner – minimizing the total distance the drill must travel.
In this problem the drill bit is the salesman, and the path it follows from hole to hole is the salesman’s route.
Instacart
Instacart needs to solve the Traveling Salesman Problem to efficiently route its shoppers. Actually, it needs to solve a more general (harder) version of the TSP known as the Vehicle Routing Problem (VRP) because it must make decisions about how to route not just one shopper and one batch of orders, but numerous shoppers with numerous order batches. Plus, it has to meet a time constraint, to get the orders delivered in time. Jeremy Stanley, former VP of Data Science at Instacart, has written a highly readable summary about these and further refinements of the TSP that Instacart must deal with to route its shoppers.