Evolutionary Algorithms - Statistics.com: Data Science, Analytics & Statistics Courses

It was 150 years ago when Darwin first used the term “evolution” in his writing (in his book The Descent of Man). Two months ago, in The Normal Share of Paupers, I briefly discussed the unfortunate eugenics baggage that the discipline of statistics inherited from Charles Darwin, baggage that left a taint (at least to modern eyes) on the founding fathers of statistics. A later, less toxic, statistical inheritance from Darwin is the evolutionary algorithm, used to solve optimization problems where there are too many potential solutions to examine them all exhaustively.

Traveling Salesman

Consider the traveling salesman problem (TSP), a classic in operations research. A salesman must visit n clients, all in different locations. How should the salesman travel to minimize travel time? To guarantee an optimal solution, all the routes must be examined. With a small number of routes, the problem is tractable.

For example, with 4 clients, starting with the client in location 1, there are (n-1)!, or six ways to complete the journey:

1 > 2 > 3 > 4

1 > 2 > 4 > 3

1 > 3 > 2 > 4

1 > 3 > 4 > 2

1 > 4 > 2 > 3

1 > 4 > 3 > 2

With more destinations, though, the task of examining all solutions balloons out of control. With 17 locations, there are nearly 21 trillion routes. An evolutionary algorithm takes the following approach:

Start with an arbitrary (often random) set of possible solutions, and score each for their “fitness” (i.e. the time to complete the route).
Modify a few of the solutions by “crossover” – exchange route elements between solutions (for example, if one route solution goes … 7 > 3 > 4 … and another goes … 3 > 4 > 7 then swap those segments).
Modify a few other solutions by randomly changing elements (“mutation”).
Assess this new “population” of solutions, drop the poor performers and replace them with more crossovers and mutations (“survival of the fittest”).
Keep repeating.

This process, considering only a small subset of solutions at a time, gradually discovers better and better solutions, just as the genetic processes of crossover and mutation improve the fitness and survivability of species. The random component helps avoid a common problem in non-exhaustive optimization – getting stuck in local optima.

Other Applications

The traveling salesman is an evocative, if somewhat anachronistic image, that stands in for a whole range of problems. Cliff Ragsdale, instructor for our three optimization courses (see the Mastery Series in Optimization), and author of the best selling text Spreadsheet Modeling and Decision Analysis, describes one such problem:

The Wolverine Manufacturing Company operates machines that can be programmed to automatically perform precise drilling and machining operations. The current job requirement is for nine holes to be drilled in precise locations on a flat fiberglass panel that is used in the production of a popular automobile. After each hole is drilled, the machine retracts the drill bit and moves it (somewhat slowly) to the next location until all nine holes have been completed. Because this process must be repeated for millions of panels, Wolverine wants to make sure it programs the machine to move from hole to hole in the most efficient manner – minimizing the total distance the drill must travel.

In this problem the drill bit is the salesman, and the path it follows from hole to hole is the salesman’s route.

Instacart

Instacart is a grocery delivery service using the gig model – individuals “shoppers” use the Instacart app to secure assignments to shop at a grocery store on behalf of someone, and then deliver the items to that person. The service has soared in popularity as a result of the Covid-19 lockdown.

Instacart needs to solve the Traveling Salesman Problem to efficiently route its shoppers. Actually, it needs to solve a more general (harder) version of the TSP known as the Vehicle Routing Problem (VRP) because it must make decisions about how to route not just one shopper and one batch of orders, but numerous shoppers with numerous order batches. Plus, it has to meet a time constraint, to get the orders delivered in time. Jeremy Stanley, former VP of Data Science at Instacart, has written a highly readable summary about these and further refinements of the TSP that Instacart must deal with to route its shoppers.

The Traveling Salesman Problem was given its name by the American Mathematician Julia Robinson in 1948, but its antecedents go back to the 1800’s. It served as the catalyst for a number of the puzzle-style challenges that are so popular in the mathematical and statistical communities, and contributed to the development of a number of different algorithms to approximate the optimal solution, of which the evolutionary approach outlined above is just one.