This week’s book review is of Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are, Seth Stephens-Davidowitz’s fascinating book about how social media data reveals all sorts of things about us that we barely know ourselves.
For example, did you know that ages 8-12 are the key years for boys in forming an attachment to a baseball team? Reviewing Facebook likes of baseball teams, Stephens-Davidowitz found that likes for a team peaked among males who were 8-12 years old when that team won the World Series. He found a less clear-cut pattern among women, for whom the corresponding age was 22.
A similar pattern exists in politics. A significant factor in determining an American’s political orientation is the popularity of the person who was president in their teens. Those who came of age during the term of Eisenhower, a popular Republican president, tended to tilt Republican/conservative. Those who came of age under Kennedy, a popular Democrat, tended to tilt Democratic/liberal.
Stephens-Davidowitz used Google searches to explore the issue of bias. For example, he looked at bias against females, which, startlingly, is shared by girls’ parents. In the Google search phrase “is my ______ gifted”, the word “son” is more than twice as likely to occur as the word “daughter”. And in the phrase “is my ______ overweight”, the word “daughter” is twice as likely to appear as the word “son”.
Some other striking facts are revealed by Google search data:
Searches for “n-word” jokes, a proxy for racism, are just as common in Ohio, Pennsylvania and upstate New York as they are in the Deep South.
These same racist searches do not rise during periods of economic distress.
In 2011, searches for phrases related to self-induced abortion rose 40%; that year also saw a big increase in statutory abortion restrictions (92 new state provisions).
Searches for gay pornography are roughly the same in Rhode Island and Mississippi (about 5% of all porn searches), while Gallup surveys put the percentage of gay men in Rhode Island at double that in Mississippi. The likely conclusion is that differing attitudes toward gay people (Rhode Island is the most accepting of gay marriage, Mississippi the least) keep large numbers of gay men in the closet in Mississippi.
A commonly cited virtue of well-designed sample surveys is their ability to return results free of bias. The last point above makes it clear that, where self-assessment is involved, big data, collected according to no sample design at all, can trump random sampling and get closer to the truth. Stephens-Davidowitz terms Google searches the most important dataset ever collected on the human psyche.