Blog

Posted on Apr 12, 2019 By: Peter Bruce
In the U.S., credit scoring is dominated by three companies - Experian, TransUnion and Equifax, employing roughly 30,000 people. An important player in the scoring methodology is FICO, previously Fair Isaac Corporation, and the scores are typically called “FICO scores.” Credit scoring is the oldest application of predictive modeling, fulfilling a need that has been around for millennia. Ever since the development of money and, hence, money-lending, lenders have needed to assess the credit...
Posted on Apr 12, 2019 By: Peter Bruce
The IRS (U.S. Internal Revenue Service) has been using computers to choose tax returns for audit since 1962. Early on, the selection was rule-based, but the IRS turned to statistical modeling in 1969, using the oldest predictive analytics model in the toolbox - discriminant analysis.  Discriminant analysis, a linear classification technique, was first proposed by Ronald Fisher in 1936. Computer scientists think of discriminant analysis as quaint and old-fashioned, that is, if they think of it a...
Posted on Apr 08, 2019 By: Peter Bruce
Cathy O’Neil’s Weapons of Math Destruction, first published in 2016, sounded an early alarm about big data algorithms and their potential for social evil. The cover is adorned with a robotic death’s head and the subtitle reads “How Big Data Increases Inequality and Threatens Democracy.” O’Neil’s book begins with stories that are about data, but don’t really tap into the dangers posed by “big data algorithms.” She relates the origins of baseball ana...
Posted on Apr 08, 2019 By: Peter Bruce
Eighty years ago, in 1939, Alan Turing began work on the code-breaking system that would eventually prove key in helping Britain survive the German submarine threat in the Atlantic. Last month, the Turing Award in computer science (sometimes referred to as the "Nobel Prize of Computing") was given to three researchers, Yann LeCun, Geoffrey Hinton and Yoshua Bengio, for their work on deep learning. Turing, a pioneer in computer science, proposed the terms of what became known as th...
Posted on Apr 08, 2019 By: Peter Bruce
To a statistician, a sample is a collection of observations (cases).  To a machine learner, it’s a single observation.  Modern data science has its origin in several different fields, which leads to potentially confusing homonyms and synonyms, like these: Homonyms (words with multiple meanings): Bias:  To a lay person, bias refers to an opinion about something that is pre-formed in advance of specific facts.  As consideration of ethical issues in data science grows, this meaning has ...
Posted on Mar 28, 2019 By: Peter Bruce
Nothing better illustrates the encroachment of data science and analytics on the older “economy of tangible things” than the business of delivering packages. The use of analytics in package delivery is not new. Companies like UPS and FedEx are longtime users of operations research methods like optimization and simulation to route inter-city shipments, site new depots, allocate shipments among different modes, and simulate capacity utilization. UPS’s Jack Levis championed ORION, “On-...
Posted on Mar 28, 2019 By: Peter Bruce
Prior to the advent of internet-connected devices, the largest source of big data was public interaction on the internet.  Social media users, as well as shoppers and searchers on the internet, make an implicit deal with the big companies that provide these services:  users can take advantage of powerful search, shopping and social interaction tools for free, and, in return, the companies get access to user data.   More and more news stories appear concerning illegal, fraudulent, unsavory ...
Posted on Mar 22, 2019 By: Peter Bruce
The field of sports statistics is not exactly new; the American Statistical Association’s section on Sports Statistics was formed in 1992. Three of Statistics.com’s instructors have professional experience in sports statistics - Ben Baumer (SQL) served as statistician for the NY Mets, Stephanie Kovalchik (Meta Analysis in R) with Tennis Australia, and Joe Hilbe, who died in 2017, was a national champion track & field athlete and chaired the American Statistical Association’s Secti...
Posted on Mar 22, 2019 By: Peter Bruce
The U.S. baseball season opens Thursday, March 28, and celebrates the 48th season of analytics in baseball, beginning with the founding of the Sabermetric Society in 1971 (the same year that Satchel Paige entered the Hall of Fame). Analytics has come a long way in sports, and now has its own conference, the MIT Sports Analytics Conference. This is an outlier in the world of statistics conferences, with its sky-high registration fees, its sponsorship by ESPN, and speakers like Malcolm Gla...
Posted on Mar 18, 2019 By: Peter Bruce
Charles Darwin, the most famous grandson of the Enlightenment thinker Erasmus Darwin, published his ground-breaking theory of evolution, “The Origin of Species,” 160 years ago. Another grandson of Erasmus, Francis Galton, became one of the founding fathers of statistics (correlation, the “wisdom of the crowd,” regression and regression to the mean are all Galton’s ideas). Heavily influenced by Darwin’s theories of natural selection, Galton and his colleagues developed ear...