CROs, or contract research organizations, are a $40 billion industry, growing at close to 12% per year. They provide contract services to the pharmaceutical industry, including statistical design and analysis, laboratory services, administration of clinical trials, and monitoring of drugs once they are on the market. Developing a new drug and bringing it to market… Continue reading “Industry Spotlight: CROs”
Yearly Archives: 2019
Handling the Noise – Boost It or Ignore It?
In most statistical modeling or machine learning prediction tasks, there will be cases that can be easily predicted based on their predictor values (signal), as well as cases where predictions are unclear (noise). Two statistical learning methods, boosting and ProfWeight, use those difficult cases in exactly opposite ways – boosting up-weights them, and ProfWeight down-weights… Continue reading “Handling the Noise – Boost It or Ignore It?”
Problem of the Week: Probability
Your country is at war, and an enemy plane has crashed on your territory. It bears the number 60, and a spy has told you that the aircraft are numbered serially. Can you make a guess about the total number of aircraft the enemy has produced? Solution: This problem is one of those published by… Continue reading “Problem of the Week: Probability”
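The excerpt above cuts off before the solution, so the sketch below is not necessarily the answer the full post gives. It shows one widely used estimator for serially numbered items, under the assumptions that numbering starts at 1 and that every aircraft is equally likely to be the one observed. The function name `estimate_total` is hypothetical.

```python
def estimate_total(serials):
    """Estimate the total count N from observed serial numbers.

    Assumes serials run 1..N and each item is equally likely to be seen.
    Uses the estimator m + m/k - 1, where m is the largest serial
    observed and k is the number of observations.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    return m + m / k - 1


# One crashed plane bearing the number 60.
guess = estimate_total([60])
print(guess)
```

With a single observation the estimate reduces to twice the observed number minus one, which reflects the intuition that the one plane seen is as likely to fall in the lower half of the series as in the upper half.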
Rectangular data
Rectangular data are the staple of statistical and machine learning models. Rectangular data are multivariate cross-sectional data (i.e., not time-series or repeated-measures data) in which each column is a variable (feature), and each row is a case or record.
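As a minimal sketch of that definition, the snippet below stores a few records as rows and checks that every row carries the same set of variables. The records, field names, and helper functions are all hypothetical and made up for illustration.

```python
# Hypothetical records: each dict is one case (row),
# and each key is one variable (column).
records = [
    {"policy_id": 1, "type": "residential", "premium": 820.0},
    {"policy_id": 2, "type": "commercial", "premium": 2150.0},
    {"policy_id": 3, "type": "automotive", "premium": 640.0},
]


def is_rectangular(rows):
    """Return True if every row has exactly the same set of variables."""
    if not rows:
        return True
    columns = set(rows[0])
    return all(set(row) == columns for row in rows)


def shape(rows):
    """Return (number of cases, number of variables)."""
    return (len(rows), len(rows[0]) if rows else 0)


print(is_rectangular(records), shape(records))
```

A record with a missing variable breaks the rectangle, which is why missing values are usually stored as explicit placeholders instead of being left out of the row.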
“Defiant” Supervision
How did the phrase “defiantly recommend”, as in “I defiantly recommend this product,” come into common usage on the internet? The answer is a good look inside the workings of supervised learning. Supervision, generally from humans, is instrumental in much of statistical and machine learning. Google’s precise search algorithms are not public, but the general… Continue reading ““Defiant” Supervision”
Industry Spotlight: Consulting
When a new technology arrives, consulting companies can quickly add staff and expertise to build institutional capacity centered around the technology in ways companies focused on delivering their own products and services cannot. Large consulting companies like Booz Allen and McKinsey, as well as smaller analytics-centric firms like Elder Research, thus constitute a significant job… Continue reading “Industry Spotlight: Consulting”
Good to Great
In 1994, Jim Collins and Jerry Porras, former and current Stanford professors, published the best-seller Built to Last that described how “long-term sustained performance can be engineered into the DNA of an enterprise.” It sold over a million copies. Buoyed by that success, Collins and a research team set out to find the characteristics of companies… Continue reading “Good to Great”
Selection Bias
Selection bias arises when a sampling or data collection process yields a biased, or unrepresentative, sample. It can occur in numerous situations; here are just a few:
Space Shuttle Explosion
In 1986, the U.S. space shuttle Challenger exploded 73 seconds after launch. A later investigation found that the cause of the disaster was O-ring failure, due to cold temperatures. The temperature at launch was 39 degrees, colder than any prior launch. The cold caused the O-rings to become stiff and brittle, losing the flexibility that… Continue reading “Space Shuttle Explosion”
Alaskan Generosity
People in Alaska are extraordinarily generous – that’s what a predictive model showed, when applied to a charitable organization’s donor list. A closer examination revealed a flaw – while the original data was for all 50 states, the model’s training data for Alaska included donors, but excluded non-donors. The reason? The data was 99% non-donors,… Continue reading “Alaskan Generosity”
Industry Spotlight – The Military
Abraham Wald, a persecuted Jewish mathematician who fled Austria just before World War II, led an analysis of Allied bombers returning from missions. Hitherto, the Air Force had focused on reinforcing areas that showed the most damage on return. Wald convinced them instead to focus on the areas that consistently showed no damage. He reasoned… Continue reading “Industry Spotlight – The Military”
Why Analytics Projects Fail – 5 Reasons
With the news full of so many successes in the fields of analytics, machine learning and artificial intelligence, it is easy to lose sight of the high failure rate of analytics projects. McKinsey just came out with a report that only 8% of big companies (revenue > $1 billion) have successfully scaled and integrated… Continue reading “Why Analytics Projects Fail – 5 Reasons”
Political Analytics and Microtargeting
The statistics of targeting individual voters with specific messages, as opposed to messaging that went to whole groups, began in the U.S. over a decade ago with the Democrats. Political targeting is now an established business, or at least a discipline within the broader realm of political consulting. By 2016, the Republicans had surged well… Continue reading “Political Analytics and Microtargeting”
The Statistics of Persuasion
The Art of Persuasion is the title of more than one book in the self-help genre, books that have spawned blogs, podcasts, speaking gigs and more. But the science of persuasion is actually of more interest, because it produces useful rules that can be studied and deployed. Marketers and politicians have long been enthusiastic users… Continue reading “The Statistics of Persuasion”
Historical Spotlight – ISOQOL
Twenty-five years ago, the International Society for Quality of Life Research was founded with a mission to advance the science of quality of life and related patient-centered outcomes in health research, care and policy. While focusing on quality of life (QOL) in healthcare may seem like a no-brainer, measuring it is not as easy as… Continue reading “Historical Spotlight – ISOQOL”
Book Review: Thinking Fast and Slow
Daniel Kahneman won a Nobel Prize in Economics for his work in behavioral economics, much of it with his colleague Amos Tversky, who died in 1996. Kahneman’s 2011 classic, Thinking, Fast and Slow, is a superbly written non-technical summary of their fascinating research and its often counter-intuitive findings. The best feature of the book is the… Continue reading “Book Review: Thinking Fast and Slow”
Likert Scale
A Likert scale is used in self-report rating surveys to allow respondents to express an opinion or assessment of something on a graded scale. For example, a response could range from “agree strongly” through “agree somewhat” and “disagree somewhat” on to “disagree strongly.” Two key decisions the survey designer faces are:
- How many gradations to allow, and
- Whether to include a neutral midpoint
Football Analytics
Preparing for the Super Bowl: Your team is at midfield, you have the ball, it’s 4th down with 2 yards to go. Should you go for it? (Apologies in advance to our many readers, especially those outside the U.S., who are not aficionados of American football, but it’s Super Bowl week in the U.S. A quick guide… Continue reading “Football Analytics”
Job Spotlight: Digital Marketer
A digital marketer handles a variety of tasks in online marketing – managing online advertising and search engine optimization (SEO), implementing tracking systems (e.g. to identify how a person came to a retailer), web development, preparing creatives, implementing tests, and, of course, analytics. There are typically three types of employers: Marketing agencies that contract out… Continue reading “Job Spotlight: Digital Marketer”
Dummy Variable
A dummy variable is a binary (0/1) variable created to indicate whether a case belongs to a particular category. Typically a dummy variable will be derived from a multi-category variable. For example, an insurance policy might be residential, commercial or automotive, and there would be three dummy variables created:
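A minimal sketch of the insurance example, assuming a hypothetical list of policy types and hypothetical column names of the form `is_<category>`: each policy gets one 0/1 indicator per category, and exactly one of the three is set to 1.

```python
def make_dummies(values, categories):
    """Return one dict of 0/1 indicators per value, one key per category."""
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]


# Hypothetical policy types for four policies.
policy_types = ["residential", "commercial", "automotive", "commercial"]
categories = ["residential", "commercial", "automotive"]

dummies = make_dummies(policy_types, categories)
for policy_type, row in zip(policy_types, dummies):
    print(policy_type, row)
```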