Mar 31: Statistics in Practice

In this week’s Brief, we look at p-values.  Plus, we’ve scheduled a couple of extra course sessions for April:  Use the month of April to introduce yourself to Python, or, for those with some Python familiarity, learn how to apply it to predictive analytics. April…

P-Values – Are They Needed?

Five years ago last month, the psychology journal Basic and Applied Social Psychology instigated a major debate in statistical circles when it said it would remove p-value citations from papers it published.  A year later, the American Statistical Association (ASA) released a statement on p-values…

The Depression Gene

The risks of large-scale testing, and the potential for false discovery, can be seen in the “discovery” of the genetic basis for anxiety and depression, specifically the serotonin transporter gene 5-HTTLPR.  Color Genomics sells a genetic testing product that supposedly can predict which antidepressant drug works…
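
As a rough illustration of why large-scale testing invites false discovery, the sketch below computes how quickly false positives accumulate as the number of tests grows. It assumes every null hypothesis is true and that the tests are independent, and the 0.05 threshold and function names are illustrative choices, not taken from the article.

```python
def prob_at_least_one_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of one or more false positives across num_tests tests.

    Assumes every null hypothesis is true and the tests are independent
    (both are simplifying assumptions made for this sketch).
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected count of false positives under the same assumptions."""
    return num_tests * alpha


for m in (1, 20, 1000):
    print(m, round(prob_at_least_one_false_positive(m), 3), expected_false_positives(m))
```

With a single test the chance of a false positive equals the threshold, but with many candidate genes tested at once, at least one spurious "discovery" becomes close to certain.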

Hazard

In biostatistics, hazard, or the hazard rate, is the instantaneous rate of an event (death, failure…).  It is the probability of the event occurring in a (vanishingly) small period of time, divided by the amount of time (mathematically it is the limit of this quantity…
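
The limit described above can be approximated numerically. The sketch below uses the standard definition, in which the probability is conditional on having survived up to time t, and assumes an exponential survival curve purely for illustration (its hazard is constant and equal to its rate). The function names and the rate of 0.2 are hypothetical.

```python
import math


def survival_exponential(t: float, rate: float) -> float:
    """Probability of surviving past time t under an exponential model."""
    return math.exp(-rate * t)


def approx_hazard(survival, t: float, dt: float = 1e-6) -> float:
    """Approximate the hazard at time t.

    Probability of the event in [t, t + dt), given survival to t,
    divided by dt. Shrinking dt approaches the limit in the definition.
    """
    s_now = survival(t)
    s_later = survival(t + dt)
    conditional_prob = (s_now - s_later) / s_now
    return conditional_prob / dt


rate = 0.2  # hypothetical event rate per unit time
for t in (0.5, 5.0, 20.0):
    print(t, approx_hazard(lambda u: survival_exponential(u, rate), t))
```

For this model the printed values stay close to 0.2 at every time point, which is what a constant hazard means.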

Mar 24: Statistics in Practice

In this week’s Brief, we look again at the statistics of Coronavirus.  We also spotlight our Health Analytics Mastery - a 3-course series in which you can choose from among Biostatistics 1 and 2, Designing Valid Statistical Studies, Epidemiologic Statistics, Introduction to Statistical Issues…

Covid-19 Parameters

There are many moving parts in modeling the spread of an epidemic, a subject that has lately attracted the attention of great numbers of statistically-oriented non-epidemiologists (like me).  I’ve put together a “lay statistician’s guide” to some of the important parameters and factors (and I…

Preliminary Paper

Here is a preliminary paper that suggests that RNA extraction kits, one of the main bottlenecks to Covid-19 testing in the US, can be skipped altogether and the next part of the assay (RT-qPCR) still works.  If confirmed, this result would have a major impact…

Mar 18: Statistics in Practice

In this week’s Brief, we look at the coronavirus, and the problem of estimating prevalence and mortality.  Our course spotlight is Nov 8 - Dec 6:  Epidemiologic Statistics (we're adding a spring session - email us at ourcourses@statistics.com to be notified when registration opens).  See…

Standardized Death Rate

Often the death rate for a disease is fully known only for a group where the disease has been well studied.  For example, the 3711 passengers on the Diamond Princess cruise ship are, to date, the most fully studied coronavirus population.  All passengers were tested…
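
One common way to carry a death rate from a well-studied group over to another population is direct standardization: apply the group-specific rates observed in the studied group to the target population's mix of groups. The sketch below assumes that is the idea in view here, and every number in it is hypothetical, chosen only to make the arithmetic visible; none of it is Diamond Princess data.

```python
def standardized_rate(stratum_rates: dict, population_shares: dict) -> float:
    """Weight stratum-specific death rates by a population's stratum shares."""
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(stratum_rates[g] * population_shares[g] for g in population_shares)


# Hypothetical death rates by age group in the well-studied group.
rates = {"under 60": 0.001, "60 and over": 0.02}

# Hypothetical age mixes: the studied group skews older than the target.
studied_shares = {"under 60": 0.4, "60 and over": 0.6}
target_shares = {"under 60": 0.8, "60 and over": 0.2}

crude = standardized_rate(rates, studied_shares)
standardized = standardized_rate(rates, target_shares)
print(f"crude rate in studied group: {crude:.4f}")
print(f"rate standardized to target: {standardized:.4f}")
```

Because the hypothetical studied group is older than the target population, the standardized rate comes out lower than the crude rate.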

Coronavirus – in Search of the Elusive Denominator

Anyone with internet access these days has their eyes on two constellations of data - the spread of the coronavirus, and the resulting collapse of the financial markets.  Following the 13% one-day drop of the stock market a week ago, The Wall Street Journal forecast…

Coronavirus: To Test or Not to Test

In recent years, under the influence of statisticians, the medical profession has dialed back on screening tests.  With relatively rare conditions, widespread testing yields many false positives and doctor visits, whose collective cost can outweigh benefits.  Coronavirus advice follows this line - testing is limited…
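
The false-positive problem with rare conditions follows from Bayes' rule, and a short calculation makes it concrete. The sketch below uses hypothetical values for prevalence, sensitivity, and specificity; they are illustrative assumptions, not figures for any actual coronavirus test.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Share of positive test results that are true positives (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical test that is 95% sensitive and 95% specific.
for prevalence in (0.001, 0.2):
    ppv = positive_predictive_value(prevalence, sensitivity=0.95, specificity=0.95)
    print(f"prevalence {prevalence}: {ppv:.1%} of positives are real")
```

Under these assumptions, when only 1 person in 1,000 has the condition, fewer than 2% of positive results are true positives; when 1 in 5 has it, more than 80% are.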

Mar 16: Statistics in Practice

In this week’s Brief, we look at combining models.  Our course spotlight is April 17 - May 1:  Maximum Likelihood Estimation (MLE).  You’ve probably seen lots of references to MLE in other contexts - this quick 2-week course (only $299) is your chance to study…

Regularized Model

In building statistical and machine learning models, regularization is the addition of penalty terms to predictor coefficients to discourage complex models that would otherwise overfit the data.  An example is ridge regression.
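
A minimal sketch of the penalty idea, assuming the simplest possible case: one predictor, centered data, and no intercept. In that case the ridge estimate has a closed form, and the effect of the penalty is easy to see. The data values and the function name are made up for illustration.

```python
def ridge_slope(x: list, y: list, penalty: float) -> float:
    """Slope b minimizing sum((y - b*x)^2) + penalty * b^2.

    Assumes x and y are centered, so no intercept is needed.
    With penalty = 0 this is the ordinary least squares slope.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + penalty)


# Made-up centered data for illustration.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.7]

for penalty in (0.0, 10.0, 100.0):
    print(penalty, round(ridge_slope(x, y, penalty), 3))
```

As the penalty grows, the fitted slope shrinks toward zero but never reaches it, which is how the penalty restrains a model that would otherwise chase noise.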

Ensemble Learning

In his book, The Wisdom of Crowds, James Surowiecki recounts how Francis Galton, a prominent statistician from the 19th century, attended an event at a country fair in England where the object was to guess the weight of an ox.   Individual contestants were relatively well…
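
The intuition behind combining many estimates can be sketched with a small simulation. It assumes each guess is the true value plus independent, unbiased noise, which is a simplification, and the weight, noise level, and crowd size are hypothetical rather than figures from Galton's account.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

true_weight = 1200.0  # hypothetical true weight
# Hypothetical crowd: each guess is off by independent random noise.
guesses = [random.gauss(true_weight, 100.0) for _ in range(500)]

crowd_estimate = statistics.mean(guesses)
crowd_error = abs(crowd_estimate - true_weight)
typical_individual_error = statistics.mean([abs(g - true_weight) for g in guesses])

print(f"error of the averaged guess:  {crowd_error:.1f}")
print(f"average individual error:     {typical_individual_error:.1f}")
```

The error of the averaged guess can never exceed the average individual error, and with independent errors it is usually far smaller. Ensemble methods apply the same idea to the predictions of multiple models.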

Mar 9: Statistics in Practice

In this week’s Brief, we look at ways to determine optimal sample size.  Our course spotlight is April 10 - May 8:  Sample Size and Power Determination.  See you in class! - Peter Bruce, Founder, Author, and Senior Scientist.  Big Sample, Unreliable Result:  The 1948…

Ridge Regression

Ridge regression is a method of penalizing coefficients in a regression model to shrink them toward zero, yielding a simpler, more stable model than would be produced by ordinary least squares (the penalty shrinks coefficients but does not remove predictors from the model). The term “ridge” was applied by Arthur Hoerl in 1970, who saw similarities…

Big Sample, Unreliable Result

Which would you rather have?  A large sample that is biased, or a representative sample that is small?  The American Statistical Association committee that reviewed the 1948 Kinsey report on male sexual behavior, based on interviews with over 5000 men, left no doubt of their…
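
The trade-off in that question can be explored with a small simulation. Everything here is hypothetical: a synthetic population, a deliberately extreme selection bias (only above-median members can be reached), and arbitrary sample sizes. It is a sketch of the general point, not a model of the Kinsey data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Synthetic population of 100,000 values.
population = [random.gauss(50.0, 10.0) for _ in range(100_000)]
true_mean = statistics.mean(population)

# Large but biased: only members above the median can end up in the sample.
median = statistics.median(population)
reachable = [v for v in population if v > median]
big_biased = random.sample(reachable, 5000)

# Small but representative: a simple random sample of the whole population.
small_representative = random.sample(population, 100)

biased_error = abs(statistics.mean(big_biased) - true_mean)
representative_error = abs(statistics.mean(small_representative) - true_mean)

print(f"error, n=5000 biased sample:        {biased_error:.2f}")
print(f"error, n=100 representative sample: {representative_error:.2f}")
```

Adding more biased observations only makes the estimate more precisely wrong, while the small random sample's error shrinks as its size grows.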

Mar 2: Statistics in Practice

In this week’s Brief, we look at hierarchical and mixed models.  Our course spotlights are April 10 - May 8:  Generalized Linear Models, and April 24 - May 22:  Mixed and Hierarchical Linear Models.  See you in class! - Peter Bruce, Founder, Author, and Senior Scientist…

Problem of the Week: Notify or Don’t Notify?

Our problem of the week is an ethical dilemma, posed by the New England Journal of Medicine to its readers 10 days ago.  Volunteers contributed DNA samples to investigators building a genetic database for study, on condition the data would be deidentified and kept confidential…

Factor

The term “factor” has different meanings in statistics that can be confusing because they conflict.   In statistical programming languages like R, factor acts as an adjective, used synonymously with categorical - a factor variable is the same thing as a categorical variable.  These factor variables…

Mixed Models – When to Use

Companies now have a lot of data on their customers at an individual level.  Suppose you are tasked with forecasting customer spending at a grocery chain, and you want to understand how customer attributes, local economic factors, and store issues affect customer spending. You could…
