Monthly Archives: March 2020
Mar 31: Statistics in Practice
In this week’s Brief, we look at p-values. Plus, we’ve scheduled a couple of extra course sessions for April: use the month of April to introduce yourself to Python, or, for those with some Python familiarity, learn how to apply it to predictive analytics. April 10 – May 8: Introduction to Python Programming (for newcomers… Continue reading “Mar 31: Statistics in Practice”
P-Values – Are They Needed?
The Depression Gene
Hazard
Mar 24: Statistics in Practice
In this week’s Brief, we look again at the statistics of Coronavirus. We also spotlight our Health Analytics Mastery – a 3-course series in which you can choose from among: Biostatistics 1 and 2; Designing Valid Statistical Studies; Epidemiologic Statistics; Introduction to Statistical Issues in Clinical Trials. You can start July 1 with Biostatistics… Continue reading “Mar 24: Statistics in Practice”
Covid-19 Parameters
There are many moving parts in modeling the spread of an epidemic, a subject that has lately attracted the attention of great numbers of statistically oriented non-epidemiologists (like me). I’ve put together a “lay statistician’s guide” to some of the important parameters and factors (and I welcome corrections/additions!). Terms: Case fatality rate or CFR: Deaths as… Continue reading “Covid-19 Parameters”
Preliminary Paper
Here is a preliminary paper that suggests that RNA extraction kits, one of the main bottlenecks to Covid-19 testing in the US, can be skipped altogether and the next part of the assay (RT-qPCR) still works. If confirmed, this result would have a major impact on how many tests state and hospital labs could run… Continue reading “Preliminary Paper”
Mar 18: Statistics in Practice
In this week’s Brief, we look at the coronavirus, and the problem of estimating prevalence and mortality. Our course spotlight is Nov 8 – Dec 6: Epidemiologic Statistics (we’re adding a spring session – email us at ourcourses@statistics.com to be notified when registration opens). See you in class! – Peter Bruce, Founder, Author, and Senior… Continue reading “Mar 18: Statistics in Practice”
Standardized Death Rate
Often the death rate for a disease is fully known only for a group where the disease has been well studied. For example, the 3,711 passengers on the Diamond Princess cruise ship are, to date, the most fully studied coronavirus population. All passengers were tested and tracked by health authorities, and the death rate was… Continue reading “Standardized Death Rate”
Coronavirus – in Search of the Elusive Denominator
Anyone with internet access these days has their eyes on two constellations of data – the spread of the coronavirus, and the resulting collapse of the financial markets. Following the 13% one-day drop of the stock market a week ago, The Wall Street Journal forecast a quarterly GDP drop of as much as 10% – … Continue reading “Coronavirus – in Search of the Elusive Denominator”
Coronavirus: To Test or Not to Test
In recent years, under the influence of statisticians, the medical profession has dialed back on screening tests. With relatively rare conditions, widespread testing yields many false positives and doctor visits, whose collective cost can outweigh benefits. Coronavirus advice follows this line – testing is limited to the truly ill (this is also due to a… Continue reading “Coronavirus: To Test or Not to Test”
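To make the false-positive point concrete, here is a minimal Python sketch of the arithmetic behind screening for a rare condition. The prevalence, sensitivity, and specificity values are hypothetical, chosen only for illustration and not taken from the post; the function name is likewise an assumption.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive test results that are true cases (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: 1 in 1,000 people has the condition,
# the test catches 99% of true cases and clears 95% of non-cases.
ppv = positive_predictive_value(prevalence=0.001, sensitivity=0.99, specificity=0.95)
print(f"Share of positive results that are true cases: {ppv:.1%}")
```

Under these assumed numbers only about 2% of positive results are true cases, which is why widespread screening for a rare condition can generate far more follow-up visits than diagnoses.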
Mar 16: Statistics in Practice
In this week’s Brief, we look at combining models. Our course spotlight is April 17 – May 1: Maximum Likelihood Estimation (MLE). You’ve probably seen lots of references to MLE in other contexts – this quick 2-week course (only $299) is your chance to study it on its own. See you in class! – Peter… Continue reading “Mar 16: Statistics in Practice”
Regularized Model
In building statistical and machine learning models, regularization is the addition of a penalty on the size of the predictor coefficients to the quantity being minimized when the model is fit, to discourage complex models that would otherwise overfit the data. An example is ridge regression.
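As a rough illustration of the penalty idea, the sketch below fits ridge regression in closed form with NumPy and compares it with an unpenalized least squares fit. The synthetic data, the penalty value of 10, and the helper name ridge_fit are all assumptions made for this example, and the intercept is omitted for brevity.

```python
import numpy as np


def ridge_fit(X, y, penalty):
    """Closed-form ridge coefficients: solve (X'X + penalty * I) b = X'y."""
    n_features = X.shape[1]
    gram = X.T @ X + penalty * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)


# Synthetic data (hypothetical): 50 rows, 5 predictors, two with no real effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_coef = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.5, size=50)

ols_coef = ridge_fit(X, y, penalty=0.0)     # penalty of 0 is ordinary least squares
ridge_coef = ridge_fit(X, y, penalty=10.0)  # a positive penalty shrinks the coefficients

print("OLS coefficients:  ", np.round(ols_coef, 3))
print("Ridge coefficients:", np.round(ridge_coef, 3))
```

The ridge coefficients come out smaller in overall magnitude than the least squares ones; a larger penalty shrinks them further, at the cost of more bias.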
Ensemble Learning
In his book, The Wisdom of Crowds, James Surowiecki recounts how Francis Galton, a prominent statistician from the 19th century, attended an event at a country fair in England where the object was to guess the weight of an ox. Individual contestants were relatively well informed on the subject (the audience was farmers), but their… Continue reading “Ensemble Learning”
Mar 9: Statistics in Practice
In this week’s Brief, we look at ways to determine optimal sample size. Our course spotlight is April 10 – May 8: Sample Size and Power Determination. See you in class! – Peter Bruce, Founder, Author, and Senior Scientist. Big Sample, Unreliable Result: The 1948 Kinsey report on male sexual behavior in the U.S. yielded… Continue reading “Mar 9: Statistics in Practice”
Ridge Regression
Ridge regression is a method of penalizing coefficients in a regression model to force a less complex model (one whose coefficients are shrunk toward zero, though not removed) than would be produced by an ordinary least squares model. The term “ridge” was applied by Arthur Hoerl in 1970, who saw similarities to the ridges of quadratic response functions. In ordinary least… Continue reading “Ridge Regression”
Big Sample, Unreliable Result
Which would you rather have? A large sample that is biased, or a representative sample that is small? The American Statistical Association committee that reviewed the 1948 Kinsey report on male sexual behavior, based on interviews with over 5,000 men, left no doubt of their preference for the latter. The statisticians – William Cochran, Frederick… Continue reading “Big Sample, Unreliable Result”
Mar 2: Statistics in Practice
In this week’s Brief, we look at hierarchical and mixed models. Our course spotlight is April 10 – May 8: Generalized Linear Models; April 24 – May 22: Mixed and Hierarchical Linear Models. See you in class! – Peter Bruce, Founder, Author, and Senior Scientist. Mixed Model – When to Use? In 1861, the British… Continue reading “Mar 2: Statistics in Practice”
Problem of the Week: Notify or Don’t Notify?
Our problem of the week is an ethical dilemma, posed by the New England Journal of Medicine to its readers 10 days ago. Volunteers contributed DNA samples to investigators building a genetic database for study, on condition the data would be deidentified and kept confidential and that they themselves would not learn results. Should participants… Continue reading “Problem of the Week: Notify or Don’t Notify?”
Factor
The term “factor” has different meanings in statistics that can be confusing because they conflict. In statistical programming languages like R, factor acts as an adjective, used synonymously with categorical – a factor variable is the same thing as a categorical variable. These factor variables have levels, which are the same thing as categories (a… Continue reading “Factor”