Statistics 2 Quiz

1.

You have data on the incomes of retirees in your community. A sample of 50 incomes is shown in the graph below. Suppose you took another sample of 50 incomes, found their mean, and plotted it on a number line. Suppose you did this over and over until you had 1000 means from 1000 samples. Approximately what shape would the distribution of these 1000 means have?

a. the same shape as the distribution of individual incomes

b. the shape of a normal distribution

c. the shape of the chi-squared distribution

d. insufficient information to predict the shape

2.

The context and purpose of a hypothesis test are best described how?

a. You are analyzing a study and need to show whether chance might account for the results.

b. You are analyzing a study and need to establish the validity of the null hypothesis.

c. You are designing a study and need a system for allocating subjects to different groups.

d. You are analyzing a survey and need to establish a margin of error.

3.

You are working in an area where the cost of each measurement is very high, so you have to make do with 10 observations. A dot plot of the data looks like the image shown. You were planning to use the t-distribution to construct a confidence interval for the mean, but it looks like you have a problem with

a. skewness

b. confounding

c. outliers

d. multicollinearity

4.

The graph below is a bootstrap distribution of the mean house value ($million) in a high-end neighborhood in San Francisco, based on a sample of 10. To find the endpoints of a 90% confidence interval, you would

a. Find the z-scores of the values in this frequency distribution.

b. Normalize the values by subtracting the mean and dividing by the square root of 10.

c. Find the points that chop 5% off each end of the distribution; those are the endpoints of a 90% interval.

d. Find the points that chop 10% off each end of the distribution; those are the endpoints of a 90% interval.

5.

After some research in the library, you find two studies on a subject that interests you – the number of playing days a typical NFL (football) player loses due to injury in his career. One had a sample size of 30 and the other a sample size of 150. For which study would you expect the standard error of the mean to be smaller?

a. The study with the larger sample size.

b. The study with the smaller sample size.

c. There is not enough information to have an opinion.
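For reference, the standard error of the mean is commonly estimated as the sample standard deviation divided by the square root of the sample size. The sketch below applies that formula to the two sample sizes in this question. The standard deviation of 12 days is a hypothetical value, assumed equal for both studies only to isolate the effect of sample size; neither study's actual standard deviation is given.

```python
import math


def standard_error(sample_sd: float, n: int) -> float:
    """Estimated standard error of the mean: s / sqrt(n)."""
    return sample_sd / math.sqrt(n)


# Hypothetical standard deviation (in days), assumed equal for both studies.
assumed_sd = 12.0

for n in (30, 150):
    print(f"n = {n:3d}: standard error = {standard_error(assumed_sd, n):.2f}")
```

If the two studies had very different standard deviations, the comparison would depend on those values as well as on the sample sizes.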

6.

You wanted to test the hypothesis that a population parameter is zero. You asked your assistant to do the analysis. They did everything correctly except they generated a confidence interval instead of a hypothesis test. The confidence interval ranges from -23 to -15. What is your conclusion?

a. do not reject the hypothesis

b. reject the hypothesis

c. insufficient evidence to do the hypothesis test

7.

You wish to do a study to measure a relatively small effect. Increasing the sample size will

a. increase the size of the effect

b. decrease the size of the effect

c. increase the chance of detecting the effect

d. decrease the chance of detecting the effect

8.

If the probability that mortgage A defaults is 0.1, and the probability that mortgage B defaults is 0.2, what is the probability that both will pay off on time? What assumption is required for this answer to be strictly accurate?

a. 0.02, conditionality

b. 0.72, independence

c. 0.3, independence

d. 0.02, independence
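The arithmetic behind this kind of question can be sketched with the multiplication rule for joint probabilities. The variable names are illustrative, and the products below are valid only under the stated assumption that the two mortgages behave independently.

```python
# Default probabilities given in the question.
p_default_a = 0.1
p_default_b = 0.2

# Complement rule: probability that each mortgage pays off on time.
p_pay_a = 1 - p_default_a
p_pay_b = 1 - p_default_b

# Multiplication rule for joint events, assuming independence.
p_both_pay = p_pay_a * p_pay_b
p_both_default = p_default_a * p_default_b

print(f"P(both pay off on time) = {p_both_pay:.2f}")
print(f"P(both default)         = {p_both_default:.2f}")
```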

9.

You have sample data on two groups as follows: You want to estimate the population mean for each group. For which group will the estimate be more precise?

a. A

b. B

c. equally precise for A and B

d. insufficient information to tell

10.

If you increase the sample size in a survey, what will normally happen to the power of a statistical test on the data?

a. it will stay the same

b. it will decrease

c. it will increase

d. insufficient information to tell

11.

What is the difference between "standard deviation" and "standard error"?

a. The difference is in the denominator – for standard deviation you divide by n-1, for standard error you divide by n.

b. Both involve measuring the residuals in data, but standard deviation uses absolute values while standard error is measured in squared values.

c. No difference – they are different terms for the same thing.

d. Standard deviation measures the variability of individual observations; standard error measures the variability of a statistic.

12.

A website administrator is worried about response times for a web page to load; the target is an average of 2 seconds. A test is run at a variety of times, and the following results are obtained (in seconds): 0.8, 0.9, 1.9, 3.2, 4.0, 0.7, 1.2, 5.6, 3.9, 2.7. You have been asked to do a very quick analysis to summarize this data. You take 1000 bootstrap samples and plot a distribution of these resample means (see figure). What would your report most likely look like?

a. The average response time from the sample is 2 seconds, and this is within the range of random variation, so there is nothing of statistical significance to report, hence no need for action.

b. The average response time from the sample is 2.49 seconds, and is statistically significantly different from the target time of 2 seconds, so there is no need to collect more data and modifications are needed for the page.

c. The estimated average response time from the sample is 2.49 seconds, but the possible random sampling variation in this estimate (from about 1.5 to 3.5) suggests the need to collect more actual response time data.

d. The estimated average response time from the sample is 2.49 seconds, but the possible random sampling variation in this estimate (from about 1.5 to 3.5) suggests the need to substantially increase the number of bootstrap samples.
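The bootstrap procedure described in this question can be sketched as follows, using the ten response times given. The function names, the fixed seed, and the 90% coverage level are illustrative assumptions; the percentile method shown is one common way to read an interval off a bootstrap distribution, and the exact endpoints will vary with the random seed.

```python
import random
import statistics

# Response times (seconds) from the question.
response_times = [0.8, 0.9, 1.9, 3.2, 4.0, 0.7, 1.2, 5.6, 3.9, 2.7]


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Resample the data with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.mean(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, coverage=0.90):
    """Chop an equal share off each tail of the sorted values."""
    ordered = sorted(values)
    tail = (1 - coverage) / 2
    lo_index = round(tail * len(ordered))
    hi_index = round((1 - tail) * len(ordered)) - 1
    return ordered[lo_index], ordered[hi_index]


means = bootstrap_means(response_times)
low, high = percentile_interval(means, coverage=0.90)

print(f"sample mean = {statistics.mean(response_times):.2f} seconds")
print(f"90% bootstrap percentile interval: {low:.2f} to {high:.2f} seconds")
```

Increasing `n_resamples` smooths the bootstrap distribution but does not add information beyond the ten original observations.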

13.

A health services provider has grown and now wants to use patient data to modify its procedures to produce better patient outcomes. Typically, 15% of patients seen for respiratory problems come back to the doctor within 10 days. The provider wants to try a new procedure in which the doctor calls the patient back 2 days after the initial visit. Which of the following would be an appropriate part of the data analysis following this experiment?

a. A one-sample test of whether a proportion differs from a benchmark.

b. A confidence interval around a mean.

c. A two sample test of a difference in means.

d. A paired-sample test of the difference in proportions.

14.

A health insurance company conducts an experiment in conjunction with certain hospitals to determine whether a standard surgery protocol should be modified. For the analysis of the data, alpha is set in advance at 0.05 (5%). What does this mean?

a. If an improvement is found, the p-value must fall above 0.05 for the result to be considered statistically significant.

b. The improvement from the modified protocol must reach 5% in order to be considered statistically significant.

c. If an improvement is found, statistical significance requires less than a 5% chance of seeing such impressive results under the null model of “no improvement.”

d. The error level (residuals) must fall below 5% for the result to be considered statistically significant.

15.

A large web retailer regularly conducts tests by randomly showing one of 5 price levels when a person shops. A marketing manager is concerned about imbalances in the page views of each price level and conducts a study in which N page views are examined and the price level shown in each of the N page views is noted. Which of the following is an appropriate step in a resampling procedure to assess whether the allocation of pricing views is truly random?

a. Calculate the expected distribution of page views across pricing levels under the null model by dividing N by 5.

b. Randomly generate N numbers in the range 1-5.

c. Count the frequency of randomly generated 1’s, 2’s, 3’s, 4’s and 5’s and subtract from N/5.

d. All of the above

e. None of the above
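One way a resampling procedure of this kind could be assembled is sketched below. The observed counts are hypothetical (no data are given in the question), and the test statistic, the sum of absolute deviations from N/5, is an assumed choice; other statistics such as chi-squared would also work.

```python
import random


def total_abs_deviation(counts, expected):
    """Sum of absolute differences between each count and the expected count."""
    return sum(abs(c - expected) for c in counts)


def simulate_counts(n_views, n_levels, rng):
    """Assign each page view a random price level and tally the levels."""
    counts = [0] * n_levels
    for level in rng.choices(range(n_levels), k=n_views):
        counts[level] += 1
    return counts


def resampling_p_value(observed_counts, n_trials=1000, seed=0):
    """Share of simulated allocations at least as imbalanced as the observed one."""
    rng = random.Random(seed)
    n_levels = len(observed_counts)
    n_views = sum(observed_counts)
    expected = n_views / n_levels  # N / 5 under the null model
    observed_stat = total_abs_deviation(observed_counts, expected)
    extreme = 0
    for _ in range(n_trials):
        simulated = simulate_counts(n_views, n_levels, rng)
        if total_abs_deviation(simulated, expected) >= observed_stat:
            extreme += 1
    return extreme / n_trials


# Hypothetical page-view counts for the 5 price levels (N = 1000).
observed = [212, 190, 205, 188, 205]
print(f"estimated p-value: {resampling_p_value(observed):.3f}")
```

A small p-value would suggest the observed imbalance is larger than random allocation typically produces.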

16.

Consider the two scatterplots shown – one relating baseball win/loss records to the payroll, the other relating training hours to work productivity. Guess at the correlation coefficient for each.

a. Baseball 2.4, training 0.04

b. Baseball -0.5, training -1.0

c. Baseball 0.66, training 0.25

d. Baseball 0.66, training 0.95

17.

Consider the scatterplot, below, of baseball payroll and win/loss record. If a linear regression were performed and a regression line fit, what would be the slope and intercept? Make a guess.

a. intercept = 210, slope = 0.4

b. intercept = 0, slope = 0.25

c. intercept = 180, slope = 220

d. intercept = 180, slope = 0

18.

Consider the baseball payroll data with a regression line added (see figure). How would you interpret, in meaningful real terms, the extension of the regression line in either direction so that it spans the entire graph?

a. As x drops to 0, y drops to around 210.

b. A payroll of 250 would assure more than 300 wins.

c. You wouldn’t; the data lose meaning at the edges of the graph.

d. The extension of the line allows you to predict values outside the data range with the same confidence that you can predict values within the data range.

19.

Consider the following regression equation relating pulmonary capacity, measured in peak expiration flow rate (f), to years of exposure to cotton dust (d): f = -4.2d + 424. Which of the following is true?

a. For a worker with 10 years of exposure and a PEFR value of 400, the residual is 18.

b. The predicted value of “f” for a worker with 20 years of exposure is 340.

c. The slope of this regression line is negative.

d. All of the above

e. None of the above
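Statements about predictions and residuals for this equation can be checked with a few lines of arithmetic. The sketch below uses the coefficients from the question; the function names are illustrative, and the residual is taken as observed minus predicted, which is the usual convention.

```python
# Coefficients from the regression equation f = -4.2d + 424.
SLOPE = -4.2
INTERCEPT = 424.0


def predicted_pefr(years_exposure: float) -> float:
    """Predicted peak expiration flow rate for a given exposure in years."""
    return SLOPE * years_exposure + INTERCEPT


def residual(observed_pefr: float, years_exposure: float) -> float:
    """Residual = observed value minus predicted value."""
    return observed_pefr - predicted_pefr(years_exposure)


print(f"predicted f at d = 20: {predicted_pefr(20):.1f}")
print(f"residual for f = 400 at d = 10: {residual(400, 10):.1f}")
print(f"slope is negative: {SLOPE < 0}")
```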

20.

Consider the following output from regression software, relating data on pulmonary capacity, measured in peak expiration flow rate (f), to years of exposure to cotton dust (d): f = -4.2d + 424; p-value < 0.01. Which of the following is true?

a. Flow rate increases as the exposure grows, and this finding is statistically significant.

b. There is a negative relationship between “f” and “d,” and it is statistically significant.

c. Owing to the magnitude of the constant term (424), residuals will be relatively small.

d. All of the above

e. None of the above

21.

About you: Are you looking for a statistics course that offers academic credit? (Tracking question only – no right answer)

a. Yes

b. No
