
Analytics Quiz



A 10-question quiz drawing from various analytics areas.


In fitting a model to predict whether a person viewing an e-commerce website will click on a particular link, a company drew its training data from web logs recording the browsing behavior of prior visitors. Several variables proved useful in predicting the target, including a binary variable indicating whether the person made a purchase. How should that variable be handled?


A dataset has 2,000 records and 50 variables, with 6% of the values missing, spread randomly throughout the records and variables. An analyst decides to remove records that have missing values. What is the approximate probability that a given record will be removed?
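The removal probability follows from the complement rule, assuming each value is missing independently with probability 0.06 (one reading of "spread randomly"). A record survives only if all 50 of its values are present, so P(removed) = 1 − 0.94^50 ≈ 0.95, or roughly 1,900 of the 2,000 records. A minimal sketch of that arithmetic; the function name is illustrative:

```python
def prob_record_removed(p_missing: float, n_vars: int) -> float:
    """Probability that a record has at least one missing value,
    assuming each value is missing independently with probability p_missing."""
    # A record is kept only if every one of its n_vars values is present.
    return 1.0 - (1.0 - p_missing) ** n_vars

p = prob_record_removed(0.06, 50)
print(f"{p:.3f}")  # → 0.955
```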


Two predictive models have been fit to data with a binary target variable, using the 100+ available predictor variables. One is a logistic regression model; the other is a neural net using the maximum number of variables, layers, nodes, and cycles permitted by the software. Which of the following is/are true?
(i) The logistic regression model is less likely to overfit the training data.
(ii) The difference in accuracy between training and validation data is likely to be greater for the neural net than for the logistic regression.
(iii) A simpler neural net may perform worse on the training data but better on the validation data.


A business wishes to segment its customers into a small number of groups so that it can effectively target marketing efforts at different customer types. Which of the following correctly describes the process and the order of the steps?
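Customer segmentation of this kind is commonly done with a clustering algorithm such as k-means: standardize the chosen variables, cluster, then profile the resulting groups. A minimal pure-Python sketch of that workflow; the customer data and the first-k seeding are hypothetical simplifications, not a prescribed method:

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Convert each column to z-scores so no single variable dominates the distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means, seeded (for simplicity) with the first k points as centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids

# Hypothetical customers: [annual spend, visits per month]
customers = [[200, 1], [220, 2], [250, 1], [5000, 12], [5200, 10], [4800, 11]]
labels, _ = kmeans(standardize(customers), k=2)

# Profile each segment on the original (unstandardized) scale
for j in range(2):
    members = [c for c, lab in zip(customers, labels) if lab == j]
    print(f"segment {j}: n={len(members)}, "
          f"mean spend={mean(m[0] for m in members):.0f}, "
          f"mean visits={mean(m[1] for m in members):.1f}")
```

In practice one would also compare several values of k and use a better seeding scheme such as k-means++; seeding with the first k points here only keeps the sketch deterministic.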


Souvenir sales at a beach resort in Queensland, Australia, are shown in this figure as raw data and as transformed data. Choose the statement below that is most accurate.


The attached figure shows a set of association rules derived from transactional data on cosmetics purchases. Which of the following statements is most accurate?
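Association rules are commonly evaluated with three measures: support (how often the items occur together), confidence (how often the consequent appears given the antecedent), and lift (confidence relative to the consequent's baseline frequency, where values above 1 indicate a positive association). A minimal sketch computing them for one rule; the transactions are made up and are not the figure's data:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's baseline support."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical cosmetics baskets
transactions = [
    {"blush", "nail polish"},
    {"blush", "nail polish", "mascara"},
    {"mascara"},
    {"blush"},
    {"nail polish", "mascara"},
]
rule = ({"blush"}, {"nail polish"})
print(f"support={support(transactions, rule[0] | rule[1]):.2f}, "
      f"confidence={confidence(transactions, *rule):.2f}, "
      f"lift={lift(transactions, *rule):.2f}")
```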


Consider two different text mining tasks: (i) mining the "contact us" submissions from a website to predict purchase/no-purchase, and (ii) mining internal email correspondence in a natural resource company to determine relevance to an environmental enforcement action. Think carefully about the process of preparing the text for predictive modeling, and about the scenario involved. Which of the following is most accurate?
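A common first step in preparing text for predictive modeling is a bag-of-words representation: tokenize each document, drop stop words, and count terms into a document-term matrix that a classifier can use. A minimal sketch; the tiny stop-word list and the two documents are hypothetical:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "is", "in",
              "for", "on", "we", "i", "you", "please"}

def tokenize(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return [tok for tok in re.findall(r"[a-z']+", text.lower()) if tok not in STOP_WORDS]

def term_document_matrix(docs):
    """Rows = documents, columns = sorted vocabulary, cells = term counts."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]

docs = ["Please send a price quote for the blue model",
        "I want to cancel my order"]
vocab, matrix = term_document_matrix(docs)
print(vocab)
for row in matrix:
    print(row)
```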


When considering logistic regression, neural networks, and classification and regression trees for prediction and classification, which of the following is true? (Choose the best answer.)


A direct-response advertising firm, in a test of a pop-up web offer presented to all visitors, gets a response rate of 1.5% with no predictive model applied. It then develops a logistic regression model to estimate the probability that a visitor will respond. When validating the model on a holdout sample, it gets a lift of 2 on the top decile. Which of the following is true?
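By the usual definition, top-decile lift is the response rate among the 10% of cases the model scores highest, divided by the overall (no-model) response rate. With a 1.5% baseline, a lift of 2 therefore corresponds to about a 3% response rate in the top-scored decile. A minimal sketch of the computation; the scores and outcomes are made up:

```python
def top_decile_lift(scores, responded):
    """Response rate among the 10% highest-scored cases divided by the overall response rate."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, len(ranked) // 10)
    top_rate = sum(r for _, r in ranked[:n_top]) / n_top
    overall_rate = sum(responded) / len(responded)
    return top_rate / overall_rate

# Hypothetical holdout data: 100 visitors, 4 responders, 2 of them in the top decile
scores = list(range(100))
responded = [1 if s in (99, 98, 15, 5) else 0 for s in scores]
print(top_decile_lift(scores, responded))
```

Here the top decile responds at 20% against a 4% baseline, a lift of 5.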


A political consultant wants to predict how individual voters will vote. The data include whether each voter voted in the primary and general elections of the past 10 years, 100+ demographic attributes of the neighborhood in which the voter lives, and purchased data on 200+ consumer spending variables. Which of the following would NOT be useful in dealing with dimension reduction and feature selection?