Two models of inference: design-based vs. model-based

Introduction

Most studies in the social, behavioral and medical sciences use some form of survey data. Often, a survey based on a small sample is used to say something about a larger population. This step is called inference. Many of the developments in 20th-century statistics centered on valid inference procedures. For example, p-values and confidence intervals are designed to reflect the uncertainty that comes with using a small sample to say something about the larger population. This model of inference is called design-based inference. Crucial to design-based inference is the process of drawing a random sample from the population in a controlled way. A second model, model-based inference, does not require a neat random sample; it uses “found” data and relies on statistical modeling to describe the relation between the sample and the population. This week, we will use the example of the 2016 U.S. Presidential Election to illustrate why there is a renewed, 21st-century battle between the two paradigms of inference.
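To make the contrast concrete, here is a minimal sketch in Python. It is not part of the course materials, and everything in it is an assumption made for illustration: the simulated population, the two groups ("young" and "old"), the support rates, and the inclusion probabilities. The first part estimates a proportion from a simple random sample with a normal-approximation confidence interval (design-based). The second part takes a "found" sample in which one group is over-represented and adjusts it by post-stratification, one simple form of model-based correction.

```python
import math
import random


def srs_proportion_ci(outcomes, z=1.96):
    """Proportion and normal-approximation 95% CI from a simple random sample.

    Ignores the finite population correction, which is negligible here.
    """
    n = len(outcomes)
    p_hat = sum(outcomes) / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - z * se, p_hat + z * se)


def poststratified_proportion(sample, population_shares):
    """Weight each group's sample mean by that group's known population share."""
    estimate = 0.0
    for group, share in population_shares.items():
        outcomes = [y for g, y in sample if g == group]
        estimate += share * sum(outcomes) / len(outcomes)
    return estimate


rng = random.Random(2016)

# Hypothetical population: 40% "young" (60% support), 60% "old" (40% support).
population = []
for _ in range(100_000):
    group = "young" if rng.random() < 0.4 else "old"
    p_support = 0.6 if group == "young" else 0.4
    population.append((group, 1 if rng.random() < p_support else 0))
true_p = sum(y for _, y in population) / len(population)

# Design-based: every unit has the same, known chance of selection.
srs = rng.sample(population, 1000)
p_hat, ci = srs_proportion_ci([y for _, y in srs])

# "Found" data: young people are four times as likely to end up in the data.
inclusion = {"young": 0.10, "old": 0.025}
found = [unit for unit in population if rng.random() < inclusion[unit[0]]]
naive = sum(y for _, y in found) / len(found)

# Model-based: assume that, within a group, the found sample resembles the population.
shares = {
    g: sum(1 for grp, _ in population if grp == g) / len(population)
    for g in inclusion
}
adjusted = poststratified_proportion(found, shares)

print(f"true proportion:        {true_p:.3f}")
print(f"SRS estimate (95% CI):  {p_hat:.3f} ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"found data, unadjusted: {naive:.3f}")
print(f"found data, adjusted:   {adjusted:.3f}")
```

The adjustment only works because the simulated selection depends on group alone. With real found data that assumption cannot be checked from the sample itself, which is one reason the two paradigms disagree.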

Literature

Optional:

  • Blumenthal, M., Clement, S., Clinton, J. D., Durand, C., Franklin, C., Miringoff, L., & Witt, G. E. (2017).

Lecture

Two models for inference. The 21st-century war in survey science. Slides

Exercises

Analyse presidential polls for the U.S. election. What type of poll was better at predicting the U.S. election? This exercise introduces the key terms error and bias.
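As a rough illustration of the two key terms, the sketch below computes them for a handful of polls. The numbers are made up and are not real 2016 polling data. The poll types ("probability" and "nonprobability"), the function name `summarize_polls`, and the choice of the candidate margin in percentage points as the quantity of interest are all assumptions for this example. Error is taken here as a single poll's estimate minus the actual outcome, and bias as the average error across polls of one type. The mean absolute error is added to show that polls can be individually wrong without being biased in one direction.

```python
from collections import defaultdict


def summarize_polls(polls, actual_margin):
    """Return bias (mean error) and mean absolute error per poll type.

    polls: list of (poll_type, estimated_margin) tuples, margins in percentage points.
    """
    errors = defaultdict(list)
    for poll_type, estimate in polls:
        errors[poll_type].append(estimate - actual_margin)  # error of one poll
    return {
        poll_type: {
            "bias": sum(errs) / len(errs),                  # systematic over/underestimation
            "mae": sum(abs(e) for e in errs) / len(errs),   # typical size of the error
        }
        for poll_type, errs in errors.items()
    }


# Hypothetical polls and a hypothetical election outcome (not real data).
polls = [
    ("probability", 3.0),
    ("probability", 0.0),
    ("probability", 2.0),
    ("nonprobability", 4.0),
    ("nonprobability", 5.0),
    ("nonprobability", 3.0),
]
actual_margin = 1.0

summary = summarize_polls(polls, actual_margin)
for poll_type, stats in summary.items():
    print(f"{poll_type}: bias = {stats['bias']:.2f}, MAE = {stats['mae']:.2f}")
```

In this made-up example the probability polls miss in both directions, so their bias is smaller than their mean absolute error, while the nonprobability polls all miss in the same direction.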

Class exercise

Take home exercise

Adopt a survey. Every student will adopt one survey (from a longlist). Every survey comes with documentation on the survey design, as well as a survey dataset. You will first evaluate the survey design (more on this in weeks 1-6), and then correct for unit and item nonresponse for this survey. The longlist contains suggestions of surveys to adopt. You may choose your own survey, but be aware that data access is not always easy to arrange and may take some time.

