Saturday, September 29, 2012

Is panel attrition the same as nonresponse?

All of my research focuses on methods for collecting and analyzing panel survey data. One of the primary problems of panel surveys is attrition, or drop-out: over the course of a panel survey, many respondents decide to stop participating.

Last July I attended the panel survey methods workshop in Melbourne, where we had extensive discussions about panel attrition: how to study it, what its consequences (bias) are for survey estimates, and how to prevent it from happening in the first place.

These questions have a lot in common with the questions discussed at another workshop for survey methodologists: the nonresponse workshop. The only difference is that at the nonresponse workshop we discuss one-off, cross-sectional surveys, while at the panel survey workshop we discuss what happens after the first wave of data collection.

I am in the middle of writing a book chapter (with Annette Scherpenzeel and Marcel Das of Centerdata) on attrition in the LISS Internet panel, and one of the questions we try to answer is whether nonrespondents in the first wave are actually similar to respondents who drop out at wave 2 or later. Or, to be more precise, whether nonrespondents are similar to fast attriters, or to other sub-groups of attriters.

The graph below shows attrition patterns for the people in the LISS panel over the 50 waves that we analysed. The green line on top represents people who have response propensities close to 1, meaning they always participate. The brown line represents fast attriters, and the pink, dark blue, and purple lines represent groups that drop out more slowly. You also find new panel entrants (dark grey and red lines), and finally an almost invisible black line with response propensities of 0, meaning that although these people consented to become panel members, they never actually participate in the panel.

[Figure: attrition patterns by response propensity group over 50 waves of the LISS panel]
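
For readers who want to play with this kind of analysis themselves, here is a minimal, purely illustrative sketch in Python. It simulates a respondent-by-wave participation matrix (invented data, not the LISS data) and groups panel members by their response trajectories with k-means; this is not necessarily how the figure above was produced, but it shows the general idea of plotting group-level response propensities per wave.

```python
# Illustrative sketch only, with simulated data -- not the LISS data or the
# analysis from the book chapter. It groups panel members by their observed
# response trajectories and plots the mean response rate per group per wave.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
n_members, n_waves = 2000, 50
waves = np.arange(n_waves)

# Simulate heterogeneous response behaviour: loyal responders, fast and slow
# attriters, and 'sleepers' who consent but then never participate.
start = rng.choice([0.95, 0.85, 0.70, 0.05], size=n_members, p=[0.5, 0.2, 0.25, 0.05])
decay = rng.choice([0.00, 0.05, 0.01], size=n_members, p=[0.6, 0.2, 0.2])
propensity = np.clip(start[:, None] - decay[:, None] * waves, 0, 1)
participation = rng.binomial(1, propensity)   # observed 0/1 response per wave

# Cluster the 0/1 trajectories and plot group-level response propensities.
groups = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(participation)
for g in range(6):
    plt.plot(waves, participation[groups == g].mean(axis=0), label=f"group {g}")
plt.xlabel("wave")
plt.ylabel("observed response propensity")
plt.legend()
plt.show()
```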

For the whole story you'll have to wait for the book on 'Internet panel surveys' to come out sometime in 2013, but here I'll focus on comparing initial nonrespondents to respondents who do consent, but then never participate.
These groups turn out to be different. Not just a little different, but hugely different. This was somewhat surprising to me, as many survey methodologists believe that early panel attrition is some kind of continuation of initial nonresponse. It turns out not to be: fast attriters are very different from initial nonrespondents. My hypothesis is that some specific groups of people 'accidentally' say yes to a first survey request, but then try to get out of the survey as fast as they can. I am still not sure what this implies for panel research (comments very welcome): does it mean that the methods we use to target nonrespondents (the persuasion principles of Cialdini et al. 1991) might not work in panel surveys, and that we need different methods?

I think the first few waves of a panel study are extremely important for keeping attrition low in the long run. So we should perhaps extend some of the efforts we use in the recruitment phase (advance letters, a mixed-mode contact strategy) into the first waves as well, and only switch to a cheaper contact mode later, once panel members have developed a habit of responding to the waves of the panel.

Saturday, September 22, 2012

Why access panels cannot weight election polls accurately

There are a lot of reasons why one would not want to use access panels for predicting electoral outcomes. These are well discussed in many places on- and offline. I'll briefly summarize them, before adding some thoughts on why access panels do so badly at predicting election outcomes.

1. Access panels don't draw random samples, but rely on self-selected samples. A slightly better way to get panel respondents is a quota sample, but even these have problems, well discussed here, here and here for example. The bottom line is that access panel respondents are not 'normal' people, and so the voting preferences of not-so-normal people are likely to be biased.
 
2. Because of these problems, survey managers use weighting: they correct their sample for known biases. If they know that elderly people with low education are underrepresented in an access panel, they weight them up. I think this is bad practice, and it has been shown that weighting does not solve the problem and can sometimes make biases worse in general surveys. Here are some additional and specific problems that are often neglected. In short, weighting only works if the weighting variables predict the dependent variable to a great extent.



Weighting is usually done with sociodemographic variables. From political science research, we know that sociodemographics do a bad job of explaining voting behavior: explained variances in regression models normally don't exceed 10%.
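
To make that concrete, here is a small toy simulation (invented numbers, not real polling data): self-selection into the panel is driven by political interest, while the weighting is done on an 'age' variable that only weakly predicts the vote. The age weights then barely move the estimate.

```python
# Toy illustration with invented numbers (not real polling data): weighting on a
# demographic variable only removes selection bias to the extent that the
# demographic actually predicts the vote.
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

old = rng.binomial(1, 0.4, N)        # weighting variable (observed)
interest = rng.binomial(1, 0.3, N)   # political interest (not observed by the pollster)

# Vote for party A depends strongly on political interest, only weakly on age.
vote = rng.binomial(1, 0.15 + 0.30 * interest + 0.05 * old)

# Self-selection into the access panel is driven by political interest.
sampled = rng.binomial(1, np.where(interest == 1, 0.15, 0.03)).astype(bool)

# Post-stratification weights on age: population share / sample share per group.
w = np.where(old[sampled] == 1,
             old.mean() / old[sampled].mean(),
             (1 - old.mean()) / (1 - old[sampled].mean()))

print("population share voting A :", round(float(vote.mean()), 3))
print("unweighted panel estimate :", round(float(vote[sampled].mean()), 3))
print("age-weighted estimate     :", round(float(np.average(vote[sampled], weights=w)), 3))
# The age weights barely move the estimate, because age explains little of the vote.
```
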
So, let me get down to the main point I would like to make in this post, a point I have not seen discussed anywhere.

Panel survey managers have 'resolved' the weakness of their weighting models by including a variable that does predict voting behavior fairly well: past voting behavior. If one knows that past Social Democrat voters are underrepresented, one can weight on that variable. This is all very well if one has good data on past voting behavior for all panel members. The panels currently do not. Their information is wrong in two ways:

1. Access panels will never have past-vote information for people who did not vote previously. These are mainly young people, or people who normally do not vote in elections. If these new voters voted like everyone else there would be no problem, but new voters have very specific voting preferences.

2. Conversely, access panels cannot predict well who is not going to vote in the current election. If non-voters disproportionately voted for one party in the previous election, this will lead to an overestimation of that party's support.

I believe these two problems are larger than most people think. The first problem explains why the PVV vote was underestimated in 2006 and 2010: the PVV attracted many new voters in those elections. The second problem explains why the PVV vote was overestimated in 2012: many people who voted PVV in the previous elections stayed home this time.
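
A back-of-the-envelope example (with invented numbers, not the actual 2012 figures) shows how large the second effect can get: if half of a party's previous voters stay home and new voters largely ignore the party, a panel weighted to last election's result keeps predicting roughly the old share.

```python
# Back-of-the-envelope illustration with invented numbers (not the actual 2012
# figures): weighting a panel to last election's result goes wrong when a
# party's previous voters disproportionately stay home and new voters differ.

prev_share_x = 0.15      # party X got 15% at the previous election
stay_home_x = 0.5        # half of X's previous voters stay home this time
new_voter_share = 0.10   # new voters make up 10% of today's electorate
new_voter_x = 0.02       # and only 2% of them vote X

# Share of X among previous voters who turn out again (non-X voters all return):
x_among_returning = (prev_share_x * (1 - stay_home_x)) / (
    prev_share_x * (1 - stay_home_x) + (1 - prev_share_x))

# True share of X among people who actually vote today:
true_x = (1 - new_voter_share) * x_among_returning + new_voter_share * new_voter_x

# A panel weighted to the previous result, blind to turnout and new voters,
# keeps predicting roughly the old share:
print(f"weighted-to-last-election prediction: {prev_share_x:.1%}")
print(f"actual share today:                   {true_x:.1%}")
```
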

So, for panel survey managers who want a bit of free advice on how to improve their polls: try to get a clear view of the new voters and of the people unlikely to vote. That may be hard, especially because non-voters are not very interested in politics and will therefore not sign up for online access panels voluntarily. But it is certainly not impossible.


Thursday, September 13, 2012

Dutch elections 2012 - poll results

The night after the election, one could conclude that all pollsters in the Netherlands did a bad job of predicting the election results. All polls were off by at least 18 seats in total (out of 150), and I expect the newspapers to make headlines of this in the next few days. See the table below for the final predictions (made before election day), the exit poll, and the final election results. The last row shows how much each poll was off (in total number of seats).

Actually, I think the pollsters did pretty well this time. The only thing all of them mispredicted was a large number of PVV voters moving to the VVD, and a lot of SP voters moving to the PvdA. This movement was visible in the last polls leading up to the elections, but the pollsters either underestimated it, or a lot of people switched to the expected winners on election day.

So, I predicted Synovate would do best, but that did not really turn out to be the case: they share first place with Maurice de Hond, but are not clearly better than the others. There are lots of blogs, articles and news items about Internet panels these days; I spent some blog posts on that issue in 2009 myself. Although the largest reason why pollsters generally do so badly is that they do not draw random samples, I think there are two more reasons. I plan to spend my next two blog posts on these topics, so stay tuned for more on the following issue.

Pollsters use statistical weighting to account for the unrepresentativeness of their panel. They do this using sociodemographic characteristics and past voting behavior. I believe it is wrong to weight in general, and specifically wrong to weight on past voting behavior. I'll show you why in the coming days.

Number of seats in parliament, 2012:

Party (orientation)         | Maurice de Hond (peil.nl) | Intomart/De Stemming | Synovate | TNS-NIPO | Exit poll (Synovate) | Final result
VVD (right-liberal)         | 36 | 35 | 37 | 35 | 41 | 41
PvdA (social democrats)     | 36 | 34 | 36 | 34 | 40 | 38
SP (socialists)             | 20 | 22 | 21 | 21 | 15 | 15
PVV (anti-immigration)      | 18 | 17 | 17 | 17 | 13 | 15
CDA (christian democrats)   | 12 | 12 | 13 | 12 | 13 | 13
D66 (center liberals)       | 11 | 11 | 10 | 13 | 12 | 12
CU (christian union)        |  5 |  7 |  5 |  6 |  4 |  5
SGP (reformed christians)   |  3 |  3 |  2 |  2 |  3 |  3
GroenLinks (greens)         |  4 |  4 |  4 |  4 |  4 |  4
PvdD (animal rights)        |  3 |  2 |  3 |  2 |  2 |  2
50plus (elderly)            |  2 |  3 |  2 |  4 |  3 |  2
Wrongly predicted (seats)   | 18 | 24 | 18 | 24 |  6 |  -
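
For completeness: the 'Wrongly predicted' row appears to be the sum of absolute seat differences between each poll and the final result; the short Python snippet below, using the seat numbers from the table, reproduces those totals.

```python
# The 'Wrongly predicted' row: for each poll, sum the absolute seat differences
# with the final result. Seat numbers copied from the table above, in the order
# VVD, PvdA, SP, PVV, CDA, D66, CU, SGP, GroenLinks, PvdD, 50plus.
final = [41, 38, 15, 15, 13, 12, 5, 3, 4, 2, 2]
polls = {
    "Maurice de Hond": [36, 36, 20, 18, 12, 11, 5, 3, 4, 3, 2],
    "Intomart":        [35, 34, 22, 17, 12, 11, 7, 3, 4, 2, 3],
    "Synovate":        [37, 36, 21, 17, 13, 10, 5, 2, 4, 3, 2],
    "TNS-NIPO":        [35, 34, 21, 17, 12, 13, 6, 2, 4, 2, 4],
    "Exit poll":       [41, 40, 15, 13, 13, 12, 4, 3, 4, 2, 3],
}
for name, seats in polls.items():
    total_error = sum(abs(s - f) for s, f in zip(seats, final))
    print(f"{name}: off by {total_error} seats in total")   # 18, 24, 18, 24, 6
```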