Monday, November 4, 2013

Longitudinal interview outcome data reduction: Latent Class and Sequence analyses

Frauke Kreuter once commented on a presentation I gave that I should really be looking at sequence analysis for studying attrition in panel surveys. She had written an article on the topic with Ulrich Kohler (here) in 2009, and as of late there are more people exploring the technique (e.g. Mark Hanly at Bristol, and Gabi Durrant at Southampton).

I am working on a project on attrition in the British Household Panel, and linking attrition errors to measurement errors. Attrition data can be messy. Below, you see the response outcome sequences of every initial panel member in the British Household Panel Survey. This figure obscures the fact that individual respondents may frequently switch states (e.g. interview - noncontact - interview - refusal - not issued).

Figure 1: relative sizes of final interview outcomes at 18 waves of BHPS of wave 1 respondents

Although descriptive visualizations like these are informative, sequence analysis becomes analytically interesting when you try to "do" something with the sequences of information. In my case, I want to group all sequence chains into "clusters" or "classes" of people who have a similar process of attrition. Stata and the R package TraMineR offer possibilities for doing this. Both enable you to match sequences by optimal matching, so that for every sequence from every person you get a distance measure to every other sequence (person). In turn, this (huge) distance matrix can be used to classify all the sequences of all respondents into clusters. R also offers a nice way to handle huge distance matrices, by using aggregation and weighting: see the WeightedCluster library.
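To make the idea of optimal matching concrete, here is a minimal sketch in Python of how a distance between two outcome sequences can be computed. It is not the Stata or TraMineR implementation. The outcome codes (I, N, R), the three example respondents, and the costs (1 for an insertion or deletion, 2 for a substitution) are assumptions chosen purely for illustration.

```python
from itertools import combinations


def om_distance(seq_a, seq_b, indel=1.0, sub=2.0):
    """Cheapest total cost of insertions, deletions and substitutions
    needed to turn seq_a into seq_b (illustrative costs, see text)."""
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = distance between the first i states of seq_a
    # and the first j states of seq_b
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * indel
    for j in range(1, m + 1):
        cost[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = seq_a[i - 1] == seq_b[j - 1]
            cost[i][j] = min(
                cost[i - 1][j] + indel,                       # delete a state
                cost[i][j - 1] + indel,                       # insert a state
                cost[i - 1][j - 1] + (0.0 if same else sub),  # match or substitute
            )
    return cost[n][m]


# Hypothetical codes: I = interview, N = noncontact, R = refusal
sequences = {"p1": "IIIII", "p2": "IINII", "p3": "IIRRR"}

# One distance per pair of respondents: the "distance matrix"
distances = {
    (a, b): om_distance(sequences[a], sequences[b])
    for a, b in combinations(sequences, 2)
}
```

With these assumed costs, a respondent who misses a single wave (p2) ends up much closer to the always-interviewed respondent (p1) than someone who refuses from wave 3 onwards (p3). A clustering algorithm then works on the full set of such pairwise distances.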

Below, you find the results of the sequence analysis. The nice thing is that I end up with 6 clusters that look the same as the Classes that I got out of a Latent Class Analysis over summer. So now I feel much more confident using this classification.

Figure 2: 6-cluster solution for sequence analysis on BHPS attrition patterns

The differences between sequence analysis and LCA are really minimal, and probably result from the fact that the Optimal Matching algorithm used in sequence analysis is more flexible than Latent Class Analysis (in that it allows deletions, insertions, substitutions etc. to match sequences). But in practice, for my analyses, it doesn't matter what technique I use. My results are equivalent. Personally, I like Latent Class Analysis more, because it offers the option of linking the Latent Classes of attrition to substantive research data in one model.

Attrition data bear a resemblance to contact data recorded in a telephone or face-to-face survey. Interviewers make call or interview attempts that carry a lot of information about the survey process, and that could improve fieldwork and reduce nonresponse and costs. I am imagining a paper where one links contact data at every wave, and combines that with attrition analyses, in a series of linked-sequence data analyses. That way, you can learn how specific sequences of call and contact data lead to specific sequences of interview outcomes at later stages of the panel survey. It can be done if you have a lot of time.

Monday, October 21, 2013

A great lecturer, and the contextuality of nonresponse

I love watching videos of Richard Feynman on Youtube. Apart from being entertaining, Feynman in the video below explains quite subtly what constitutes a good scientific theory, and what doesn't. He is right about the fact that good theories are precise theories.

Richard Feynman: fragment from a class on the Philosophy of science (source: Youtube)

The video also makes me jealous of natural scientists. In the social sciences, almost all processes and causal relationships are contextual, as opposed to the natural sciences. For example: in survey methods, nonresponse is one of the phenomena that is contextual. Nonresponse always occurs, but the predictors of nonresponse differ across countries, survey topics, time, survey mode, and subpopulations. In other words, that is what makes building a theory about nonresponse so difficult.

Thursday, October 17, 2013

My prayers on peer-reviewed datasets are instantly answered

Three weeks ago, I wrote about the fact that I think that it would be great if we could have a journal on peer-reviewed datasets (along with data being accessible).

It seems I am not alone in thinking this. Jelte Wicherts, a psychologist/statistician at Tilburg University, has just started the Journal of Open Psychology Data.

Jelte writes:

"The Journal of Open Psychology Data (JOPD) features peer reviewed data papers describing psychology datasets with high reuse potential. Data papers may describe data from unpublished work, including replication research, or from papers published previously in a traditional journal. We are working with a number of specialist and institutional data repositories to ensure that the associated data are professionally archived, preserved, and openly available. Equally importantly, the data and the papers are citable, and reuse is tracked.

Monday, October 14, 2013

Imagine we have great covariates for correcting for unit nonresponse...

I am continuing on the recent article and commentaries on weighting to correct for unit nonresponse by Michael Brick, as published in the recent issue of the Journal of Official Statistics (here).

The article by no means is all about whether one should impute or weight. I am just picking out one issue that got me thinking. Michael Brick rightly says that in order to correct successfully for unit nonresponse using covariates, we want the covariates to do two things:

1. They should explain missingness.
2. They should highly correlate with our variable of interest.

In other words, these are the two assumptions for a Missing At Random (MAR) process of missing data.

The variables (covariates) we currently use for nonresponse adjustments do neither. Gender, age, ethnicity, region, and (if we're lucky) education, household composition and house characteristics explain neither missingness nor our variable of interest. Would it ever be conceivable to obtain covariates that do this? What are the candidates?

1. covariates (X) that explain missingness (R):
Paradata are currently our best bet. Those may be interviewer observations or call data during fieldwork (note the absence of sample level paradata for self-administered surveys - here lies a task for us). Paradata don't explain missingness very well at the moment, but I think everyone in survey research agrees we can try to collect more.
Another set of candidates are variables that we obtain by enriching sampling frames. We can use marketing data, social networks, or census data to get more information on our sampling units.

2. covariates (X) that explain our variable of interest (Y):
Even if we find covariates that explain missingness, we also want those covariates to be highly correlated to our variable of interest. It is very unlikely that a fixed set of, for example, paradata variables can ever achieve that. Enriched frame data may be more promising, but it is unlikely that this will generally work. I think it is a huge problem that our nonresponse adjustment variables (X) are not related to Y, and one that is not likely to ever be resolved for cross-sectional surveys.

But. In longitudinal surveys, this is an entirely different matter. Because we usually ask the same variables over time, we can use variables from earlier occasions to predict values that are missing at later waves. So, there, we have great covariates that explain our variable of interest. We can use those as long as MAR holds. If change in the dependent variable is associated with attrition, MAR does not hold. Strangely, I know very few studies that study whether attrition is related to change in the dependent variable. Usually, attrition studies focus on covariates measured before attrition, to then explain attrition. They do not focus on change in the dependent variable.
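As a minimal sketch of what "using variables from earlier occasions" can look like, the Python snippet below fits a straight line of wave 2 on wave 1 among the people who stayed in the panel, and uses it to fill in wave 2 for those who dropped out. The numbers are made up for illustration, and deterministic regression imputation is only one (simple) choice among many. It is only defensible if MAR holds: if the dropouts changed differently from the stayers, the imputed values will be off.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical panel: the same variable at two waves, None = attrited
wave1 = [10, 20, 30, 40, 50]
wave2 = [12, 22, 32, None, None]

# Estimate the wave 1 -> wave 2 relation among those observed twice
observed = [(a, b) for a, b in zip(wave1, wave2) if b is not None]
intercept, slope = fit_line([a for a, _ in observed], [b for _, b in observed])

# Predict wave 2 for the attriters from their own wave 1 value
wave2_imputed = [
    b if b is not None else intercept + slope * a
    for a, b in zip(wave1, wave2)
]
```

In this toy example everyone who stayed went up by 2, so the dropouts are assumed to have gone up by 2 as well. That is exactly the assumption that fails when attrition is related to change in the dependent variable.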

Covariate adjustment for nonresponse in cross-sectional and longitudinal surveys

(follow-up 28 October 2013): When adjustment variables are strongly linked to dependent variables, but not to nonresponse, variances tend to be increased (See Little and Vartivarian). So, in longitudinal surveys, the weak link between X and R should really be of medium strength as well, if adjustment is to be successful.

I once thought that because we have so much more information in longitudinal surveys, we could use the lessons that we learn from attrition analyses to improve nonresponse adjustments in cross-sectional surveys. In a forthcoming book chapter, I found that the correlates of attrition are however very different from the correlates of nonresponse in wave 1. So in my view, the best we can do in cross-sectional surveys is to focus on explaining missingness, and then hope for the best for the prediction of our variables of interest.

Sunday, October 6, 2013

To weight or to impute for unit nonresponse?

This week, I have been reading the most recent issue of the Journal of Official Statistics, a journal that has been open access since the 1980s.  In this issue is a critical review article of weighting procedures authored by Michael Brick with commentaries by Olena Kaminska (here), Philipp Kott (here), Roderick Little (here), Geert Loosveldt (here), and a rejoinder (here).

I found this article a great read, and to be full of ideas related to unit nonresponse. The article reviews approaches to weighting: either to the sample or the population, by poststratification and with different statistical techniques. But it discusses much more, and I recommend reading it.

One of the issues that is discussed in the article, but much more extensively in a commentary by Roderick Little, is the question whether we should use weighting or imputations to adjust for unit nonresponse in surveys. Over the years, I have switched allegiances to favouring weighting or imputations in certain missing data situations many times, and I am still not always certain on what is best to do. Weighting is generally favoured for cross-sectional surveys, because we understand how it works. Imputations are generally favoured when we have strong correlates for missingness and our variable(s) of interest, such as in longitudinal surveys. Here are some plusses and minuses for both weighting and imputations.

Weighting is design based. Based on information that is available for the population or whole sample (including nonrespondents), respondent data are weighted in such a way that the survey data reflect the sample/population again.

+ The statistical properties of all design-based weighting procedures are well-known.
+ Weighting works with complex sampling designs (at least theoretically).
+ We need relatively little information on nonrespondents to be able to use weighting procedures. There is however a big BUT...
- Weighting models mainly use socio-demographic data, because that is the kind of information we can add to our sampling frame. These variables are never highly correlated with our variable of interest, nor missingness due to nonresponse, so weighting is not very effective. That is, weighting theoretically works nicely, but in practice, it doesn't ameliorate the missing data problem we have because of unit nonresponse much.
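A minimal sketch of poststratification in Python may help to show why the choice of weighting variable matters so much. The two strata, their population shares and the outcome values are all assumptions made up for the example.

```python
from collections import Counter


def poststratification_weights(respondent_strata, population_shares):
    """Weight = population share of a stratum / respondent share of that stratum."""
    counts = Counter(respondent_strata)
    n = len(respondent_strata)
    return [population_shares[s] / (counts[s] / n) for s in respondent_strata]


def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical respondents: the young are underrepresented (25% vs. 50%)
strata = ["young", "old", "old", "old"]
shares = {"young": 0.5, "old": 0.5}
weights = poststratification_weights(strata, shares)

# Outcome that differs between strata: weighting moves the estimate
y_related = [10, 20, 20, 20]
# Outcome unrelated to the strata: weighting changes nothing
y_unrelated = [10, 10, 10, 10]

unweighted = sum(y_related) / len(y_related)
adjusted = weighted_mean(y_related, weights)
unchanged = weighted_mean(y_unrelated, weights)
```

The weights restore the population shares of the strata, but the estimate only moves when the outcome actually differs between strata. With weighting variables that are unrelated to the variable of interest, the adjusted and unadjusted estimates are the same.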

Imputations are model based. Based on available information for respondents and nonrespondents, a prediction model is built for a variable which has missing information. The model can take an infinite number of shapes, depending on whether imputation is stochastic, how variables are related within the model, and what variables are being used. Based on this model, one or multiple values are imputed for every missing value on every variable for every case. The crucial difference is that weighting uses the same variables for correcting the entire dataset, whereas imputation models differ for every variable that is to be imputed.

+ Imputation models are flexible. This means that the imputation model can be optimized in such a way that it strongly predicts both the dependent variable to be imputed, and the missingness process.

- In the case of unit nonresponse, we often have limited data on nonrespondents. So, although a model-based approach may have advantages over design-based approaches in terms of its ability to predict our variable(s) of interest, this depends on the quality of the covariates we use.

This then brings me, and the authors of the various papers in JOS, back to the basic problem: we don't understand the process of nonresponse in surveys. Next time, more on imputations and weighting for longitudinal surveys. And more on design vs. model based approaches in survey research.

p.s. This all assumes simple random sampling. If complex sampling designs are used, weighting is until now I think the best way to start dealing with nonresponse. I am unaware of imputation methods that can deal with complex sampling (other than straightforward multilevel structures). 

Monday, September 23, 2013

Publish your data

This morning, an official enquiry into the scientific conduct of professor Mart Bax concluded that he had committed large-scale scientific fraud over a period of 15 years. Mart Bax is a now-retired professor of political anthropology at the Free University Amsterdam. In 2012 a journalist first accused him of fraud, and this spring, the Volkskrant, one of the big newspapers in the Netherlands reported they were not able to find any of the informants Mart Bax had used in his studies.

An official enquiry followed. You can find the report here (in Dutch). In summary, Mart Bax most likely made up at least 64, mostly peer-reviewed, articles and recycled his own articles using different titles in different journals. Although the investigation could not rule out that some studies were just done sloppily, the overall picture from the report is one of scientific misconduct.

So, what to do about this? I have a clear opinion on this: Make your data available, and replicate other people's studies

What strikes me, is that it seems normal to some social scientists not to store interviews (whether on tape or anonymized in scripts) or publish datasets. It may be a little more difficult for qualitative researchers than quantitative researchers to do this. Back in the 1990s, when Mart Bax committed this fraud, it may have been really complicated to publish such transcripts online. Nowadays, it is dead easy however, and some, although not many journals offer this service. See for some good examples in the social sciences:

- The Review of Economic Studies: will only publish articles that provide data, and where analyses can be replicated. This is the only example that I could find of a journal policy that really makes it easy to replicate research findings.
- All Springer journals provide the opportunity for supplementary materials (among them data). This is just an option.
- The Journal of Personality and Social Psychology encourages providing data and analysis scripts in general (the American Psychological Association as a whole does this).

If you know of any other journals with good replicability policies, please send a comment or e-mail (p.lugtig AT), so I can compile a more comprehensive list.

In the natural sciences, there are journals where datasets are peer-reviewed and published. As social scientists, we have a long way to go to tackle fraud, and generally become much more open about our data and analysis methods. Journals, professional associations, and individual researchers should all be stricter on data accessibility, and replicability of studies.

Saturday, September 14, 2013

How to improve the social sciences

Social scientists (and psychologists in particular) have in recent years had something of a bad press, both in- and outside academia. To give some examples:

- There is a sense among some people that social science provides little societal or economical value.
- Controversy over research findings within social science: for example the findings of Bem et al. about the existence of precognition, or the estimation of the number of casualties in Iraq war (2003-2007).
- Scientific fraud: in the Netherlands alone, we have had about five affairs of scientific fraud in the past few years. The biggest one being the Stapel-affair: a professor who dreamed up the data of about 50 high-profile scientific articles.

Now, I disagree with the idea that the social sciences do not contribute enough to society, but it is hard to argue about this issue if people can argue that social scientists commit fraud, or are secretive about their methods and results. Also, I dislike the defensive attitude many social scientists take on fraud, openness, and replicability. Instead of just being defensive, we should try to do something constructive with the critique we receive as a field.

At this year's GESIS summer school, I gave a talk about the topic of survey errors and how to improve social science as a whole. I think it is really time to change the way we as social scientists do our work. We have to be more open about what we do and how we do it. Not only so we can strengthen the position of the social sciences in general, but more importantly to make progress as a field. Many theories in the social sciences are founded on empirical research findings that either cannot be replicated, are based on wrong statistical analyses, or are based on fabricated data. Dreaming up data is the worst example of fraud, but I also count as fraud selectively deleting those cases that don't support your theory, or presenting exploratory analyses as confirmatory. This last point especially is common throughout the social sciences (and in other disciplines as well).

Summaries of a survey on scientific fraud in medical faculties in the Flanders region of Belgium 

Here is my idea of how to start making things better:

1. Publish your data. Unless you work with medical or otherwise privacy-sensitive data, I see no reason why not all data should be published online at some point. Of course, data need to be cleaned after data collection, and researchers may keep data for themselves for a limited amount of time, if they fear others may steal their good research ideas. But once a paper is published, the data should become available. No excuses.
2. Document your analyses. If you publish your data, why not publish your analysis scripts and logs of your analyses as well, so everyone can easily replicate your analyses? We need more replication in the social sciences, and should make it more attractive for researchers to do this. Yes, that means that data analysis errors may be exposed by other researchers. But in my view, this is the only way we can progress as social sciences.

And as I believe in practice what you preach, I am putting all my data and analysis-scripts at this webpage (see publications). As I do not own all the data, I am still working out how to do this with some of my co-authors, but my goal is to really have all my data and scripts/syntaxes there, so everyone can replicate my analyses given they use the same software.

The slides for my presentation at GESIS are found here. There is also a video-recording of the lecture, which can be found here.

Monday, September 9, 2013

Nonresponse Workshop 2013

One of the greatest challenges in survey research is declining response rates. Around the globe, it appears to be becoming harder and harder to convince people to participate in surveys. As to why response rates are declining, researchers are unsure. A general worsening of the 'survey climate', due to increased time pressures on people in general, and direct marketing are usually blamed.

This year's Nonresponse workshop was held in London last week. This was the 24th edition, and all we talk about at the workshop is how to predict, prevent or adjust for nonresponse in panel surveys.

Even though we are all concerned about declining response rates, presenters at the nonresponse workshop have found throughout the years that nonresponse cannot be predicted using respondent characteristics. The explained variance of any model rarely exceeds 0.20. Because of this, we don't really know how to predict or adjust for nonresponse either. We fortunately also find that generally, the link between nonresponse rates and nonresponse bias is weak. In other words, high nonresponse rates are not per se biasing our substantive research findings.

At this year's nonresponse workshop presentations focused on two topics. At other survey methods conferences (ESRA, AAPOR) I see a similar trend:

1. Noncontacts: where refusals usually cannot be predicted at all (explained variances lower than 0.10), noncontacts can be predicted to some extent. So, presentations focused on:
- Increasing contact rates among 'difficult' groups.
- Using paradata and call record data to improve the prediction of contact times and successful contacts.
- Using adaptive and responsive designs, where the contact strategy is changed based on pre-defined (and often experimental) strategies for subgroups in your population (adaptive designs), or based on paradata collected during fieldwork using decision rules (responsive designs).
2. Efficiency: Responsive designs can be used to increase response rates or limit nonresponse bias. However, they can also be used to limit survey costs. If respondents can be contacted with fewer contact attempts, this saves money. Similarly, we can limit the amount of effort we put into groups of cases for which we already have a high response rate, and devote our resources to hard-to-get cases.

There are many interesting studies that can be done in both these areas. With time, I think we will see that successful strategies will be developed that limit noncontact rates, nonresponse and even nonresponse bias to some extent. Also, surveys might become cheaper using responsive designs, especially if the surveys use face-to-face or telephone interviewing. At this year's workshop, there were no presentations on using a responsive design approach for converting soft refusals. But I can see the field moving in that direction too eventually.

Just one note of general disappointment with myself and our field remains after attending the workshop (and I've had this feeling before):

If we cannot predict nonresponse at all, and if we find that nonresponse generally has a weak effect on our survey estimates, what are we doing wrong? What are we not understanding? It feels, in philosophical terms, as if we survey methodologists are perhaps all using the wrong paradigm for studying and understanding the problem. Perhaps we need radically different ideas, and analytical models to study the problem of nonresponse. What these should be is perhaps anyone's guess. And if not anyone's, at least my guess.

Saturday, August 3, 2013

Dependent Interviewing and the risk of correlated measurement errors

Longitudinal surveys ask the same people the same questions over time. So questionnaires tend to be rather boring for respondents after a while. "Are you asking me this again, you asked that last year as well!" is what many respondents probably think during an interview. As methodologists who manage panel surveys, we know this process may be rather boring, but in order to document change over time, we just need to ask respondents the same questions over and over.

Some measures of change over time would become biased if we just repeat questions year-on-year. For example, we know that if we ask respondents twice about their occupation, less than half of all of them have the same occupational codes over time. We know from other statistics (e.g. tax returns) that that is not true. Most people stay in the same occupation over time. Now, you may think, dear reader, that that is probably due to the fact that occupation is rather difficult to measure and code in general, and you are right. Unreliable questions will lead to a lot of spurious change over time.
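A back-of-the-envelope sketch in Python shows how quickly unreliable coding produces spurious change. It assumes that nobody really changes occupation, that each wave is coded correctly with the same probability, that coding errors are independent between waves, and that two wrong codes never happen to agree. The value of 0.7 is made up for illustration and is not an estimate from any study.

```python
def expected_agreement(p_correct):
    """Share of true stayers who get the same code at two waves, when each
    wave is independently coded correctly with probability p_correct and
    two wrong codes are assumed never to coincide."""
    return p_correct ** 2


def spurious_change(p_correct):
    """Share of true stayers who appear to have changed occupation."""
    return 1.0 - expected_agreement(p_correct)


agreement = expected_agreement(0.7)  # hypothetical coding accuracy of 70%
change = spurious_change(0.7)
```

Under these assumptions, coding that is right 70% of the time at each wave already leaves only about half of the stayers with identical codes, which is the kind of pattern described above.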

Dependent Interviewing helps to make codes consistent over time and reduce such spurious change. The idea is that, instead of coding occupation independently year-on-year, you ask respondents in year 2 the question "last year, you said you were a bankteller, is that still the case?". There are many different ways to ask this Dependent Interviewing question, and the exact wording is important for the outcomes, especially because we do not want respondents to say "yes" too easily to the questions we ask.

"Last year, you told me you told me you worked as a bankteller, is that still the case?"

Recently, a paper I wrote on the effects of various forms of Dependent Interviewing came out in Field Methods. It was actually the first paper I wrote for my Ph.D, and I started work on it in 2006. So, it has been quite a journey to get this story on paper and get it published. I am very happy to see it on paper now. We did an experiment, where we tried out different DI-designs in a four-wave panel study, to study the effects of each DI-design on data quality. Specifically, we looked at whether respondents might falsely confirm data from the previous year that we knew contained measurement error. The bottom line of the study is that when Dependent Interviewing is applied to income amount questions over time, it does improve data quality, and we don't need to worry so much about respondents wrongly agreeing to pre-loaded data from the previous year. Read the full paper here.

Thursday, July 4, 2013

Measurement and nonresponse error in panel surveys

I am spending time at the Institute for Social and Economic Research in Colchester, UK where I will work on a research project that investigates whether there is a tradeoff between nonresponse and measurement errors in panel surveys.

Survey methodologists have long believed that multiple survey errors have a common cause. For example, when a respondent is less motivated, this may result in nonresponse (in a panel study: attrition), or in reduced cognitive effort during the interview, which in turn leads to measurement errors. Lower cognitive abilities and language problems might be other examples of common causes that lead to either nonresponse or measurement error. Understanding these common error sources is important to know whether our efforts to reduce one survey error source are not offset by an increase in another one. It follows from the idea that good survey design minimizes Total Survey Error.

Studying the trade-off has proven to be very difficult. This is because nonrespondents are by definition not observed. So, we never know how nonrespondents would answer questions, and how much measurement error is included in those answers. We can only observe measurement errors for respondents, but can not compare these to the potential measurement error of nonrespondents.

Hypothetical continuum of timing of survey response

To overcome this problem, most methodologists have compared 'early' respondents (people who respond very quickly in the fieldwork period) to 'late' respondents (those who only participate after being reminded for example). The idea behind this, is that the probability of response is:

a) a linear continuum from very early response on the one extreme, and nonresponse on the other.
b) that hypothetically, nonrespondents could be converted into respondents if extreme amounts of efforts are used to do so (Voogt 2005 showed in a small-scale study in the Dutch locality of Zaandam that this is actually possible)

So, the idea in summary is that late respondents can serve as a proxy for information about nonrespondents. However, that assumption is not likely to be true in general, if ever.

In my project, I will try to overcome the problem that we never have measurement error estimates for nonrespondents. I use longitudinal data and Structural Equation Modeling techniques to estimate measurement errors for nonrespondents in the British Household Panel Study, compare them to respondents, and link them to potential common causes of both types of errors. See this presentation for more details on this project.

Thursday, May 23, 2013

AAPOR 2013

The AAPOR conference last week gave an overview of what survey methodologists worry about. There were relatively few people from Europe this year, and I found that the issues methodologists worry about are sometimes different in Europe and the USA. At the upcoming ESRA conference for example there are more than 10 sessions on the topic of mixing survey modes. At AAPOR, mixing modes was definitely not 'hot'.

With 8 parallel sessions at most times, I have only seen bits and pieces of all the things that went on. So the list below is just my take on what's innovative and hot in survey research in 2013. RTI composed a summary of all tweets for a different take on what mattered at AAPOR this year.

1. Probability based surveys vs. non-probability surveys. AAPOR published a report on this topic during the conference, written by survey research heavy-weights. This is recommended reading for everyone interested in polls. The conclusion that non-probability polls should not be used if one wants to have a relatively precise estimate for the general population is not surprising. It can not be re-iterated often enough. Other presentations on this topic featured Jon Krosnick showing empirically that only probability-based surveys give consistent estimates. See a summary of the report here.

2. The 2012 presidential elections. See a good post by Marc Blumenthal on this topic. Many sessions on likely voter models, shifting demographics in the U.S. and the rise of the cell-phone only generation.

3. Responsive designs. The idea of responsive (or adaptive) survey designs is that response rates are balanced across important sub-groups of the population. E.g. in a survey on attitudes towards immigrants, it is important to get equal response rates for Hispanics, blacks and whites, when you believe that attitudes towards immigrants differ among ethnic sub-groups.
During fieldwork, response rates can be monitored, and when response rates for Hispanics stay low, resources can be shifted towards targeting Hispanics, by either contacting them more often, or switching them to a more expensive contact mode. If this is successful, the amount of nonresponse bias in a survey should decrease.
The idea of responsive designs has been around for about 15 years. I had until now not seen many successful applications however. A panel session by the U.S. Census Bureau did show that responsive designs can work, but they require survey organisations to redesign their entire fieldwork operations. For more information on this topic, see the excellent blog by James Wagner.
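The monitoring step described above can be sketched in a few lines of Python. This is only an illustration of the logic: the group labels, the fieldwork records and the 5 percentage point tolerance are all made-up assumptions, and a real responsive design would work with much richer paradata and decision rules.

```python
def response_rates(cases):
    """Response rate per subgroup from (group, responded) fieldwork records."""
    totals, completes = {}, {}
    for group, responded in cases:
        totals[group] = totals.get(group, 0) + 1
        completes[group] = completes.get(group, 0) + int(responded)
    return {g: completes[g] / totals[g] for g in totals}


def groups_to_prioritise(rates, tolerance=0.05):
    """Groups whose response rate lags the best group by more than `tolerance`."""
    best = max(rates.values())
    return sorted(g for g, r in rates.items() if best - r > tolerance)


# Hypothetical interim fieldwork status: (subgroup, has responded so far)
fieldwork = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
    ("C", True), ("C", True), ("C", True), ("C", False),
]

rates = response_rates(fieldwork)
lagging = groups_to_prioritise(rates)
```

Here group B lags behind, so it would be the candidate for extra contact attempts or a switch to a more expensive contact mode. Running this check repeatedly during fieldwork is the simplest form of the decision rule.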

Monday, April 1, 2013

Open access: things we can learn from the natural sciences

I was pointed to an interview with two Harvard professors, one of them my (former) idol Gary King, talking about the need for open access publishing. I'm re-posting it here as a reminder to myself, and to anyone from the social sciences (or survey methods), that we should be more open about what we do and write. Too often, papers take years to appear in print journals, and even then, these articles probably don't make it to those people without subscriptions to all the main publishers.

One practical question to anyone reading this: the natural sciences have ArXiv, a system where everyone publishes work-in-progress. Does anyone know of initiatives in this direction in the social sciences?

Thursday, March 7, 2013

Interested in a new mixed-mode research project?

Mixed-mode research is still a hot topic among survey methodologists. At least at about every meeting I attend (some selection bias is likely here). Although we know a lot from experiments in the last decade, there is also a lot we don't know. For example, what designs reduce total survey error most? What is the optimal mix of survey modes when data quality and survey costs are both important? And, how can we compare mixed-mode studies across time, or countries, when the proportions of mode assignments change over time or vary between countries?

Together with some researchers at National Statistical Institutes I am trying to form a European consortium to set up a programme for doing comparative mixed-mode research. One of the goals of this consortium would be to apply for funding at the next wave of EU funding (Horizon 2020). We are however still looking for researchers in market research, official statistics and universities, especially from countries in Northern, Central and Southern Europe. Want to know more? Write me at p.lugtig AT

Saturday, February 23, 2013

Mixed mode surveys: where will we be 5 years from now?

Some colleagues in the United Kingdom have started a half-year initiative to discuss the possibilities of conducting web surveys among the general population. Their website can be found here 

One aspect of their discussions focused on whether any web survey among the population should be complemented with another, secondary survey mode. This would for example enable those without Internet access to participate.  Obviously, this means mixing survey modes.

Using two different survey modes to collect survey data risks introducing extra survey error. Methodologists (myself included) have worked hard on getting a grip on the existence of differences in measurement effects between different modes. In order to study these properly, one should first make sure that the subsamples that are interviewed in different survey modes do not differ just because of differences in selection effects between the two samples. I have written some earlier posts on this issue, see some of the labels in the word-cloud on the right.

I have composed a short presentation on ways in which differences in measurement effects in mixed-mode surveys can be studied. The full presentation is here. Comments are very welcome.

In going over the literature, two things stood out that I had never realised:

1. There are few well-conducted studies on measurement effects in mixed-mode surveys. Those that exist show that there often are differences in means, and sometimes in variances of survey statistics. Yet no one (and I'd love to be corrected here) has looked at the effect on covariances. That is, do relations between the key variables in a study change, just because of the mode of data collection? There may be an analogy to nonresponse studies, where we often find bias on means and variances, but much smaller biases for covariances. In this picture, this reflects the relation between x1 and y1 in two different survey modes. Is that relation different because of mode effects? Probably not, but we need more research on this.

2. What to do about mode effects? We are perhaps not ready to answer this question, looking at how little we know exactly about how measurement differences between modes affect survey statistics. But we should start thinking in general about this question. Can we correct for differences between modes? Should we want to do that? It would create a huge extra burden on survey researchers to study mode differences in all mixed-mode surveys, and to design correction methods for them. Could it be that in five years' time, we have concluded that it is probably best to try to keep mode effects as small as possible and not worry about the rest?