Monday, September 23, 2013

Publish your data

This morning, an official enquiry into the scientific conduct of Professor Mart Bax concluded that he had committed large-scale scientific fraud over a period of 15 years. Mart Bax is a now-retired professor of political anthropology at the Free University Amsterdam. In 2012 a journalist first accused him of fraud, and this spring the Volkskrant, one of the major newspapers in the Netherlands, reported that it could not find any of the informants Mart Bax had used in his studies.

An official enquiry followed. You can read the report here (in Dutch). In summary, Mart Bax most likely made up at least 64 articles, most of them peer-reviewed, and recycled his own articles under different titles in different journals. Although the investigation could not rule out that some studies were merely done sloppily, the report paints an overall picture of scientific misconduct.

So, what should be done about this? I have a clear opinion: make your data available, and replicate other people's studies.

What strikes me is that some social scientists seem to consider it normal not to store interviews (whether on tape or anonymized as transcripts) or to publish datasets. This may be somewhat more difficult for qualitative researchers than for quantitative researchers. Back in the 1990s, when Mart Bax committed this fraud, it may have been genuinely complicated to publish such transcripts online. Nowadays, however, it is dead easy, and some journals, although not many, offer this service. Some good examples in the social sciences:

- The Review of Economic Studies only publishes articles that provide their data and whose analyses can be replicated. This is the only example I could find of a journal policy that really makes it easy to replicate research findings.
- All Springer journals offer the option to publish supplementary materials (including data), but it is just an option.
- The Journal of Personality and Social Psychology encourages authors to provide data and analysis scripts (as does the American Psychological Association as a whole).

If you know of any other journals with good replicability policies, please leave a comment or send an e-mail (p.lugtig AT …) so I can compile a more comprehensive list.

In the natural sciences, there are journals in which datasets are peer-reviewed and published. As social scientists, we have a long way to go in tackling fraud and, more generally, in becoming much more open about our data and analysis methods. Journals, professional associations, and individual researchers should all be stricter about data accessibility and the replicability of studies.

Saturday, September 14, 2013

How to improve the social sciences

In recent years the social sciences (and psychology in particular) have had something of a bad press, both inside and outside academia. Some examples:

- There is a sense among some people that social science provides little societal or economic value.
- Controversy over research findings within social science: for example, Daryl Bem's findings on the existence of precognition, or the estimates of the number of casualties in the Iraq war (2003-2007).
- Scientific fraud: in the Netherlands alone, we have had about five cases of scientific fraud in the past few years. The biggest was the Stapel affair, in which a professor dreamed up the data for about 50 high-profile scientific articles.

Now, I disagree with the claim that the social sciences do not contribute enough to society, but it is hard to make that case when people can point out that social scientists commit fraud or are secretive about their methods and results. I also dislike the defensive attitude many social scientists take on fraud, openness, and replicability. Instead of just being defensive, we should try to do something constructive with the criticism we receive as a field.

At this year's GESIS summer school, I gave a talk about survey errors and how to improve social science as a whole. I think it is really time to change the way we as social scientists do our work. We have to be more open about what we do and how we do it, not only to strengthen the position of the social sciences in general, but more importantly to make progress as a field. Many theories in the social sciences are founded on empirical research findings that cannot be replicated, are based on incorrect statistical analyses, or are based on fabricated data. Dreaming up data is the worst example of fraud, but I also count as fraud selectively deleting cases that don't support your theory, or presenting exploratory analyses as confirmatory. This last practice in particular is common throughout the social sciences (and in other disciplines as well).

Summaries of a survey on scientific fraud in medical faculties in the Flanders region of Belgium 

Here is my idea of how to start making things better:

1. Publish your data. Unless you work with medical or otherwise privacy-sensitive data, I see no reason why all data should not be published online at some point. Of course, data need to be cleaned after data collection, and researchers may keep data to themselves for a limited time if they fear others may steal their good research ideas. But once a paper is published, the data should become available. No excuses.
2. Document your analyses. If you publish your data, why not publish your analysis scripts and the logs of your analyses as well, so everyone can easily replicate them? We need more replication in the social sciences, and we should make it more attractive for researchers to do this. Yes, that means that data analysis errors may be exposed by other researchers. But in my view, this is the only way we can progress as a field.
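As one illustration of point 2, an analysis script can write a small log next to its results that records a checksum of the exact data file used and the software version, so a replicator can confirm they start from the same data. This is a minimal Python sketch under stated assumptions: the function names (sha256_of_file, write_replication_log) and the log fields are hypothetical, and the post does not prescribe any particular format.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_replication_log(data_path: Path, results: dict, log_path: Path) -> dict:
    """Hypothetical log format: data checksum, software versions and results."""
    log = {
        "data_file": data_path.name,
        "data_sha256": sha256_of_file(data_path),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "results": results,
    }
    log_path.write_text(json.dumps(log, indent=2))
    return log
```

A replicator can recompute the checksum of the published data file and compare it with `data_sha256` before rerunning the analyses; a mismatch signals that the data differ from those behind the published results.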

And since I believe in practicing what you preach, I am putting all my data and analysis scripts on this webpage (see publications). As I do not own all the data, I am still working out how to do this with some of my co-authors, but my goal is to have all my data and scripts/syntaxes there, so everyone can replicate my analyses, provided they use the same software.

The slides for my presentation at GESIS can be found here, and a video recording of the lecture is available here.

Monday, September 9, 2013

Nonresponse Workshop 2013

One of the greatest challenges in survey research is declining response rates. Around the globe, it appears to be getting harder and harder to convince people to participate in surveys. Researchers are unsure why. A general worsening of the 'survey climate', caused by increased time pressure on people in general and by direct marketing, is usually blamed.

This year's Nonresponse Workshop was held in London last week. It was the 24th edition, and all we talk about at the workshop is how to predict, prevent, or adjust for nonresponse in panel surveys.

Even though we are all concerned about declining response rates, presenters at the workshop have found over the years that nonresponse cannot be predicted well using respondent characteristics. The explained variance of any model rarely exceeds 0.20. Because of this, we also don't really know how to adjust for nonresponse. Fortunately, we also generally find that the link between nonresponse rates and nonresponse bias is weak. In other words, high nonresponse rates do not per se bias our substantive research findings.
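Why can a low response rate go together with little bias? A standard approximation from the nonresponse literature (due to Bethlehem, and not discussed in this post) writes the bias of the respondent mean as Cov(ρ, y) / ρ̄, where ρ is each person's response propensity and ρ̄ the average propensity, i.e. the expected response rate. Bias depends on how strongly propensity covaries with the survey variable, not on the response rate itself. The Python simulation below is a hypothetical sketch with made-up propensities: it contrasts a 30% response rate with propensities unrelated to y against a response rate of roughly 69% with propensities that rise with y.

```python
import random
import statistics


def realised_bias(y, propensities, rng):
    """Simulate one fieldwork period: each unit responds with its own propensity.

    Returns the respondent mean minus the full-sample mean.
    """
    respondents = [yi for yi, p in zip(y, propensities) if rng.random() < p]
    return statistics.fmean(respondents) - statistics.fmean(y)


def approximate_bias(y, propensities):
    """Bethlehem-style approximation: Cov(rho, y) / mean(rho)."""
    mean_rho = statistics.fmean(propensities)
    mean_y = statistics.fmean(y)
    cov = statistics.fmean(
        (p - mean_rho) * (yi - mean_y) for p, yi in zip(propensities, y)
    )
    return cov / mean_rho


rng = random.Random(2013)
n = 100_000
# Hypothetical survey variable with mean 50 and standard deviation 10.
y = [rng.gauss(50, 10) for _ in range(n)]

# Scenario A: low response rate (30%), propensity unrelated to y.
prop_a = [0.3] * n
# Scenario B: higher response rate (about 69%), propensity rises with y.
prop_b = [min(0.95, max(0.05, 0.7 + 0.02 * (yi - 50))) for yi in y]

results = {}
for label, props in [("A", prop_a), ("B", prop_b)]:
    results[label] = {
        "response_rate": statistics.fmean(props),
        "approx_bias": approximate_bias(y, props),
        "simulated_bias": realised_bias(y, props, rng),
    }
    print(label, {k: round(v, 2) for k, v in results[label].items()})
```

Under these assumptions, scenario A shows a bias close to zero despite its low response rate, while scenario B shows a bias of between 2 and 3 points despite its much higher response rate.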

At this year's workshop, presentations focused on two topics. At other survey methods conferences (ESRA, AAPOR), I see a similar trend:

1. Noncontacts: whereas refusals usually cannot be predicted at all (explained variances below 0.10), noncontacts can be predicted to some extent. So, presentations focused on:
- Increasing contact rates among 'difficult' groups.
- Using paradata and call record data to improve the prediction of contact times and successful contacts.
- Changing the contact strategy during fieldwork, either based on pre-defined (and often experimental) strategies for subgroups of your population (adaptive designs), or based on paradata collected during fieldwork using decision rules (responsive designs).
2. Efficiency: Responsive designs can be used to increase response rates or limit nonresponse bias. However, they can also be used to limit survey costs. If respondents can be contacted with fewer contact attempts, this saves money. Similarly, we can limit the amount of effort we put into groups of cases for which we already have a high response rate, and devote our resources to hard-to-get cases.
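The efficiency idea, shifting effort away from subgroups that already respond well and towards hard-to-get cases, can be written as a simple decision rule. The sketch below is hypothetical: the Stratum class, the allocate_calls function, the target response rate, and the proportional-to-shortfall rule are illustrative assumptions, not a method presented at the workshop. Real responsive designs typically draw on richer paradata, such as call records and predicted contact times.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    sampled: int    # cases issued to the field
    responded: int  # completed interviews so far

    @property
    def response_rate(self) -> float:
        return self.responded / self.sampled


def allocate_calls(strata: list[Stratum], remaining_calls: int,
                   target_rate: float) -> dict[str, int]:
    """Hypothetical rule: spend remaining call attempts only on strata below
    the target response rate, in proportion to their shortfall in interviews."""
    shortfall = {
        s.name: max(0.0, target_rate - s.response_rate) * s.sampled
        for s in strata
    }
    total = sum(shortfall.values())
    if total == 0:
        return {s.name: 0 for s in strata}
    alloc = {name: int(remaining_calls * gap / total)
             for name, gap in shortfall.items()}
    # Give calls lost to rounding down to the stratum with the largest shortfall.
    alloc[max(shortfall, key=shortfall.get)] += remaining_calls - sum(alloc.values())
    return alloc


strata = [
    Stratum("urban, 18-34", 400, 120),  # 30% response so far
    Stratum("urban, 35+", 500, 300),    # 60%
    Stratum("rural, 18-34", 200, 80),   # 40%
    Stratum("rural, 35+", 300, 210),    # 70%
]
alloc = allocate_calls(strata, remaining_calls=1000, target_rate=0.55)
print(alloc)
```

Strata that already exceed the target receive no further calls, which is where the cost savings come from; whether this also reduces nonresponse bias depends on whether the extra interviews come from people who differ from the earlier respondents.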

There are many interesting studies that can be done in both these areas. With time, I think successful strategies will be developed that limit noncontact rates, nonresponse, and even nonresponse bias to some extent. Surveys might also become cheaper using responsive designs, especially if they use face-to-face or telephone interviewing. At this year's workshop, there were no presentations on using a responsive design approach to convert soft refusals, but I can see the field eventually moving in that direction too.

Just one note of general disappointment with myself and our field remains after attending the workshop (and I've had this feeling before):

If we cannot predict nonresponse at all, and if we find that nonresponse generally has only a weak effect on our survey estimates, what are we doing wrong? What are we not understanding? In philosophical terms, it feels as if we survey methodologists are perhaps all using the wrong paradigm to study and understand the problem. Perhaps we need radically different ideas and analytical models to study nonresponse. What these should be is perhaps anyone's guess, and certainly mine.