kirstyevidence

Musings on research, international development and other stuff



Experimental methodologies… and baby pandas

Another week, another blog pointing out that RCTs are not the ‘gold standard’ of evidence despite the fact that NOBODY is saying they are. To be fair to the blogger, he is simply summarising a paper written by Angus Deaton – a man who is a bit of an enigma to me. I have heard him speak and been blown away by how thoughtful, insightful and challenging he is – until he comes to the topic of RCTs, when he seems to become strawmantastic. Anyway, I’ve written about misconceptions about RCTs so many times in the past that I am sure you are bored of hearing me – in fact I am even bored of hearing myself drone on about it. So, in lieu of another post on this matter, I present to you links to previous posts (here, here and here)… and a picture I have drawn for you of a baby panda. Enjoy.

baby panda



Guest post on the Pritchett/Sandefur paper

Readers, I am delighted to introduce my first ever guest post. It is from my colleague Max – who can be found lurking on Twitter as @maximegasteen – and it concerns the recent Pritchett/Sandefur paper. Enjoy! And do let us know your thoughts on the paper in the comments.

Take That Randomistas: You’re Totally Oversimplifying Things… (so \( f(x) = a_0 + \sum_{n=1}^{\infty}\left( a_n \cos\frac{n\pi x}{L} + b_n \sin\frac{n\pi x}{L} \right) \)…)

The quest for internal validity can sometimes go too far…
(Find more fab evaluation cartoons on freshspectrum.com)

Development folk are always talking about “what works”. It usually appears in a research proposal which says “there are no silver bullets in this complex area” and then, a few paragraphs later, ends with a strong call: “but we need to know what works”. It’s an attractive and intuitive rhetorical device. I mean, who could be against finding out ‘what works’? Surely no-one* wants to invest in something that doesn’t work?

Of course, like all rhetorical devices, “what works” is an over-simplification. But a new paper by Lant Pritchett and Justin Sandefur, Context Matters for Size, argues that this rhetorical device is not just simplistic but actually dangerous for sensible policy making in development. The crux of the argument is that the primacy of methods for neat attribution of impact in development research, combined with donors’ giddy-eyed enthusiasm when an RCT is dangled in front of them, leads to some potentially bad decisions.

Pritchett and Sandefur highlight cases where, on the basis of some very rigorous but limited evidence, influential researchers have pushed hard for the global scale-up of ‘proven’ interventions. The problem with this is that while RCTs can have very strong internal validity (i.e. they are good at demonstrating that a given factor leads to a given outcome), their external validity (i.e. the extent to which their findings can be generalised) is often open to question. Extrapolating from one context, often at small scale, to another, very different context can be very misleading. The authors go on to use several examples from education to show that estimates produced with less rigorous methods, but in the local context, are a better guide to the true impact of an intervention than a rigorous study from a different context.

All in all, a sensible argument. But that is kind of what bothers me. I feel like Pritchett and Sandefur have committed the opposite rhetorical sin to the “what works” brigade – making something more complicated than it needs to be. Sure, it’s helpful to counterbalance some of the (rather successful) self-promotion of the more hard-line randomistas’ favourite experiments, but I think this article swings too far in the opposite direction.

I think Pritchett and Sandefur do a slight disservice to people who support evidence-informed development (full disclosure: I am one of them) by assuming they would blindly apply the results of a beautiful study from across the world to the context in which they work. At the same time (and here I will enter doing-a-disservice-to-people-working-in-development territory), I would love to be fighting colleagues on the frontline who were trying to ignore good quality evidence from the local context in favour of excellent quality evidence from elsewhere. In my experience, though, I’ve faced the opposite challenge: people designing programmes putting more emphasis on dreadful local evidence to make incredible claims about the potential effectiveness of their programme (“we asked 25 people after the project if they thought things were better and 77.56% said it had improved by 82.3%” – the consultants masquerading as researchers who wrote this know who they are).

My bottom line on the paper? It’s a good read from some of the best thinkers on development. But it’s a bit like watching a series of The Killing – lots of detail, a healthy dose of false leads/strawmen but afterwards you’re left feeling a little bit bewildered – did I have to go through all that to find out not to trust the creepy guy who works at the removal company/MIT?

Having said that, it’s useful to always be reminded that the important question isn’t “does it work (somewhere)” but “did it work over there and would it work over here”.  I’d love to claim credit for this phrase, but sadly someone wrote a whole (very good) book about it.


*With the possible exception of Lyle Lanley who, in The Simpsons, convinced everyone with a fancy song and dance routine to build a useless monorail




Chapter 2… in which kirstyevidence meets a randomista!

Ok, I concede… They do exist. I know that I have previously suggested that they are fictional, but last week, at a meeting of development practitioners and policy makers, I met a real live randomista who believed that randomised controlled trials were the best form of evidence in all cases and that there were no reasons why you should not always do them!

So – they exist and they are a bit bonkers! But what was even more striking to me about that particular meeting was how out of proportion people’s fear of RCTs seems to be compared to the very small number of people who think like my new randomista friend.

In fact, I am starting to get the impression that being anti-RCT is a bit of a badge of honour for those in the development field. A number of times recently, I have heard quite influential development wonks come out with statements about RCTs which would be comical if they weren’t so dangerous. To be clear, while RCTs are no silver bullet, they have generated knowledge which has saved millions of lives. So why on earth does the development sector hate them so much??

It seems to me that this is fuelled by a genuine fear that ignorant technocrats are going to shut down important development interventions simply because they do not produce rapid, measurable outcomes. This is a legitimate fear – but RCTs are not the culprit. RCTs are simply a tool, not a political ideology.

This fear has generated a number of myths about RCTs which continue to circulate in blogs and conference talks and don’t seem to die, no matter how many times they are shown to be false. Given their tenacity, I am fairly sure that any attempt I make to disprove them will make little difference – but it’s been a while since I have had a blog-based argument about experimental methodologies, so I think I will give it another go…

So, here are a few myths about RCTs…

MYTH 1: RCTs are useful for measuring things which are easy to count or measure objectively. This is why they are useful in medicine. But they are not useful for measuring ‘softer’ things like changes in attitudes/behaviour/perceptions, particularly when there might be a power imbalance between the investigator and the subject.

This is just not true. Many of the things which RCTs have been used to measure in medicine are perception-based. For example, there is no objective biochemical marker for how bad a headache is, but RCTs can be used to check whether the perception of the headache improves more when you are given an actual medicine than when you are given a placebo. The fact that improvement in headaches is subjective – and responsive to the power dynamics at play between doctors and patients – is precisely why any response to a pill needs to be compared with the response to a placebo, so that you can get a measure of what the ‘real’ effect is. Changes in perception are particularly affected by the placebo effect, the desire of participants to please the investigator and a host of other biases, which is why RCTs are particularly useful in these cases.

MYTH 2: RCTs = quantitative; all other research = qualitative.

This is a common misunderstanding – in fact there is a great deal of quantitative research which is not experimental. RCTs are the best-known variety of experimental approach – all this means is that they set up a randomly assigned control group and compare the response in the group which gets the actual treatment to the response in the control group. You can also get quasi-experimental approaches – this simply means that there is a control group but it is not randomly assigned. Any other designs are called observational. These do include qualitative approaches, but they also include quantitative research – for example econometric analysis.
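
To make the ‘randomly assigned control group’ idea concrete, here is a minimal, purely illustrative Python sketch. Every number in it (the sample size, the outcome model, the effect size) is invented for the example and is not based on any real trial:

```python
import random
import statistics

# Purely illustrative: simulate outcomes for 200 hypothetical participants.
# The effect size and noise model are invented for the example.
random.seed(1)
true_effect = 2.0
participants = range(200)

# Experimental design: randomly assign each participant to treatment or control.
assignment = {p: random.choice(["treatment", "control"]) for p in participants}

def outcome(group):
    # Baseline outcome plus noise; treated participants get the true effect on top.
    base = random.gauss(10, 3)
    return base + (true_effect if group == "treatment" else 0)

results = {p: outcome(g) for p, g in assignment.items()}

treated = [results[p] for p in participants if assignment[p] == "treatment"]
control = [results[p] for p in participants if assignment[p] == "control"]

# Because assignment was random, the simple difference in means estimates
# the treatment effect (here it should come out close to 2).
print(statistics.mean(treated) - statistics.mean(control))
```

A quasi-experimental version would run exactly the same comparison, but with the two groups formed by something other than random assignment (for example, whoever happened to live near the programme) – which is why its estimate can be confounded.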

MYTH 3: People who support evidence-informed policy believe that RCTs are the gold standard in research.

NOT TRUE (well, ok, it may be true for my new randomista friend, but it is certainly not a common belief in the circles I move in!). It is true that if you want to find out IF an intervention works, the most rigorous way to find out is to use an experimental approach. This does not mean that you always need to use an experimental approach – sometimes it would be absurd (hat tip to @knezovjb for that link) since there is no other plausible reason for the outcome, and sometimes it is not practical. Observational research approaches are equally important, but they are used to answer other questions. For example: how does a particular intervention work, or indeed why does it not? What is the distribution of a certain condition in a population? What is the economic and political situation of a given environment? And so on. Observational approaches, such as before/after comparisons, are not the best way to check if something works simply because humans are so susceptible to bias – you may well find that lots of people report that the intervention has benefitted them when there is actually no real effect of the intervention beyond placebo effects and the desire to please the investigators.
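
To see why a before/after comparison of self-reported outcomes can mislead, here is another purely illustrative sketch – again, all the numbers (the ‘true’ effect, the courtesy bias, the noise) are made up for the example:

```python
import random
import statistics

# Illustrative only: invented numbers showing how self-reported before/after
# comparisons can mislead when respondents want to please the investigators.
random.seed(2)
true_effect = 1.0     # assumed 'real' benefit of the intervention
courtesy_bias = 3.0   # assumed boost people report whenever they receive any attention

def reported_improvement(got_intervention):
    real = true_effect if got_intervention else 0.0
    # Everyone surveyed reports the courtesy bias on top of any real change.
    return random.gauss(real + courtesy_bias, 1.0)

# Before/after style: survey only the people who got the intervention.
before_after = [reported_improvement(True) for _ in range(100)]
print("before/after estimate:", round(statistics.mean(before_after), 2))   # ~4.0, inflated

# RCT style: compare against a randomly assigned control group that was also surveyed.
treated = [reported_improvement(True) for _ in range(100)]
control = [reported_improvement(False) for _ in range(100)]
print("RCT estimate:", round(statistics.mean(treated) - statistics.mean(control), 2))  # ~1.0
```

The before/after figure absorbs the courtesy bias, while the randomised comparison nets it out because the control group reports the same bias.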

MYTH 4: Development donors only want to fund RCTs.

I often hear people back up their belief in myth 3 by saying that donors must mainly believe in RCTs given how much money they invest in them. This is just not true! I work at DFID and can say with certainty that the majority of research it funds does NOT use experimental approaches. All the data on what is funded by DFID and many other donors is freely available, so if you don’t believe me, look it up (at some point when I get the time, I would like to take on a summer student to do a project looking at the data…). Similarly, if you look at the evidence used in DFID business cases (again, all freely available online), the majority is NOT experimental evidence. It is true that there are some bodies which are set up to fund experimental approaches, but just as the Wellcome Trust’s decision to fund only medical research does not mean it thinks agricultural research is less important, the existence of funders of experimental approaches does not in itself mean that there is a grand conspiracy not to fund other research.

A variation on this myth arises when people have had funding requests for observational research turned down by a development funder with the feedback that the approach lacked rigour. This is sometimes interpreted as meaning that donors only like experimental approaches – but that is not true either. We desperately need good observational research – but the key word is good. That means being explicit about your methodology, surfacing and discussing potential biases, exploring alternative potential theories of change, considering whether what you are measuring really allows you to answer the question you set out to answer, and so on. See here for some great work on improving the rigour of qualitative approaches to impact assessment.

MYTH 5: RCTs are invariably and uniquely unethical.

It has been suggested that RCTs are unethical since they require that the control group is not given the ‘treatment’ intervention (or at least not at the same time as the treatment group). I think this argument is fairly weak since, whenever an intervention is rolled out, there will be people who get it and people who don’t. It has also been argued that it is unethical to expect participants who are not receiving an intervention to give up their time to contribute to someone’s research when they are not getting any direct benefit in return. I do think this is a valid point that needs to be explored – but the problem is in no way unique to RCTs. In fact, most observational research methods rely on people contributing ‘data’ without getting any direct benefit. Any project that is gathering information from vulnerable participants needs to consider these issues carefully and build an appropriate public engagement strategy.


So, I do think it is really important to have discussions on the value of different types of evidence in different contexts and in fact I am pretty much in agreement with a lot of the underlying concerns that the anti-RCT lobby have: I do get worried that a push to demonstrate results can lead donors to focus more on ‘technological fixes’ to problems instead of doing the ‘softer’ research to understand contexts and explore the reasons why the many existing ‘fixes’ have not achieved the impacts we might have hoped for. But I get frustrated that the debate on this subject tends to become overly polarised and is often based more on rhetoric than facts. I strongly agree with this blog which suggests that we should try to understand each other a bit better and have a more constructive discussion on this topic.



Fighting the RCT bogeyman

Sometimes I have the feeling that development experts want to paint the debate about randomised controlled trials as more polarised than it really is. They seem to think they are fighting against a maniacal, narrow-minded, statistics-obsessed pro-RCT bogeyman who they fear is about to take over all development funding. It may seem that they are committing the classic logical fallacy of painting the opposing argument as ludicrous in order to strengthen their point – but having had many discussions on this topic I believe that many people really do believe in this bogeyman. So, here is my attempt to kill him!

I think RCTs can give some important information on some questions which can help us to make some decisions in the international development sector. BUT this does not mean that I think that RCTs are the only form of evidence worth considering or that they are the best method in all cases – and I am not sure that anyone does think this.

I get frustrated that every time I mention something about RCTs, people seem to respond by arguing with things that I have not actually said! For example, I often get people telling me that RCTs don’t necessarily tell us about individual responses (I agree), that many interventions cannot practically be evaluated by RCTs (I agree), that the results of RCTs may not be transferable to different contexts (I agree), that policy decisions are made on factors other than just evidence of ‘what works’ (I agree), that scientists often get things wrong (I agree – in fact I even have a blog post on it!), and so on.


What bothers me is that by focussing on a debate that doesn’t in truth exist, we are missing the opportunity to have much more useful and interesting discussions. For example, I would love it if more people thought about innovative ways of integrating RCTs (or quasi-experimental approaches) with qualitative research so that each arm adds value to the other. A valiant attempt is described here but, as you can see, the poor social scientists were frustrated by the ‘hard’ scientists’ unwillingness to change their protocol. I wonder if qualitative research could be integrated more easily into adaptive trial designs, which are set up so that the protocol can be changed as evidence is gathered?

Another discussion that I would love to hear more on is how we measure impacts such as behaviour, attitudes and capacity in a more objective way (a discussion which is valid whether one is using an RCT approach or not). I feel that too much evaluation in international development relies on self-reporting of these things – for example, measuring whether people report an increase in capacity following a training event. When I worked at INASP we did some work to compare policy makers’ self-reported ability to use research with their actual ability as measured using a diagnostic test (sorry – this work is still ongoing but I hope it will be published eventually). Similarly, an excellent piece of work here compared researchers’ reported barriers to using journals with their actual skills in using them, while this report compared the work of the Ugandan parliament in relation to science and technology with the reported competence of MPs. In all cases there was little correlation. This makes me think that we need to be more creative in measuring actual impacts, rather than relying so much on self-reporting.

These are just a couple of the interesting questions that I think we could be exploring – but I am sure if people could stop spending their time fighting the RCT bogeyman, they could come up with a lot more interesting, valid and important questions to discuss.