kirstyevidence

Musings on research, international development and other stuff

Chapter 2…in which kirstyevidence meets a randomista!


Ok, I concede… They do exist. I know that I have previously suggested that they are fictional, but last week, at a meeting of development practitioners and policy makers, I met a real live randomista who believed that randomised controlled trials were the best form of evidence in all cases and that there were no reasons why you should not always do them!

So – they exist and they are a bit bonkers! But what was even more striking to me about that particular meeting was how out of proportion people's fear of RCTs is to the very small number of people who actually think like my new randomista friend.

In fact, I am starting to get the impression that being anti-RCT is a bit of a badge of honour for those in the development field. A number of times recently, I have heard quite influential development wonks come out with statements about RCTs which would be comical if they weren’t so dangerous. To be clear, while RCTs are no silver bullet, they have generated knowledge which has saved millions of lives. So why on earth does the development sector hate them so much??

It seems to me that the hostility is fuelled by a genuine fear that ignorant technocrats are going to shut down important development interventions simply because they do not produce rapid, measurable outcomes. This is a legitimate fear – but RCTs are not the culprit. RCTs are simply a tool, not a political ideology.

This fear has generated a number of myths about RCTs which continue to circulate in blogs and conference talks and don't seem to die, no matter how many times they are shown to be false. Given their tenacity, I am fairly sure that any attempt I make to disprove them will make little difference – but it's been a while since I have had a blog-based argument about experimental methodologies, so I think I will give it another go…

So, here are a few myths about RCTs…

MYTH 1: RCTs are useful for measuring things which are easy to count or measure objectively. This is why they are useful in medicine. But they are not useful for measuring ‘softer’ things like changes in attitudes/behaviour/perceptions, particularly when there might be a power imbalance between the investigator and the subject.

This is just not true. Many things which RCTs have been used to measure in medicine are perception-based. For example, there is no objective biochemical marker for how bad a headache is, but RCTs can be used to check whether the perception of the headache improves more when you are given an actual medicine than when you are given a placebo. The fact that improvement in headaches is subjective – and responsive to the power dynamics at play between doctors and patients – is precisely why any response to a pill needs to be compared with the response to a placebo, so that you can get a measure of what the 'real' effect is. Changes in perception are particularly affected by the placebo effect, the desire of participants to please the investigator and a host of other biases, which is why RCTs are particularly useful in these cases.
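To make that logic concrete, here is a minimal simulation sketch in Python (all the numbers are invented for illustration): a subjective, self-reported outcome improves in both arms because of the placebo effect, and it is only the comparison between arms that recovers the real effect of the medicine.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500  # hypothetical participants per arm

# Self-reported improvement on an arbitrary scale (made-up numbers).
placebo_response = 2.0   # everyone 'improves' a bit just from being attended to
true_drug_effect = 1.5   # the additional, real effect of the medicine
noise = 2.0              # person-to-person variation in how improvement is reported

placebo_arm = placebo_response + rng.normal(0, noise, n)
treatment_arm = placebo_response + true_drug_effect + rng.normal(0, noise, n)

print(f"Mean improvement, placebo arm:   {placebo_arm.mean():.2f}")
print(f"Mean improvement, treatment arm: {treatment_arm.mean():.2f}")
print(f"Estimated real effect (difference between arms): "
      f"{treatment_arm.mean() - placebo_arm.mean():.2f}")
# Looking at the treatment arm alone would credit the medicine with ~3.5 points
# of improvement; the between-arm comparison recovers something close to 1.5.
```

The point is not the numbers, which are made up, but that the subjectivity of the outcome is exactly why the comparison with a control group is needed.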

MYTH 2: RCTs = quantitative; all other research = qualitative.

This is a common misunderstanding – in fact there is a great deal of quantitative research which is not experimental. RCTs are the most well-known variety of experimental approaches – all this means is that they set up a randomly assigned control group and compare the response in the group which gets the actual treatment to the response in the control group. You can also get quasi-experimental approaches – this simply means that there is a control group but it is not randomly assigned. Any other designs are called observational. These do include qualitative approaches but they also include quantitative research – for example, econometric analysis.
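A rough sketch of that distinction, again purely illustrative: what makes a study 'experimental' is how the control group is formed, not whether the data are numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
villages = np.arange(100)  # hypothetical units that could receive an intervention

# Experimental (RCT): the control group is assigned at random.
shuffled = rng.permutation(villages)
rct_treated, rct_control = shuffled[:50], shuffled[50:]

# Quasi-experimental: there is a control group, but it is not randomly assigned -
# here, say, the villages on the far side of an administrative boundary.
quasi_treated = villages[villages < 50]
quasi_control = villages[villages >= 50]

# Observational: no constructed control group at all - for example, an
# econometric analysis of outcomes against covariates in data that already exist.
# All three can be entirely quantitative.
```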

MYTH 3: People who support evidence-informed policy believe that RCTs are the gold standard in research.

NOT TRUE (well ok, it may be true for my new randomista friend but it is certainly not a common belief in the circles I move in!). It is true that if you want to find out IF an intervention works, the most rigorous way to find out is to use an experimental approach. This does not mean that you always need to use an experimental approach – sometimes it would be absurd (hat tip to @knezovjb for that link) since there is no other plausible reason for the outcome, and sometimes it is not practical. Observational research approaches are equally important but they are used to find the answers to other questions. For example: How does a particular intervention work, or indeed why does it not? What is the distribution of a certain condition in a population? What is the economic and political situation of a given environment? And so on. Observational approaches, such as before/after comparisons, are not the best way to check if something works simply because humans are so susceptible to bias – you may well find lots of people report that the intervention has benefitted them when there is actually no real effect of the intervention beyond placebo effects/desire to please the investigators.
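That last point is easy to demonstrate with another invented-numbers sketch: when the true effect is zero but everyone's self-reported score drifts upwards over time, a before/after comparison 'finds' an impact that a randomised comparison correctly does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400  # hypothetical participants

# Suppose the intervention has NO real effect, but self-reported scores drift
# upwards anyway (goodwill, wanting to please the investigators, general trends).
baseline = rng.normal(5.0, 1.0, n)
drift = 1.2  # improvement that would have happened regardless (made up)
treated = rng.random(n) < 0.5  # random assignment to the intervention
followup = baseline + drift + rng.normal(0, 1.0, n)  # true intervention effect = 0

# Naive before/after comparison among those who got the intervention:
print(f"Before/after 'effect': {(followup - baseline)[treated].mean():.2f}")  # ~1.2

# Randomised comparison at follow-up between treated and control groups:
print(f"RCT estimate: {followup[treated].mean() - followup[~treated].mean():.2f}")  # ~0
```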

MYTH 4: Development donors only want to fund RCTs.

I often hear people back up their belief in myth 3 by saying that it is clear that donors mainly believe in RCTs based on the fact that they invest so much money in them. This is just not true! I work at DFID and can say with certainty that the majority of research it funds does NOT use experimental approaches. All the data on what is funded by DFID and many other donors is freely available so if you don't believe me, look it up (at some point when I get the time, I would like to take on a summer student to do a project looking at the data…). Similarly, if you look at the evidence which is used in DFID business cases (again all freely available online) the majority is NOT experimental evidence. It is true that there are some bodies which are set up to fund experimental approaches but, just as the fact that the Wellcome Trust only funds medical research does not mean that it thinks agricultural research is less important, the existence of funders of experimental approaches does not in itself mean that there is a grand conspiracy to not fund other research.

A variation on this myth is when people have had funding requests for observational research turned down by a development funder with the feedback that the approach lacked rigour. This is sometimes interpreted as meaning that donors only like experimental approaches – but this is not true. We desperately need good observational research – but the key word is good. That means being explicit about your methodology, surfacing and discussing potential biases, exploring alternative potential theories of change, considering whether what you are measuring really addresses the question you set out to answer, and so on. See here for some great work on improving the rigour of qualitative approaches to impact assessment.

MYTH 5: RCTs are invariably and uniquely unethical.

It has been suggested that RCTs are unethical since they require that the control group is not given the 'treatment' intervention (or at least not at the same time as the treatment group). I think this argument is fairly weak since, whenever an intervention is rolled out, there will be people who get it and those who don't. It has also been argued that it is unethical to expect participants who are not receiving an intervention to give up their time to contribute to someone's research when they are not getting any direct benefit in return. I do think this is a valid point that needs to be explored – but this problem is in no way unique to RCTs. In fact, most observational research methods rely on people contributing 'data' without getting any direct benefit. Any project that is gathering information from vulnerable populations needs to consider these issues carefully and build an appropriate public engagement strategy.


So, I do think it is really important to have discussions on the value of different types of evidence in different contexts and in fact I am pretty much in agreement with a lot of the underlying concerns that the anti-RCT lobby have: I do get worried that a push to demonstrate results can lead donors to focus more on ‘technological fixes’ to problems instead of doing the ‘softer’ research to understand contexts and explore the reasons why the many existing ‘fixes’ have not achieved the impacts we might have hoped for. But I get frustrated that the debate on this subject tends to become overly polarised and is often based more on rhetoric than facts. I strongly agree with this blog which suggests that we should try to understand each other a bit better and have a more constructive discussion on this topic.


21 thoughts on “Chapter 2…in which kirstyevidence meets a randomista!”

  1. Great blog and much needed. Working on private sector in development and specifically supporting ‘inclusive business’ that combines commercial return with impacts at the base of the pyramid, I am constantly being asked if we can do RCTs to prove impact. I sympathise with what I take to be the driver behind this: there is more hype than evidence about the impact of inclusive business. But here are my two problems with it.

    (1) there are also several solutions to that problem, and I sometimes think the RCT 'solution' is a bit too knee-jerk. Who is plucking the lower-hanging fruit to assess impact? There is a growing amount of data about inclusive businesses but even normal research to amass, check, aggregate, interrogate and disaggregate data is lacking. So is ex-post tracking after a 3-year intervention. As an example, in the Business Innovation Facility we are doing masses of M&E before the end of this year for the end of the programme, but even so will not be doing justice to the data in a way that a serious research project could. There are confidentiality issues around business data, but if we look only across the businesses that DFID supports through various challenge funds, where I assume confidentiality boundaries can be widened, there is a lot of data and many businesses that could be tracked longitudinally.

    (2) if we did try RCTs, what would be the control group? If the inclusive business is selling a water filter or LPG stove that has been custom designed to fit the habits and pocket of a low-income family, and is the clear market leader, what do we compare it to?

    One option is to compare them to businesses that sell $200 water filters and stoves and are not 'inclusive'. That would be a simple comparison: one reaches hundreds of thousands of low-income households and one does not. A more meaningful comparison is between beneficiaries: those with access to the new stove or filter, and those without. Useful, but that is really just doing the outcome-level assessment of the technology: how much cleaner is the kitchen air or the water drunk than the alternative, and how much faster is the cooking or fetching? There are many ways to do this, including fieldwork with users and non-users plus reliance on those who have much better data (the International Clean Cookstove Alliance for example). Businesses can't afford all the investment but donors could do a decent job in a few months' work. A true RCT over many years would test assumptions about lower disease incidence, which would be great, but before running at that pace there is masses that could be done to come up with decently informed output:outcome estimates.

    A third option would be to compare the business approach to providing clean water or cleaner kitchens with a subsidy-based government or NGO programme. That would be really interesting. I don't see it as an RCT. In the Impact Investment world, a few of the Funds that are furthest ahead in this compare their investment with the best comparable charitable option. Fantastic. They explore, even if crudely, the cost of generating the outputs via impact investment or via other funding routes. I'm impressed that the investor world has done this and reckon it's got to be our next step on the donor side of inclusive business.

    So while I certainly see the benefits of RCTs, these are reasons why I tend to sigh when the next evaluation person asks if we can do RCTs to verify impacts of inclusive business.

  2. Why does the development sector hate randomistas?!

    Well, “hate” is perhaps a big word, but there is a clear dislike. Perhaps there is this dislike because of bigger donors having such an 'elevated' view of the value-added of RCTs. I have to agree that RCTs are not the only form of research being supported. But when looking at the amount of money going into RCTs as opposed to the support given to other types of M&E evidence-seeking… Perhaps donors support a multitude of evidence-seeking research, but RCTs often do consume quite the amount of money…

    I'm managing a Community of Practice on the monitoring and evaluation of climate change interventions, and the 'calls for proposals' regarding evidence-seeking research with a focus on RCTs / impact evaluations always have a high value compared to other forms of M&E.
    And you are right; despite the amount of money going into RCTs, the actual results are not used all that much in business cases.

    Would be interesting to see what the financial value is of RCTs supported, vis-à-vis other approaches. Looking forward to the results of that summer student. 😉

    Another reason might be that RCTs at times just don't manage to capture the complexity of a situation: high levels of uncertainty, a wide array of stakeholders, multiple sectors involved, a vast number of other externalities, etc. I do in that sense appreciate the gradual move away from constantly talking about attribution and towards a developing discussion on contribution…

    Best,

    Dennis Bours
    Team leader SEA Change CoP
    Dbours@pactworld.org

  3. This is really useful and well-argued. We all agree that aid can be more effective and that well-formed questions and well-executed, applied research can offer many relevant clues about this. We all want to see deeper thinking behind the doing.

    Where I often differ from randomistas is on some fundamental beliefs about what prevents this and what ails the aid industry overall. Is it a lack of information about “what works”? Or is it a lack of respect for citizen-led initiatives? Or a lack of understanding about complex power dynamics that impede authentic relationships among development partners? And if it's the latter two, my question is: Are RCTs just a band-aid on a deeper issue?

    Whether RCTs are considered just the latest fad in aid, or whether they become part of accepted practice, I do think that it's important to have dialogues like the one started here. Here was my “advice for donors” on RCTs – I'd be interested to hear your take: http://www.how-matters.org/2011/05/25/rcts-how-matters-advice-for-donors/

    As a commenter on Owen Barder's blog once shared, “Great tools, we economists undoubtedly do have. In studying development issues, they are often used unhelpfully due to hubris and a shocking level of comfort with ignorance about the phenomenon being studied.” At the end of the day, for those of us involved in organizational learning, perhaps we should consider RCTs as one of many tools in the toolbox – the match of the right tool to the right job is key.

  4. Well argued – this demystifies the myths about RCTs.

  5. Great blog! I think a better response to myth 1, though, is to clarify that measurement and attribution are two different things (although attribution is necessary to measure net impact). How one chooses to measure soft concepts is independent of whether one uses experimental techniques to attribute changes in those concepts to a particular policy or program. You can have a perfect RCT from an estimation standpoint with lousy indicators for measuring the concepts. Similarly, you can have excellent measurement in an evaluation that is unable to attribute changes.

    I do think it is a valid critique that many RCTs pay little attention to measurement and focus only on estimation, and as a result, end up with weak if not ridiculous variables attempting to measure soft concepts quantitatively.

  6. Pingback: Evaluation: using the right tool for the job | kirstyevidence

  7. Pingback: Trust the machine? A keen eye is what you need | The Cochrane Schizophrenia Group

  8. Pingback: Interesting links: July 2013 | 50shadesofevidence

  9. Pingback: An idiot’s guide to research methods | kirstyevidence

  10. Hi Kirsty, I would say 1 and 2 are well made points, 4 and 5 are straw men (no-one ever says those things, in my experience). The one where I think I disagree is myth 3 – I have been told by people at GSDRC, funded by DFID to assess the quality of research and evidence, that DFID’s guidance to them says that the ONLY source of evidence that can be considered gold standard is RCTs. Did they get that wrong (or did I mishear them)?

    • Thanks for this response Duncan… and apologies if I drifted into strawman territory on points 4 and 5 😉
      Your comment regarding point 3 is worrying. Holding RCTs up as the only source of evidence that can be considered gold standard is definitely not DFID policy. The DFID ‘How-to’ Note on assessing evidence explicitly states that it:
      “avoids constructing a hierarchy of research designs and methods (though some disciplines do consider designs and methods hierarchically). It recognises that different designs are more or less appropriate to different contexts, and different research questions. Counterfactuals are likely to be important for establishing the presence and strength of a causal relationship, but explanation for the nature of, and mechanisms behind causal relationships is often best achieved by observational designs using qualitative methods.”
      Having said that, just because it is DFID policy doesn’t mean that every person who works in DFID ‘got the memo’ – so perhaps the message given to GSDRC was given by someone who did not know the official position on this? If you are able to give me any further details (by email if you want to keep it confidential) I could try to find out a bit more.

  11. Hi Kirsty,
    I'll position myself somewhere between 'RCTs are the holy grail' and 'RCTs are Beelzebub in a white lab jacket', and agree that you do occasionally see a rather annoying and complete rejection of RCTs that I personally suspect is actually often mathematical illiteracy internalised in the form of 'lies, damned lies, and statistics/equations/etc.'.

    However, let me explain why I and many others aren’t enthusiastic about RCTs – I believe you’re missing it. Perhaps it’s because we disagree about the nature of most development interventions. I believe it’s mostly about changing institutions. Now, if institutions were merely observable rules, RCTs would be terrific. RCTs would similarly be alright even if we consider institutions as icebergs where some parts are not observable – you just need to accept the black-box approach.

    However, more and more research is uncovering the complexity of institutions – how institutions are continuously pieced together, strategically and unconsciously, by people and groups situated in different localities and circumstances. Drawing on Frances Cleaver and Avner Greif, I define an institution as: a complex and dynamic emergent property of socially positioned actors organising rules, beliefs, norms, and organisations that together generate a regularity of (social) behaviour.

    If this is true, what we need to identify is the mechanism by which institutions change, more than the inputs. RCTs are not terrific for this – and that's why I'm not enthusiastic about them.

    • Thanks Søren for this comment.

      I completely agree with you that changing institutions is at the root of development. And I completely agree with your definition of institutions. And I completely agree that RCTs are not the best way to understand the mechanisms by which institutions change. In fact, I am not very sure what in my blog makes you think that I disagree with you or that I am 'missing' this?

      Perhaps the only point of disagreement I have is that the knowledge that RCTs are not the best tools for understanding the mechanisms of institutional change does not make me unenthusiastic about them, just as the knowledge that non-experimental qualitative approaches are not the best tools for revealing causal pathways does not make me unenthusiastic about them.

      Both are important and useful tools that can be used to help us understand the world better.

      • Hi Kirsty, thanks.
        I apologise if I might have over-interpreted your piece, but it appeared to me that you're portraying people sceptical of RCTs as superstitious curmudgeons. I offered another possibility. Let's be realistic here: the different methodologies are not all equal. I didn't hear anyone speculating about a Nobel prize for some founder of the qualitative interview.
        Best

        • Nope, still not sure which bit of my blog would lead you to conclude that I am “portraying people sceptical of RCTs as superstitious curmudgeons”. From my perspective, I have outlined some things which I have heard many people say about RCTs and explained why they are not true. Is it possible that you are falling into the trap of option 2 from this blog – i.e. arguing with an imaginary person who has a position much more extreme than I have?

  12. An excellent paper reviewing randomised controlled trials of parachutes, which identifies that none have been done and proposes a randomised cross-over trial… Give a copy of this to a randomista and ask if they wish to participate in the trial!

    From:
    http://www.bmj.com/content/327/7429/1459

    Abstract

    Objectives To determine whether parachutes are effective in preventing major trauma related to gravitational challenge.

    Design Systematic review of randomised controlled trials.

    Data sources: Medline, Web of Science, Embase, and the Cochrane Library databases; appropriate internet sites and citation lists.

    Study selection: Studies showing the effects of using a parachute during free fall.

    Main outcome measure Death or major trauma, defined as an injury severity score > 15.

    Results We were unable to identify any randomised controlled trials of parachute intervention.

    Conclusions As with many interventions intended to prevent ill health, the effectiveness of parachutes has not been subjected to rigorous evaluation by using randomised controlled trials. Advocates of evidence based medicine have criticised the adoption of interventions evaluated by using only observational data. We think that everyone might benefit if the most radical protagonists of evidence based medicine organised and participated in a double blind, randomised, placebo controlled, crossover trial of the parachute.

  13. Pingback: Four Critiques Of RCTs That Aren’t Really About RCTs | Development Intern

  14. Pingback: Experimental methodologies… and baby pandas | kirstyevidence
