kirstyevidence

Musings on research, international development and other stuff



Holding decision makers to account for evidence use

Evidence-informed policy – it’s a wonderful thing. But just how widespread is it? The ‘Show your workings’ report from the Institute for Government (and collaborators Sense About Science and the Alliance for Useful Evidence) has asked this question and concluded… not very. It states “there [are] few obvious political penalties for failing to base decision[s] on the best available evidence”. I have to say that as a civil servant this rings true. It’s not that people don’t use evidence – actually most civil servants, at least where I work, do. But there are no good systems in place to distinguish between people who have systematically looked at the full body of evidence and appraised its strengths and weaknesses, and those who have referenced a few cherry-picked studies to back up their argument.

[Image: cat stuck up a tree]

Rosie is my actual cat’s name. And she does indeed make many poor life decisions. Incidentally, I named my other cat ‘Mouse’ and now that I am trying to teach my child to identify animals I am wondering just how wise a life decision that was…

The problem for those scrutinising decision making – parliament, audit bodies and, in the case of development, the Independent Commission for Aid Impact – is that if you are not a topic expert it can be quite hard to judge whether the picture of evidence presented in a policy document does represent an impartial assessment of the state of knowledge. The IfG authors realised this was a problem quite early in their quest – and came up with a rather nifty solution. Instead of trying to decide if decisions are based on an unbiased assessment of evidence, they simply looked at how transparent decision makers had been about how they had appraised evidence.

Now, on the evidence supply side there has been some great work to drive up transparency. In the medical field, Ben Goldacre is going after pharmaceutical companies all guns blazing to get them to clean up their act. In international development, registers of evaluations are appearing and healthy debates are emerging on the nature of pre-analysis plans. This is vitally important – if evaluators don’t declare what they are investigating and how, it is far too easy for them not to bother publishing findings which are inconvenient – or to try multiple types of analysis until, by chance, one gives them a more agreeable answer.

But as the report shows, and as others have argued elsewhere, there has been relatively little focus on transparency on the ‘demand’ side. And by overlooking this, I think that we might have been missing a trick. You see, it turns out that the extent to which a policy document explicitly sets out how evidence has been gathered and appraised is a rather good proxy for systematic evidence appraisal. And the IfG’s hypothesis is that if you could hold decision makers to account for their evidence transparency, you could go some way towards improving the systematic use of evidence to inform decisions.

The report sets out a framework which can be used to assess evidence transparency. As usual, I have a couple of tweaks I would love to see. I think it would be great if the framework more explicitly included an assessment of the search strategy used to gather the initial body of evidence – and perhaps rewarded people for making use of existing rigorous synthesis products such as systematic reviews. But in general, I think it is a great tool and I really hope the IfG et al. are successful in persuading government departments – and crucially those who scrutinise them – to make use of it.

 




Race for impact in the age of austerity

I have recently been pondering what the age of austerity means for the development community. One consequence which seems inevitable is increasing scrutiny of how development funds are spent. The principle behind this is hard to argue with; money is limited and it seems both sensible and ethical to make sure that we do as much good as possible with what we have. However, the way in which costs and benefits are assessed could have a big impact on the future development landscape. Already, some organisations are taking the value for money principle to its logical conclusion and trying to assess and rank causes in terms of their ‘bang for your buck’. The Open Philanthropy project has been comparing interventions as diverse as cash transfers, lobbying for criminal justice reform and pandemic prevention, and trying to assess which offers the best investment for philanthropists (fascinating article on this here).

The Copenhagen Consensus project* is trying to do a similar thing for the sustainable development goals; using a mixture of cost-benefit analysis and expert opinion, they are attempting to quantify how much social, economic and environmental return development agencies can get by focussing on different goals. For example, they find that investing a dollar in universal access to contraception will result in an average of $120 of benefit. By contrast, they estimate that investing a dollar in vaccinating against cervical cancer will produce only $3 average return. Looking over the list of interventions and the corresponding estimated returns on investment is fascinating and slightly shocking. A number of high profile development priorities appear to give very low returns while some of the biggest returns correspond to interventions such as trade liberalisation and increased migration which are typically seen as outside the remit of development agencies (good discussion on the ‘beyond-aid agenda’ to be found from Owen Barder et al. at CGD e.g. here).

In general, I find the approach of these organisations both brave and important. Of course there needs to be a lot of discussion and scrutiny of the methods before these figures are used to inform policy – for example, I had a brief look at the CC analysis of higher education and found a number of things to quibble with, and I am sure that others would find the same if they examined the analysis of their area of expertise. But the fact that the analysis is difficult does not mean one should not attempt it. I don’t think it is good enough that we continue to invest in interventions just because they are the pet causes of development workers. We owe it both to the taxpayers who fund development work and to those living in poverty to do our best to ensure funds are used wisely.

Achieving measurable impacts without doing anything to address root causes


Having said all that, my one note of caution is that there is a danger that these utilitarian approaches inadvertently skew priorities towards what is measurable at the expense of what is most important. Impacts which are most easily measured are often those achieved by solving immediate problems (excellent and nuanced discussion of this from Chris Blattman here). To subvert a well-known saying, it is relatively easy to measure the impact of giving a man a fish, more difficult to measure the impact of teaching a man to fish** and almost impossible to measure, let alone predict in advance, the impact of supporting the local ministry of agriculture to develop its internal capacity to devise and implement policies to support long-term sustainable fishing practices. Analysts in both the Copenhagen Consensus and the Open Philanthropy projects have clearly thought long and hard about this tension and seem to be making good strides towards grappling with it. However, I do worry that the trend within understaffed and highly scrutinised development agencies may be less nuanced.

So what is the solution? Well, firstly development agencies need to balance easy-to-measure but low-impact interventions with tricky-to-measure but potentially high-impact ones. BUT this does not mean that we should give carte blanche to those working on tricky systemic problems to use whatever shoddy approaches they fancy; too many poor development programmes have hidden behind the excuse that it is too complicated to assess them. Just because measuring and attributing impact is difficult does not mean that we can’t do anything to systematically assess intermediate outcomes and use these to tailor interventions.

To take the example of organisational capacity building – which surely makes up a large chunk of these ‘tricky’ to measure programmes – we need to get serious about understanding what aspects of design and implementation lead to success. We need to investigate the effects of different incentives used in such projects including the thorny issue of per diems/salary supplements (seriously, why is nobody doing good research on this issue??). We need to find out what types of pedagogical approach actually work when it comes to supporting learning and then get rid of all the rubbish training that blights the sector. And we need to think seriously about the extent of local institutional buy-in required for programmes to have a chance of success – and stop naively diving into projects in the hope that the local support will come along later.

In summary, ever-increasing scrutiny of how development funds are spent is probably inevitable. However, if, rather than fearing it, we engage constructively with the discussions, we can ensure that important but tricky objectives continue to be pursued – but also that our approach to achieving them gets better.

* Edit: thanks to tribalstrategies for pointing out that Bjorn Lomborg who runs the Copenhagen Consensus has some controversial views on climate science. This underscores the need for findings from such organisations to be independently and rigorously peer reviewed.

**High five to anyone who now has an Arrested Development song on loop in their head.



Evaluation: using the right tool for the job

[Image: screwdriver]

In response to my last post, I got a couple of comments (thanks @cashley122 and @intldogooder!) that were so good, I decided to devote a whole post to responding to them. Both commenters were pointing out the tendency of some evaluators to approach a problem with a specific tool – rather than first figuring out what the right question to ask is, and then designing a tool to fit. They were referring to people who want to evaluate every problem with an RCT – but it is just as much of a problem when evaluators approach every question with a specific qualitative approach – a phenomenon which is discussed in this recently published paper by my former colleagues Fran Deans and Alex Ademokun. The paper is an interesting read – it analyses the proposals of people who applied for grant money to evaluate evidence-informed policy. It reveals that many applicants suggested using either focus groups or key-informant interviews – not because these were considered to be the best way to find out how evidence-informed a policy making institution was – but simply because these were the ‘tools’ which the applicants knew about.

I have been reflecting on these issues and thinking about how we can improve the usefulness of evaluations. So, today’s top tips are about using the right tool for the job.  I have listed three ideas below – but would be interested in other suggestions…

1. Figure out what question you want to answer

The point of doing research is, generally, to answer a question, and different types of question can be answered with different types of method. So the first thing you need to figure out is what question you want to ask. This sounds obvious but it’s remarkable how many people approach every evaluation with essentially the same method. There are countless stories of highly rigorous experimental evaluations which have revealed an accurate answer to completely the wrong question!

2. Think (really think) about the counterfactual

A crucial part of any evaluation is considering what would have happened if the intervention had not happened. Using an experimental approach is one way to achieve this – but it is often not possible. For example, if the target of your intervention is a national parliament, you are unlikely to be able to get a big enough sample size of parliaments to randomise them to treatment and control groups in order to compare what happens with or without the intervention. But this does not mean that you should ignore the counterfactual – it just means you might need to be more creative. One approach would be to compare the parliament before and after the intervention and combine this with some analysis of the context which will help you assess potential alternative explanations for change. A number of such ‘theory-based’ approaches are outlined in this paper on small n impact evaluations.

To strengthen your before/after analysis further, you could consider adding in one or more additional variables which you would not expect to change due to your intervention but which would change as a result of some other confounders. For example, if you were implementing an intervention to increase internet searching skills, you would not expect skills in formatting Word documents to increase. If both variables increased, it might be a clue that the change was due to a confounding factor (e.g. the parliament had employed a whole lot of new staff who were much more computer literate). This approach (which has the catchy title of ‘Nonequivalent Dependent Variables Design’) can add an additional level of confidence to your results.
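To make the logic concrete, here is a minimal sketch in Python. All the names, scores and the change threshold are invented for illustration – this is just the reasoning pattern, not a real analysis tool:

```python
def assess_change(target_before, target_after,
                  control_before, control_after,
                  threshold=5.0):
    """Compare the change in the targeted skill against a 'nonequivalent
    dependent variable' that the intervention should NOT affect.

    Returns one of:
      'plausible effect'     - target rose, control did not
      'possible confounding' - both rose, so a common cause (e.g. an
                               influx of more computer-literate staff)
                               may explain the change
      'no detectable effect' - target did not rise
    """
    target_gain = target_after - target_before
    control_gain = control_after - control_before
    if target_gain < threshold:
        return "no detectable effect"
    if control_gain >= threshold:
        return "possible confounding"
    return "plausible effect"

# Hypothetical before/after scores (0-100). The intervention targeted
# internet searching; Word formatting is the nonequivalent variable.
print(assess_change(40, 65, 50, 52))  # search skills up, Word skills flat
print(assess_change(40, 65, 50, 70))  # both up - be suspicious
```

The point of the sketch is simply that the control variable acts as a built-in plausibility check: if it moves too, your before/after gain is no longer good evidence of an intervention effect.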

The point is not that these approaches will be perfect – it is not always easy to demonstrate the impact of a given intervention – but just because a ‘perfect’ design is not possible does not mean that it’s not worth trying to come up with a design that is as good as possible.

3. Think about the inputs as well as the outputs

Many evaluations set out to ask ‘Does this intervention work in this setting?’. Of course this is a really important question to ask – but development funders usually also want to know whether it works well enough to justify the amount of money it costs. I am well aware that nothing is more likely to trigger a groan amongst development types than the words ‘Value for Money’ – but the fact is that much development work is funded by my Nanna’s tax dollars* and so we have a duty to make sure we are using it wisely (believe me, you wouldn’t want to get on the wrong side of my Nanna).

So, how do you figure out if something is worth the money? Well, again, it is not an exact science, but it can be really useful to compare your intervention with alternative ways of spending the funds and what outcomes these might achieve. An example of this can be found in section 5.1 of this Annual Review of a DFID project which compares a couple of different ways of supporting operational research capacity in the south. A really important point (also made in this blog) is that you need to consider timescales in value for money assessments – some interventions may take a long time – but if they lead to important, sustained changes, they may offer better value for money than superficial quick wins.
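As a toy illustration of why timescales matter, here is a hedged back-of-the-envelope comparison (every figure, including the discount rate, is made up): a slow capacity-building intervention whose benefits arrive late but persist can out-perform a quick win whose benefits fade.

```python
def value_per_dollar(annual_benefit, start_year, end_year, cost,
                     discount_rate=0.05):
    """Discounted benefit per unit cost over an explicit time horizon.

    Sums annual_benefit over [start_year, end_year), discounting each
    year back to the present, then divides by the upfront cost.
    """
    total = sum(annual_benefit / (1 + discount_rate) ** t
                for t in range(start_year, end_year))
    return total / cost

# Quick win: benefits start immediately but fade after 3 years.
quick = value_per_dollar(annual_benefit=100, start_year=0, end_year=3, cost=200)

# Capacity building: benefits only begin in year 4 but persist for a decade.
slow = value_per_dollar(annual_benefit=100, start_year=4, end_year=14, cost=200)

print(round(quick, 2), round(slow, 2))  # the slow option wins despite the delay
```

The numbers are arbitrary; the design point is that a value-for-money comparison forces you to state your time horizon and discount rate explicitly, rather than letting the quick win look better by default.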


*Just to be clear, it is not that my Nanna bankrolls all international development work in the world. That would be weird. But I just wanted to make the point that the money comes from tax payers. Also, she doesn’t pay her taxes in dollars but somehow tax pounds doesn’t sound right so I used my artistic license.