Measuring capacity… to play tiddly winks


In a previous post I talked about the problem of many capacity building initiatives using self-reported ability as a measure of impact. To further illustrate this point, I decided to carry out a small scientific experiment. I gathered a randomly selected group of study participants and asked them to:

1. Rate their tiddly wink playing ability on a scale of 1 (dreadful) to 5 (outstanding)
2. Tiddle their wink (no winking of tiddles was allowed)
3. Measure how far they had tiddled their wink (in cm)

(I suspect that the above may be the best sentence I will EVER blog)

The results are shown in the graph below. Now, as any scientist will tell you, the R^2 value written on this graph indicates that this (poorly designed, probably manipulated ;-)) experiment definitively proves that my pre-existing prejudices were correct, i.e. self-reported ability to play tiddly winks is not correlated with actual ability.
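For anyone who wants to try this at home, here is a minimal sketch of how an R^2 value like the one on the graph could be computed. The function name `r_squared` and the numbers are my own inventions for illustration only; they are NOT the data from the experiment. It assumes a simple straight-line fit, where R^2 is just the square of the Pearson correlation between the two variables.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a straight-line fit of ys on xs.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when a variable has no variance")
    return (sxy * sxy) / (sxx * syy)


# Invented, purely illustrative numbers (NOT the real experimental data)
self_rating = [1, 2, 2, 3, 3, 4, 4, 5]          # self-rated ability, 1 to 5
distance_cm = [30, 12, 45, 28, 9, 40, 15, 26]   # measured tiddle distance in cm

print(f"R^2 = {r_squared(self_rating, distance_cm):.3f}")
```

An R^2 close to 0 means the self-rating explains almost none of the variation in distance; a value close to 1 would mean people know exactly how good they are.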

So what is the conclusion of this? Well, for one thing, if you ever see a log frame for a project to build tiddly winking capacity with ‘increase in self-reported tiddly wink ability’ as the verifiable indicator, I trust you will refer the project team to this definitive rebuttal.

But in addition, I think we should be a lot more sceptical of any project which uses perceived increase in ability as the measure of success. Of course, there may be some abilities which ARE correlated with self-reported ability. But I suggest that this correlation needs to be demonstrated before self-reporting can be used as a reliable proxy indicator.


8 thoughts on “Measuring capacity… to play tiddly winks”

  1. What happens to the coefficient of determination if we don’t consider the arrogant (self-rating 5) or the modest (self-rating 1)?

  2. Yes, what Michael said… I thought it was an experiment into personality type and self-belief/arrogance (who blows their trumpet the best, who needs to have more self-belief). Can we collect more data along with demographics so that you can disaggregate by age/gender/profession/level of seniority in said profession/attractiveness… Actually, attractiveness may be hard to get self-reported data on, so I’m happy to help you judge.

  3. Wow, it’s almost as if you guys are not taking my highly scientific approach seriously…

  4. Were there any tiddle controls?

  5. And of course, our old friends Dunning and Kruger play a role here. We really are very bad at assessing our own abilities.

  6. You could definitely remove the outliers to reveal a stronger “correlation”. I love stats; it’s like lying with a clean conscience.

  7. Splendid commentary on the tiddling of winks. This reminds me of a comment I made in a meeting a few weeks ago on the nature of expertise: experts are better at self-monitoring and self-regulation of performance (than non-experts) but at the same time are likely to overestimate their ability. There have been studies showing this quite well with surgeons…

  8. Dear Kirsty,
    funny experiment which I came across just now. But I wonder why you only conclude that we should be sceptical about self-perceived ability. I come to the contrary conclusion. If I took your experiment seriously, I would ask:
    a) How often did the candidates try the tiddly winks?
    b) Did you form an average or did you take the best try?
    c) Did you allow them to try on different days, so that one poor day does not skew the data?
    d) Did everyone have the same conditions or could people choose the setting?

    If even one of these questions is answered unsatisfactorily, your experiment would not test ability but only a momentary show of ability. And then I would rather go for self-reported ability because it is likely to be as reliable but less laborious, so more value for money. I am not alone on this. There are loads of books and articles on the limited value of tests.

    Now, seriously: I agree that self-reported ability is questionable. But your question was about “increase in ability”, and a before-and-after self-assessment is more reliable, except that a successful programme can also change the standards against which people assess themselves and others. (I have had cases where people assessed themselves more poorly after a programme than before because they had realised what abilities they still lacked.) Assessment by others can help and be cost-effective. Observation of performance can be better, but it needs to be feasible to do. For feasibility, programmes often need to be redesigned so that they include follow-up and accompaniment, and can gather this data on the side.

    Best Bernward

