Warning Sign #1: When government officials use the data to set targets like an increase in vaccination or a decrease in cancer numbers, they always use percentages, not absolute numbers. That’s a sign that people know the numbers are wrong and don’t want to rely on them. (Of course, sometimes it just means that the percentage is the right way to look at it. Increasing the number of people in the district with access to clean water by 20% conveys more information than saying you want to increase it by 330,000 people. You need to use your judgment, as always.)
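A tiny illustration of that judgment call, with invented district sizes: the same absolute target means very different things depending on the denominator, which is why a percentage can carry more information.

```python
# Hypothetical district populations; 330,000 is the absolute target above.
added_people = 330_000

for district_pop in (400_000, 3_300_000):
    pct_points = 100 * added_people / district_pop
    print(f"+{added_people:,} people in a district of {district_pop:,} "
          f"is a gain of {pct_points:.0f} percentage points of coverage")
```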
Warning Sign #2: The disaggregation doesn’t make sense. This is a judgment call again – sometimes the data are weird because there is something weird going on (such as India’s missing girls; we only wish those data were fake). For example, pregnancy is a major risk factor for anemia, so if your reported rate of anemia in pregnant women is lower than the rate in the general population, something is wonky.
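A minimal sketch of that plausibility check, with invented prevalence figures: compare the disaggregated rate for a known high-risk group against the overall rate.

```python
# Hypothetical anemia prevalences; the comparison is the point, not the numbers.
reported_rates = {
    "general population": 0.29,
    "pregnant women": 0.22,   # suspicious: pregnancy raises anemia risk
}

# Pregnant women should show a rate at least as high as the general
# population, so a lower disaggregated rate is a red flag.
if reported_rates["pregnant women"] < reported_rates["general population"]:
    print("Warning: anemia in pregnant women is reported below the general "
          "rate – the disaggregation doesn’t make sense; check the data.")
```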
Warning Sign #3: The math doesn’t work. If you know a few true numbers, you can use them to ground-truth the rest of your data. For example, if you know the perinatal mortality rate for the smallest babies, then you can use it to determine whether the reported infant mortality rate makes sense. (This slide deck has the detailed instructions, starting from slide #20.)
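A simplified version of that ground-truthing check (the slide deck has the real method): here the “known true number” is assumed to be first-year mortality among low-birth-weight babies, and all figures are invented. Since those deaths count toward infant mortality, they put a floor under the reported rate.

```python
# Invented figures for a lower-bound check on a reported infant mortality rate.
lbw_share = 0.10          # assumed share of live births that are low birth weight
lbw_mortality = 0.15      # assumed first-year mortality among those babies
reported_imr = 12 / 1000  # reported infant mortality rate (hypothetical)

# Deaths of low-birth-weight babies alone imply this many infant deaths
# per live birth, so the true IMR cannot be lower than this floor.
implied_floor = lbw_share * lbw_mortality   # 0.015, i.e. 15 per 1,000

if reported_imr < implied_floor:
    print(f"Reported IMR ({1000 * reported_imr:.0f}/1,000) is below the "
          f"{1000 * implied_floor:.0f}/1,000 implied by the smallest babies "
          "alone – the math doesn’t work.")
```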
One thing I’d add is to look for outliers. This is similar to point three above, but in a slightly different form. If you’re not intimately familiar with the data and with where a project stands in relation to others, putting the numbers into a spreadsheet and running some analysis (ideally with something like R/SPSS/Stata), or simply eyeballing them, can be useful – see the sketch at the end of this comment.
Because if a program stands out as excellent on a key indicator, it’s worth asking: what’s being done that we can learn from? Real results that show dramatic improvements over past performance, or over benchmarks elsewhere, deserve to have lessons drawn from them. On the other hand, if the results are much better than expected, there may be some issues with the data. Where those issues are small, they can be hard to separate from the real gains. I know of a program that is seeing incredible success (a 95% reduction in a key indicator), and almost all of that success is real. Nevertheless, there are some reporting issues, with clinics not reporting every case – there’s no financial incentive to report fully, but there is the satisfaction of congratulations when a clinic reports zero cases. As the program progresses, these reporting gaps will either disappear or become more evident.
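A minimal sketch of that outlier screen, in Python rather than R/SPSS/Stata, with invented program names and figures:

```python
# Flag programs whose key indicator sits far from the rest of the group.
import statistics

# Hypothetical key-indicator values for a handful of programs.
indicator_by_program = {
    "Program A": 48, "Program B": 52, "Program C": 51,
    "Program D": 47, "Program E": 49, "Program F": 50,
    "Program G": 95,   # stands out from the cluster
}

values = list(indicator_by_program.values())
mean = statistics.mean(values)
sd = statistics.stdev(values)

for program, value in indicator_by_program.items():
    z = (value - mean) / sd
    if abs(z) > 2:   # crude threshold; at this scale eyeballing works too
        print(f"{program}: {value} (z = {z:.1f}) – worth asking whether "
              "this is a real lesson or a data problem")
```

Nothing here is specific to Python; a spreadsheet conditional or one line of R does the same job.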