Three Ways to Spot Bad Data

[Image: suspicious graph]

Warning Sign #1: When government officials use the data to set targets, such as an increase in vaccination or a decrease in cancer numbers, they always use percentages, not absolute numbers. That’s a sign that people know the numbers are wrong and don’t want to rely on them. (Of course, sometimes it just means that the percentage is the right way to look at it. Increasing the number of people in the district with access to clean water by 20% conveys more information than saying you want to increase it by 330,000 people.) You need to use your judgment, as always.

Warning Sign #2: The disaggregation doesn’t make sense. For example, pregnancy is a major risk factor for anemia. If your rates of anemia in pregnant women are lower than the rates in the general population, something is wonky. This is a judgment call again: sometimes the data are weird because something weird really is going on (such as India’s missing girls; we only wish those data were fake).
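A check like this is easy to automate. Here is a minimal sketch, assuming you already have an overall rate and a rate for each subgroup. The function name and all the numbers are made up for illustration; they do not come from any real survey.

```python
def flag_inverted_subgroups(overall_rate, subgroup_rates, high_risk_groups):
    """Return the high-risk subgroups whose reported rate is BELOW the overall rate.

    overall_rate: rate in the general population (any unit, e.g. percent)
    subgroup_rates: dict mapping subgroup name -> rate, in the same unit
    high_risk_groups: names of subgroups you expect to sit above the overall rate
    """
    return [
        name
        for name in high_risk_groups
        if subgroup_rates[name] < overall_rate
    ]


# Hypothetical anemia prevalence, in percent (illustrative values only).
overall = 30.0
by_group = {"pregnant women": 22.0, "children under 5": 41.0}

suspicious = flag_inverted_subgroups(
    overall, by_group, ["pregnant women", "children under 5"]
)
print(suspicious)  # ['pregnant women']
```

A flagged subgroup is a prompt to go look, not proof of bad data. The judgment call described above still applies.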

Warning Sign #3: The math doesn’t work. If you know a few true numbers, you can use them to ground-truth the rest of your data. For example, if you know the perinatal mortality rate for the smallest babies, then you can use it to determine whether the reported infant mortality rate makes sense. (This slide deck has the detailed instructions, starting from slide #20.)
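One simple version of this kind of check can be sketched as follows. This is not the method from the slide deck, just one way the arithmetic can work: deaths in a single subgroup set a floor under the overall rate, so a reported overall rate below that floor cannot be right. The sketch assumes both rates count the same kind of death over the same denominator (per 1,000 live births). Real perinatal and infant mortality rates are defined differently (perinatal mortality conventionally includes stillbirths and only the first week of life), so you would need to put them on a common basis first. All names and numbers are hypothetical.

```python
def overall_rate_floor(subgroup_share, subgroup_rate):
    """Minimum overall rate implied by deaths in one subgroup alone.

    subgroup_share: fraction of all live births that fall in the subgroup (0 to 1)
    subgroup_rate: deaths per 1,000 live births within that subgroup
    """
    if not 0.0 <= subgroup_share <= 1.0:
        raise ValueError("subgroup_share must be between 0 and 1")
    return subgroup_share * subgroup_rate


def is_plausible(reported_overall_rate, subgroup_share, subgroup_rate):
    """True if the reported overall rate is at least the floor set by the subgroup."""
    return reported_overall_rate >= overall_rate_floor(subgroup_share, subgroup_rate)


# Hypothetical inputs: 2% of births are in the smallest weight band,
# and that band loses 400 babies per 1,000 live births.
share, rate = 0.02, 400.0

print(is_plausible(5.0, share, rate))   # False: below the floor, so suspect
print(is_plausible(12.0, share, rate))  # True: consistent with the known number
```

Passing the check does not make the reported number correct. It only means this one known figure does not contradict it.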