The science of detecting false data
By Fenton Whelan
Leaders of health and education systems often rely on datasets that are inaccurate or incomplete.

Advanced analytics can help us understand and improve data reliability.

Health facilities in one country we work in are under pressure to report good results. Each month they report the numbers of children vaccinated, family planning consultations undertaken, antenatal checks delivered, and so on.

There are few checks on the data they submit. Because facilities are under pressure to produce good results, sevens become tens, thirteens become fifteens, and so on.

This creates interesting patterns in the data, which allow us to quickly assess the reliability of any dataset.

For instance, nobody ever fakes a thirteen. People who alter data prefer multiples of five and even numbers, and they avoid prime numbers.

Looking at the incidence of prime numbers, multiples of five, and even numbers can help us quickly understand where and how a dataset has been corrupted. It also lets us measure the impact of interventions to improve data quality.
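
As a rough illustration of this kind of check, the Python sketch below computes the share of reported values that are prime, multiples of five, or even. The function names and the two sample lists are hypothetical, invented for this example; they are not the data or the tooling described in this article.

```python
def is_prime(n: int) -> bool:
    """Return True if n is a prime number (trial division)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def number_preference_profile(reported):
    """Share of reported values that are prime, multiples of five, or even."""
    total = len(reported)
    if total == 0:
        raise ValueError("no reported values")
    return {
        "prime": sum(is_prime(v) for v in reported) / total,
        "multiple_of_5": sum(v % 5 == 0 for v in reported) / total,
        "even": sum(v % 2 == 0 for v in reported) / total,
    }


# Hypothetical monthly counts from ten clinics, invented for illustration.
original = [7, 13, 22, 29, 31, 18, 11, 26, 9, 17]
# The same counts after the kind of rounding described above
# (7 -> 10, 13 -> 15, 29 -> 30, and so on).
altered = [10, 15, 22, 30, 30, 20, 10, 26, 10, 20]

print(number_preference_profile(original))
# {'prime': 0.6, 'multiple_of_5': 0.0, 'even': 0.3}
print(number_preference_profile(altered))
# {'prime': 0.0, 'multiple_of_5': 0.8, 'even': 0.9}
```

On real data the shares would be compared between facilities, districts, or months rather than judged in isolation, since the share to expect from accurate data depends on the range of values being reported.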

For instance, we looked at one dataset showing the number of antenatal checks each clinic had performed in the previous month. If the data were accurate, you would expect roughly similar numbers of clinics reporting 29, 30 and 31 consultations.

In the reported data, twice as many clinics reported having undertaken 30 checks as 29 or 31. Similar spikes occurred for 15, 20, 40 and 50. This data was clearly corrupted.




fig.1: Number of health facilities reporting each number of antenatal checks
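
The comparison of 30 against 29 and 31 can be sketched as a simple ratio: the number of clinics reporting a value, divided by the average number reporting its two neighbours. This is a minimal sketch under that assumption; `spike_ratio` and the sample counts are hypothetical and do not reproduce the dataset shown in the figure.

```python
from collections import Counter


def spike_ratio(reported, value):
    """Count at `value` divided by the mean count at value - 1 and value + 1.

    Returns None when neither neighbour was reported, because there is
    nothing to compare against.
    """
    counts = Counter(reported)
    neighbour_mean = (counts[value - 1] + counts[value + 1]) / 2
    if neighbour_mean == 0:
        return None
    return counts[value] / neighbour_mean


# Hypothetical reports: one entry per clinic, invented for illustration.
reported = [28] * 6 + [29] * 5 + [30] * 10 + [31] * 5 + [32] * 4

print(spike_ratio(reported, 30))  # 2.0: twice as many clinics as at 29 or 31
print(spike_ratio(reported, 29))  # 0.625
print(spike_ratio(reported, 50))  # None: no clinic reported 49 or 51
```

A ratio close to 1 is what the reasoning above expects from accurate data. A ratio near 2 at a multiple of five matches the "twice as many" pattern described for 30 checks.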


Another indicator, the number of deliveries conducted by each clinic, was subject to more rigorous checks. As a result, the distribution was much smoother. There were small peaks at 15 and 20, suggesting that the numbers were not perfect, but overall the level of false reporting was low.




fig.2: Number of health facilities reporting each number of deliveries


Our teams use hundreds of different techniques to assess and improve data systems. Used correctly, advanced analytics can help give health and education system leaders the information they need to drive reforms.
