Leaders of health and education systems often rely on datasets that are inaccurate or incomplete. Advanced analytics can help them understand and improve the reliability of that data.

Health facilities in one country where we work are under pressure to report good results. Each month they report the number of children vaccinated, family planning consultations undertaken, antenatal checks delivered, and so on. There are few checks on the data they submit. As a result, sevens become tens, thirteens become fifteens, and so on. This creates distinctive patterns in the data, which allow us to quickly assess the reliability of a dataset.

For instance, nobody ever fakes a thirteen.

When data is altered, people have a preference for multiples of five and even numbers. 

They have an aversion to prime numbers. Looking at the incidence of prime numbers, multiples of five and even numbers can quickly show where and how a dataset has been corrupted. It also lets us measure the impact of interventions to improve data quality.
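As a rough illustration of this check, the sketch below computes the share of reported values that are prime, multiples of five, or even. The function names and the sample values are hypothetical and made up for illustration; this is not the tooling or data used by our teams.

```python
def is_prime(n: int) -> bool:
    """Return True if n is a prime number (simple trial division)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def preference_shares(values):
    """Share of reported values that are prime, multiples of five, or even.

    The three categories overlap (10 is both even and a multiple of five),
    so the shares do not need to sum to 1.
    """
    total = len(values)
    if total == 0:
        raise ValueError("no reported values to analyse")
    return {
        "prime": sum(is_prime(v) for v in values) / total,
        "multiple_of_five": sum(v % 5 == 0 for v in values) / total,
        "even": sum(v % 2 == 0 for v in values) / total,
    }


# Made-up monthly counts from ten hypothetical clinics (not real data).
reported = [10, 15, 20, 30, 30, 13, 40, 22, 25, 50]
shares = preference_shares(reported)
print(shares)
```

The shares only mean something against a baseline. If counts were spread evenly over a wide range, about one in five would be a multiple of five and about half would be even, so shares well above those levels, or an unusually low share of primes, would be a reason to look more closely.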

For instance, we looked at one dataset showing the number of antenatal checks each clinic had performed in the previous month. If the data were accurate, you would expect roughly similar numbers of clinics to report 29, 30 and 31 consultations. In the reported data, twice as many clinics reported 30 checks as reported 29 or 31. Similar spikes occurred at 15, 20, 40 and 50. These spikes show that the data had clearly been altered.
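The comparison of 30 against 29 and 31 can be sketched as a simple neighbour ratio: the number of clinics reporting a value, divided by the average number reporting the values immediately either side. The function name and the sample counts below are assumptions made for illustration, not the actual dataset.

```python
from collections import Counter


def spike_ratios(values, candidates):
    """For each candidate value n, compare how many clinics reported n
    with the average number reporting n - 1 and n + 1.

    A ratio near 1 suggests a smooth distribution; a ratio near 2 means
    twice as many clinics reported n as reported its neighbours.
    Returns None for a candidate when no clinic reported either neighbour.
    """
    counts = Counter(values)
    ratios = {}
    for n in candidates:
        neighbour_mean = (counts[n - 1] + counts[n + 1]) / 2
        ratios[n] = counts[n] / neighbour_mean if neighbour_mean else None
    return ratios


# Made-up example: 10 clinics report 29, 20 report 30, 10 report 31.
reported = [29] * 10 + [30] * 20 + [31] * 10
ratios = spike_ratios(reported, [30])
print(ratios)
```

Running the same ratio over the other round numbers (15, 20, 40, 50) and comparing before and after an intervention gives one simple way to track whether reporting is improving.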

newsletter_falseData-fig1.svg

Fig.1: Number of health facilities reporting each number of antenatal checks

Another indicator, the number of deliveries conducted by each clinic, was subject to more rigorous checks. As a result, the distribution was much smoother. There were small peaks at 15 and 20, suggesting that the numbers were not perfect, but overall the level of false reporting was low.

newsletter_falseData-fig2.svg

Fig.2: Number of health facilities reporting each number of deliveries

Our teams use hundreds of different techniques to assess and improve data systems. Used correctly, advanced analytics can help give health and education system leaders the information they need to drive reforms.

AUTHORS

Fenton Whelan