Genuine Fakes: The Prevalence and Implications of Data Fabrication in a Large South African Survey

Bibliographic Details
Main Authors: Finn, Arden; Ranchhod, Vimal
Format: Journal Article
Published: Oxford University Press on behalf of the World Bank, 2018
Online Access: http://hdl.handle.net/10986/30131
Description
Summary: How prevalent is data fabrication in household surveys? Would such fabrication substantially affect the validity of empirical analyses? We document how we identified such fabrication, which affected about 7% of the sample, in South Africa's longitudinal National Income Dynamics Study. The fabrication was detected while fieldwork was still ongoing, and the relevant interviews were reconducted. We thus have an observed counterfactual that allows us to measure how problematic such fabrication would have been, had it remained undetected. We compare estimates from the dataset that includes the fabricated interviews with corresponding estimates from the dataset that includes the corrected data instead. We find that the fabrication would not have affected our univariate and cross-sectional estimates meaningfully, but would have led us to reach substantially different conclusions when implementing panel estimators. We estimate that the data quality investigation in this survey had a benefit-cost ratio of at least 24, and was thus easily justifiable.