ML-based geographic sampling frames miss transitory populations in fragile regions

Andrea C. Caflisch, Daniel Masterson, Stephen D. O'Connell, Ettan Patel, and Julia Smith-Omomo

Survey research on conflict and displacement depends on reliable sampling frames. We show that common approaches to developing these frames may underrepresent populations central to this research: displaced people, returnees, and civilians exposed to violence. We develop a hybrid approach to sampling frame generation that combines machine-learning (ML)-generated building footprints with satellite imagery, and we test it in displacement-affected Iraqi communities. The approach achieves 87% residential accuracy overall, and our data reveal non-random omissions in the common ML-only approach. We find systematic coverage differences across frame-generation methods: manual methods capture more rural internally displaced persons and urban returnees, owing to informal shelters and wartime reconstruction. Our hybrid ML-and-satellite sampling can mitigate coverage error and improve inference about conflict and displacement.

Keywords: machine learning; building footprints; sampling frames; survey methodology; displacement; internally displaced persons; post-conflict; satellite imagery; Iraq; humanitarian assistance.

Posted on:
May 1, 2026
Categories:
Ongoing project
Tags:
machine learning; sampling frames; survey methodology; displacement; Iraq
See Also:
Economic Recovery and Social Cohesion: A Field Experiment with Capital Grants in Post-Conflict Iraq
Social and Distributional Effects of Capital Grants for Small and Medium Enterprises on Employers and Employees: Evidence from Post-War Iraq