ML-based geographic sampling frames miss transitory populations in fragile regions

Andrea C. Caflisch, Daniel Masterson, Stephen D. O'Connell, Ettan Patel, and Julia Smith-Omomo

Survey research on conflict and displacement depends on the reliability of sampling frames. We show that common approaches to developing these frames may underrepresent populations central to this research: displaced people, returnees, and civilians exposed to violence. We develop a hybrid approach to sampling frame generation that combines machine-learning (ML)-generated building footprints with satellite imagery, and we test it in displacement-affected Iraqi communities. The approach achieves 87% residential accuracy overall, and our data reveal non-random omissions in the common ML-only approach. Coverage differs systematically across frame-generation methods: manual methods capture more rural internally displaced persons and urban returnees, owing to informal shelters and wartime reconstruction. Our hybrid ML-and-satellite sampling can mitigate coverage error and improve inference about conflict and displacement.

Keywords: machine learning; building footprints; sampling frames; survey methodology; displacement; internally displaced persons; post-conflict; satellite imagery; Iraq; humanitarian assistance.

Posted on: May 1, 2026

Categories: Ongoing project

Tags: machine learning, sampling frames, survey methodology, displacement, Iraq

See also: Economic Recovery and Social Cohesion: A Field Experiment with Capital Grants in Post-Conflict Iraq; Social and Distributional Effects of Capital Grants for Small and Medium Enterprises on Employers and Employees: Evidence from Post-War Iraq