ML-based geographic sampling frames miss transitory populations in fragile regions
Andrea C. Caflisch, Daniel Masterson, Stephen D. O'Connell, Ettan Patel, and Julia Smith-Omomo
Survey research on conflict and displacement depends on the reliability of sampling frames. We show that common approaches to developing these frames may underrepresent populations central to this research: displaced people, returnees, and civilians exposed to violence. We develop a hybrid approach to sampling frame generation that combines machine-learning (ML)-generated building footprints with satellite imagery, and we test it in displacement-affected Iraqi communities. The approach achieves 87% residential accuracy overall, and our data reveal non-random omissions in the common ML-only approach. We find systematic coverage differences across frame-generation methods: manual methods capture more rural internally displaced persons and urban returnees, a difference attributable to informal shelters and wartime reconstruction. Our hybrid ML-and-satellite sampling can mitigate coverage error and improve inference about conflict and displacement.
Keywords: machine learning; building footprints; sampling frames; survey methodology; displacement; internally displaced persons; post-conflict; satellite imagery; Iraq; humanitarian assistance.