Mapping and Listing in Household Surveys with R

Filip Mitrovic
Published in Analytics Vidhya
5 min read · Jan 16, 2021


Household surveys can be a tough undertaking: many moving pieces have to be synchronized to collect reliable data. This usually requires large teams to move the many different tasks along, as well as sophisticated tools to prepare for the survey.

However, teams are sometimes small and lack important tools, while the goal of collecting quality data stays the same. In such a situation, the R script provided here, or a similar one, might be handy for dealing with issues like listing and mapping, even without advanced GIS tools like ArcGIS or high-definition pictures of the selected sample areas.

One of the ABCs of household surveys (or surveys in general) is that it should be easy for interviewers to identify, in the field, the households sampled for interview during the survey fieldwork.

Illustration: CAPI data collection in the field

A listing of the households selected for the survey is usually prepared and given to the enumerators tasked with interviewing household members using the prepared survey questionnaires. To ease identification of the households, listings should include the names of the household heads, the physical addresses of the selected households (including zip codes, names of the municipalities, street addresses and numbers, etc.), maps of the areas where the households are located, and ideally short instructions on how to reach each of the physical dwellings where the households reside.

Listing information is usually prepared through a listing exercise, in which a number of teams of listers (enumerators) and mappers visit all the households in the area selected for the survey.

As mentioned, statistical offices usually have very sophisticated and precise geo-data for implementing surveys in the field, as well as tools to plot that data in a user-friendly manner, which makes the listing exercise easier for the teams on the ground.

Geo-tagged data (information on coordinates) on the households selected for the survey sample is very useful information for the enumerators conducting the survey.

However, if the data cannot be plotted on high-definition maps or with licensed software, R and the following script might help in providing maps of sufficient quality to move forward with the mapping and listing for fieldwork data collection. Working with a small statistical office, we encountered exactly this challenge.

To prepare teams for fieldwork data collection, instead of sending them without additional maps, we have prepared maps in R.

First, this is the list of libraries used to create the map files. Please note that I have left some of them in without critically reviewing the value added by each package, but all in all they help create a better look for the proposed mapping exercise.

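library(ggplot2)    # plotting layers, titles and themes
library(ggmap)      # get_map() and ggmap() for Google basemaps
library(rgdal)      # spatial data I/O; not called directly below (retired from CRAN in 2023)
library(ggrepel)    # geom_label_repel() for non-overlapping labels
library(cowplot)    # plot composition helpers
library(ggspatial)  # annotation_north_arrow()
library(ggpubr)     # ggarrange() and rremove()
library(ggsn)       # scale bars and north symbols; not called directly below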

In the interest of privacy and ethical concerns, the data used in this example has been anonymized: names of household heads have been replaced with generic names, and lon and lat values have been shifted by a common value.

The file for this exercise, with dummy values generated in line with these principles, is available on Dropbox and can be read with the following function. The second statement in the snippet names the columns so that common terms from a listing dataset can be used.

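# Read the sample listing; lines starting with "#" are treated as comments.
# If this returns the Dropbox preview page rather than the CSV, try appending ?dl=1 to the URL.
tr <- read.csv("https://www.dropbox.com/s/i72ntsiba8h7jnr/hh_sample.csv", comment.char = "#")

# Rename the 22 columns using common terms from a listing dataset
colnames(tr) <- c("Team", "Day", "HH", "Cluster Number", "Household Number", "EA code",
                  "hh_id", "hhsize", "island", "island code", "village", "village code",
                  "EA code4", "occupancy", "dwelling type", "name", "family name",
                  "domain", "domain5", "LAT", "LON", "HH_Head")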
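Renaming columns by position, as above, silently mislabels the data if the file ever gains or loses a column. A minimal sketch of a guard, using a three-column toy file in place of the real listing (the toy data and the shortened name vector are illustrative assumptions, not the actual dataset):

```r
# Toy stand-in for the listing file: three columns instead of 22 (illustrative only)
csv_text <- "# exported listing\nteam,hh,head\n1,101,Head A\n1,102,Head B\n"
toy <- read.csv(text = csv_text, comment.char = "#")

new_names <- c("Team", "HH", "HH_Head")

# Positional renaming is only safe when the column count matches the name vector
stopifnot(ncol(toy) == length(new_names))
colnames(toy) <- new_names

print(toy)
```

The same stopifnot() line can be placed before the colnames(tr) assignment with the full 22-name vector.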

Before using Google Maps through the ggmap package, make sure you have an API key for your Google services registered in your R session in RStudio (on how to do that, please read here).
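One possible way to register the key is sketched below. It assumes the key is stored in an environment variable; the variable name GOOGLE_MAPS_API_KEY is a hypothetical choice, while register_google() and has_google_key() are functions from ggmap.

```r
# GOOGLE_MAPS_API_KEY is a hypothetical environment variable name (an assumption)
api_key <- Sys.getenv("GOOGLE_MAPS_API_KEY")

if (nzchar(api_key) && requireNamespace("ggmap", quietly = TRUE)) {
  ggmap::register_google(key = api_key)   # registers the key for this R session
  key_ready <- ggmap::has_google_key()
} else {
  key_ready <- FALSE
  message("No key found (or ggmap is not installed); Google basemaps will not download.")
}
```

Keeping the key in an environment variable rather than in the script avoids sharing it by accident along with the code.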

For this example only a few columns will actually be used: the household number, the name of the household head, and the latitude and longitude. To put the selected columns in a data frame I used this part of the code:

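# Keep only the columns needed for the map
df <- data.frame(
  HH      = tr$HH,
  HH_Head = tr$HH_Head,
  lat     = tr$LAT,
  lon     = tr$LON
)

# Label shown on the map: household number and head name, e.g. "12.Name"
df$HH_Head1 <- paste(df$HH, df$HH_Head, sep = ".")

# Drop rows with missing values, since they cannot be plotted or labelled
df <- na.omit(df)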
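To see what this preparation step does, here is a small sketch on made-up data (the household numbers, names and coordinates are invented for illustration). It also adds a basic range check on the coordinates, which is an extra precaution rather than part of the original script:

```r
# Invented toy listing; the second household has no latitude
toy <- data.frame(
  HH      = c(1, 2, 3),
  HH_Head = c("Head A", "Head B", "Head C"),
  lat     = c(-8.52, NA, -8.53),
  lon     = c(179.19, 179.20, 179.21)
)

toy$HH_Head1 <- paste(toy$HH, toy$HH_Head, sep = ".")
toy <- na.omit(toy)   # drops household 2

# Swapped or mistyped coordinates often show up as out-of-range values
in_range <- toy$lat >= -90 & toy$lat <= 90 & toy$lon >= -180 & toy$lon <= 180
stopifnot(all(in_range))

print(toy$HH_Head1)
```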

Now the fun part: plotting latitude and longitude on a map. For this you should have set up an API key on Google and registered it in R.

For the mapping itself, I use the standard get_map function from the ggmap package, and I create two maps with similar function calls.

# Download a hybrid Google basemap centred on the mean household position
sample.mapea <- get_map(location = c(lon = mean(df$lon), lat = mean(df$lat)),
                        zoom = 17, source = "google", maptype = "hybrid", scale = 1)
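At zoom 17 the basemap covers only a few hundred metres, so households far from the centre can fall outside the image. The helper below is a rough sketch of a pre-check. It uses the standard Web Mercator ground-resolution formula and flat-earth distance approximations; fits_in_map is a hypothetical name, and the 640-pixel image width is an assumption about the size of the downloaded map.

```r
# Hypothetical helper: TRUE for points that fall inside a square basemap
# centred on the mean position (approximate; size_px = 640 is an assumption)
fits_in_map <- function(lon, lat, zoom = 17, size_px = 640) {
  centre_lat <- mean(lat)
  cos_lat <- cos(centre_lat * pi / 180)
  m_per_px <- 156543.03392 * cos_lat / 2^zoom      # ground metres per pixel
  half_width_m <- m_per_px * size_px / 2
  dx <- (lon - mean(lon)) * 111320 * cos_lat       # east-west offset in metres
  dy <- (lat - centre_lat) * 110574                # north-south offset in metres
  abs(dx) <= half_width_m & abs(dy) <= half_width_m
}

# Invented coordinates, about 100 m apart
inside <- fits_in_map(lon = c(10.000, 10.001, 10.002),
                      lat = c(45.000, 45.001, 45.002))
print(inside)
```

If some households come back FALSE, lowering zoom by one step roughly doubles the covered width.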

To this basic map, labels are now added for each household to be mapped. The labels show the names of the household heads occupying the physical dwellings, based on the listing dataset. As more than 20 households can be selected from an enumeration area, the labels are split at the mean longitude, so that roughly half of them are plotted on the left side of the map (households with a longitude below the mean) and the other half on the right.

Labels also need to be placed so that they do not overlap, regardless of the side they are plotted on, so geom_label_repel is used to distribute them evenly.
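The left/right split described above can be tried on toy data before plotting. A minimal sketch with invented labels and longitudes:

```r
# Invented labels and longitudes for four households
toy <- data.frame(
  label = c("1.Head A", "2.Head B", "3.Head C", "4.Head D"),
  lon   = c(179.190, 179.192, 179.196, 179.198)
)

cut_lon <- mean(toy$lon)

right_side <- subset(toy, lon >= cut_lon)  # labels nudged to the right edge
left_side  <- subset(toy, lon <  cut_lon)  # labels nudged to the left edge

print(left_side$label)
print(right_side$label)
```

Every household lands in exactly one group. Note that a split at the mean only gives equal halves when the longitudes are spread fairly evenly; splitting at median(toy$lon) is an alternative if one side ends up crowded.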

# ggmap() turns the basemap into a ggplot object; reassigning sample.mapea
# makes it the plot that ggarrange() uses later
sample.mapea <- ggmap(sample.mapea) +
  # One red-outlined point per household
  geom_point(data = df, aes(x = lon, y = lat),
             color = "red", fill = "black", stroke = 1.5, shape = 21, alpha = 0.9) +
  # Households at or east of the mean longitude: labels pushed to the right
  geom_label_repel(data = subset(df, df$lon >= mean(df$lon)),
                   aes(x = lon, y = lat, label = HH_Head1),
                   fontface = "bold",
                   nudge_x = 0.02,
                   direction = "y",
                   box.padding = unit(0.081, 'lines'),
                   hjust = 1,
                   segment.size = 0.1,
                   arrow = arrow(length = unit(0.03, "npc"), type = "closed", ends = "first"),
                   force = 5) +
  # Households west of the mean longitude: labels pushed to the left
  geom_label_repel(data = subset(df, df$lon < mean(df$lon)),
                   aes(x = lon, y = lat, label = HH_Head1),
                   fontface = "bold",
                   nudge_x = -0.7,
                   direction = "y",
                   box.padding = unit(0.081, 'lines'),
                   hjust = 0,
                   segment.size = 0.1,
                   arrow = arrow(length = unit(0.03, "npc"), type = "closed", ends = "first"),
                   force = 5) +

In the end I also added a compass arrow to this map so that an interviewer or a team supervisor can orient themselves easily in the field. This is just an additional beautification of the map.

I also added a generic title for the first map here.

  scale_fill_viridis_c(trans = "sqrt", alpha = .9) +
  # Compass arrow in the bottom-left corner
  annotation_north_arrow(location = "bl", which_north = "true",
                         pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
                         style = north_arrow_fancy_orienteering) +
  ggtitle("", subtitle = "(Cluster # 00001, Team # 001)") +
  theme_void() +
  theme(legend.position = "none")
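The code for the second map, referred to later as sample.mapea_small, is not shown in this post. A hedged sketch of how such an overview map could be built with the same functions follows; the helper name make_overview_map, the zoom level of 13, the roadmap style and the single red marker are all assumptions, not the settings actually used.

```r
# Hypothetical helper (not from the original script): a zoomed-out map that
# marks where the enumeration area lies. zoom and maptype are assumptions.
make_overview_map <- function(df, zoom = 13) {
  centre <- data.frame(lon = mean(df$lon), lat = mean(df$lat))

  basemap <- ggmap::get_map(
    location = c(lon = centre$lon, lat = centre$lat),
    zoom = zoom, source = "google", maptype = "roadmap", scale = 1
  )

  ggmap::ggmap(basemap) +
    ggplot2::geom_point(data = centre, ggplot2::aes(x = lon, y = lat),
                        color = "red", size = 4) +
    ggplot2::theme_void()
}

# Needs ggmap, ggplot2 and a registered Google API key, so it is left commented out:
# sample.mapea_small <- make_overview_map(df)
```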

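The second map also needs a title. Both titles are set through the labels argument of ggarrange, which stacks the two maps on one page. Here sample.mapea_small is the second map, built with similar function calls, showing where the EA is located: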

ggarrange(sample.mapea, sample.mapea_small + rremove("x.text"),
          labels = c("Location of Households in EA", "Location of the EA"),
          ncol = 1, nrow = 2, align = "v")

In the end I create a PDF that is A4 size with both maps plotted for easy use by interviewers in the field:

ggsave(filename = "Cluster_map_sample.pdf", plot = last_plot(), device = NULL, path = NULL,
width = 8.3, height = 11.7,
dpi = "retina", limitsize = TRUE)
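The width and height above are A4 expressed in inches. A small sketch of where the numbers come from, plus one way to build a file name per cluster (mm_to_in and the file-name pattern are illustrative assumptions):

```r
# A4 is 210 mm x 297 mm; ggsave() measures in inches by default
mm_to_in <- function(mm) mm / 25.4
a4 <- c(width = mm_to_in(210), height = mm_to_in(297))
print(round(a4, 1))

# Hypothetical naming pattern for one PDF per cluster
cluster_id <- "00001"
pdf_name <- sprintf("Cluster_map_%s.pdf", cluster_id)
print(pdf_name)
```

ggsave() also accepts the metric values directly, for example width = 210, height = 297, units = "mm".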

Final product would look something like this:

Hope this will help you in your mapping efforts.

Plot away!
