13 Geomarketing - 13.3 Tidy the input data - Geocomputation with R (English)

13.3 Tidy the input data

The German government provides gridded census data at either 1 km or 100 m resolution. The following code chunk downloads, unzips and reads in the 1 km data.

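# download the zipped census data (mode = "wb" keeps the binary archive intact)
download.file("https://tinyurl.com/ybtpkwxz", 
              destfile = "census.zip", mode = "wb")
unzip("census.zip") # unzip the files
# read the semicolon-separated 1 km grid file
census_de = readr::read_csv2(list.files(pattern = "Gitter.csv"))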
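To avoid fetching the archive again every time the script is rerun, the download step can be skipped when census.zip is already on disk. The following is a minimal sketch of that idea; `download_once()` is a hypothetical helper written only for this illustration (it is not part of the book, base R or spDataLarge) and relies only on the standard `file.exists()` and `utils::download.file()` functions.

```r
# Hypothetical helper, not from the book or spDataLarge:
# download `url` to `destfile` only if that file does not exist yet.
download_once = function(url, destfile) {
  if (file.exists(destfile)) {
    return(invisible(FALSE)) # already on disk: no network access
  }
  # mode = "wb" writes the zip archive in binary mode (needed on Windows)
  utils::download.file(url, destfile = destfile, mode = "wb")
  invisible(TRUE)
}

# Usage (commented out so that running this block downloads nothing):
# download_once("https://tinyurl.com/ybtpkwxz", "census.zip")
# unzip("census.zip")
```

The helper returns `TRUE` when it downloaded the file and `FALSE` when it reused an existing copy, so a script can report which case occurred.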

As a convenience to the reader, the corresponding data is included in the spDataLarge package and can be accessed as follows:

data("census_de", package = "spDataLarge")

The census_de object is a data frame containing 13 variables for more than 300,000 grid cells across Germany. For our work, we only need a subset of these: Easting (x) and Northing (y), number of inhabitants (population; pop), mean age (mean_age), proportion of women (women) and average household size (hh_size). These variables are selected and renamed from German into English in the code chunk below and summarized in Table 13.1. Further, mutate_all() is used to convert the values -1 and -9 (meaning unknown) to NA.

# pop = population, hh_size = household size
input = dplyr::select(census_de, x = x_mp_1km, y = y_mp_1km, pop = Einwohner,
                      women = Frauen_A, mean_age = Alter_D,
                      hh_size = HHGroesse_D)
# set -1 and -9 to NA
input_tidy = dplyr::mutate_all(input, list(~ifelse(. %in% c(-1, -9), NA, .)))
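To see the selection, renaming and recoding steps in isolation, here is a base R sketch applied to a small made-up data frame. The German column names match those used in the `dplyr::select()` call above, but the values and the extra `other_col` column are invented for illustration. `replace()` with `lapply()` is swapped in for `mutate_all()` and `ifelse()`; it is an equivalent approach, not the one used in the book.

```r
# Made-up example rows: column names follow the census file, values do not
census_toy = data.frame(
  x_mp_1km = c(4334500, 4335500, 4336500),
  y_mp_1km = c(2684500, 2684500, 2684500),
  Einwohner = c(1, -1, 3),
  Frauen_A = c(2, -9, 3),
  Alter_D = c(3, -1, 4),
  HHGroesse_D = c(-9, 2, 1),
  other_col = c(0, 0, 0) # hypothetical column that is dropped below
)

# New (English) name = old (German) name, mirroring dplyr::select() above
lookup = c(x = "x_mp_1km", y = "y_mp_1km", pop = "Einwohner",
           women = "Frauen_A", mean_age = "Alter_D", hh_size = "HHGroesse_D")
input_toy = census_toy[, lookup]  # keep only the six needed columns
names(input_toy) = names(lookup)  # rename them to English

# Replace the "unknown" codes -1 and -9 with NA in every column
input_toy[] = lapply(input_toy, function(col) replace(col, col %in% c(-1, -9), NA))
input_toy
```

In dplyr 1.0.0 and later, `mutate_all()` is superseded; the same recoding can also be written as `dplyr::mutate(input, dplyr::across(dplyr::everything(), ~ ifelse(.x %in% c(-1, -9), NA, .x)))`.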

Table 13.1: Categories for each variable in census data from Datensatzbeschreibung…xlsx located in the downloaded file census.zip (see Figure 13.1 for their spatial distribution).
Class	Population (inhabitants)	Women (%)	Mean age (years)	Household size (persons)
1	3-250	0-40	0-40	1-2
2	250-500	40-47	40-42	2-2.5
3	500-2000	47-53	42-44	2.5-3
4	2000-4000	53-60	44-47	3-3.5
5	4000-8000	>60	>47	>3.5
6	>8000
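Table 13.1 lists categories, which suggests that the selected columns hold class codes rather than raw values. Assuming that is the case, the codes can be translated into readable range labels with `factor()`. In the sketch below the label vectors are copied from Table 13.1, while the class codes for the four example cells are invented for illustration; NA stands for cells that were -1 or -9 before tidying.

```r
# Lookup vectors for all four variables, copied from Table 13.1
pop_labels = c("3-250", "250-500", "500-2000", "2000-4000", "4000-8000", ">8000")
women_labels = c("0-40", "40-47", "47-53", "53-60", ">60")
age_labels = c("0-40", "40-42", "42-44", "44-47", ">47")
hh_labels = c("1-2", "2-2.5", "2.5-3", "3-3.5", ">3.5")

# Hypothetical class codes for four grid cells (NA = unknown after tidying)
pop_class = c(1, 6, NA, 3)
age_class = c(2, 5, 4, NA)

# Map each code to its range; codes outside the levels would also become NA
pop_range = factor(pop_class, levels = 1:6, labels = pop_labels)
age_range = factor(age_class, levels = 1:5, labels = age_labels)
data.frame(pop_class, pop_range, age_class, age_range)
```

Population uses six classes and the other three variables use five, which is why the `levels` argument differs between the two `factor()` calls.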