12.3 Transport zones
Although transport systems are primarily based on linear features and nodes (including pathways and stations), it often makes sense to start with areal data, to break continuous space into tangible units (Hollander 2016). In addition to the boundary defining the study area (Bristol in this case), two zone types are of particular interest to transport researchers: origin and destination zones. Often, the same geographic units are used for origins and destinations. However, different zoning systems, such as ‘Workplace Zones’, may be appropriate to represent the increased density of trip destinations in areas with many ‘trip attractors’ such as schools and shops (Office for National Statistics 2014).
The simplest way to define a study area is often to use the first matching boundary returned by OpenStreetMap, which can be obtained using osmdata with a command such as bristol_region = osmdata::getbb("Bristol", format_out = "sf_polygon"). This results in an sf object representing the bounds of the largest matching city region, either a rectangular polygon of the bounding box or a detailed polygonal boundary. For Bristol, UK, a detailed polygon is returned, representing the official boundary of Bristol (see the inner blue boundary in Figure 12.1), but there are a couple of issues with this approach:
- The first boundary returned by OSM may not be the official boundary used by local authorities.
- Even if OSM returns the official boundary, this may be inappropriate for transport research because it bears little relation to where people travel.
Travel to Work Areas (TTWAs) address these issues by creating a zoning system analogous to hydrological watersheds. TTWAs were first defined as contiguous zones within which 75% of the population travels to work (Coombes, Green, and Openshaw 1986), and this is the definition used in this chapter. Because Bristol is a major employer attracting travel from surrounding towns, its TTWA is substantially larger than the city bounds (see Figure 12.1). The polygon representing this transport-orientated boundary is stored in the object bristol_ttwa, provided by the spDataLarge package loaded at the beginning of this chapter.
The origin and destination zones used in this chapter are the same: officially defined zones of intermediate geographic resolution (their official name is Middle layer Super Output Areas or MSOAs). Each houses around 8,000 people. Such administrative zones can provide vital context to transport analysis, such as the type of people who might benefit most from particular interventions (e.g., Moreno-Monroy, Lovelace, and Ramos 2017).
The geographic resolution of these zones is important: small zones with high geographic resolution are usually preferable but their high number in large regions can have consequences for processing (especially for origin-destination analysis in which the number of possibilities increases as a non-linear function of the number of zones) (Hollander 2016).
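To make the non-linear growth concrete, the following sketch counts the number of possible OD pairs for a few zone counts. It assumes that every zone can be both an origin and a destination and that trips within a single zone are counted, so n zones give n^2 possible pairs; the zone counts other than 102 are arbitrary illustrative values.

```r
# Number of zones: 102 is the count used in this chapter; the others are arbitrary
n_zones = c(10, 102, 1000)

# Assumption: every zone can be an origin and a destination, including
# trips that start and end in the same zone, giving n^2 possible pairs
n_pairs = n_zones^2

od_growth = data.frame(n_zones, n_pairs)
od_growth
```

Increasing the number of zones tenfold increases the number of possible pairs a hundredfold, although real OD datasets are often sparser than this upper bound because many pairs have no recorded trips.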
Another issue with small zones is related to anonymity rules. To make it impossible to infer the identity of individuals in zones, detailed socio-demographic variables are often only available at a low geographic resolution. Breakdowns of travel mode by age and sex, for example, are available at the Local Authority level in the UK, but not at the much higher Output Area level, each of which contains around 100 households. For further details, see www.ons.gov.uk/methodology/geography.
The 102 zones used in this chapter are stored in bristol_zones, as illustrated in Figure 12.2. Note that the zones get smaller in densely populated areas: each houses a similar number of people. However, bristol_zones contains no attribute data on transport, only the name and code of each zone:
names(bristol_zones)
#> [1] "geo_code" "name" "geometry"
To add travel data, we will undertake an attribute join, a common task described in Section 3.2.3. We will use travel data from the UK’s 2011 census question on travel to work, stored in bristol_od, which was provided by the ons.gov.uk data portal. bristol_od is an origin-destination (OD) dataset on travel to work between zones from the UK’s 2011 Census (see Section 12.4). The first column is the ID of the zone of origin and the second column is the ID of the zone of destination. bristol_od has more rows than bristol_zones, representing travel between zones rather than the zones themselves:
nrow(bristol_od)
#> [1] 2910
nrow(bristol_zones)
#> [1] 102
The results of the previous code chunk show that there are more than 10 OD pairs for every zone, meaning we will need to aggregate the origin-destination data before it is joined with bristol_zones, as illustrated below (origin-destination data is described in Section 12.4):
zones_attr = bristol_od %>%
  group_by(o) %>%
  summarize_if(is.numeric, sum) %>%
  dplyr::rename(geo_code = o)
The preceding chunk performed three main steps:
- Grouped the data by zone of origin (contained in the column o).
- Aggregated the variables in the bristol_od dataset if they were numeric, to find the total number of people living in each zone by mode of transport.
- Renamed the grouping variable o so it matches the ID column geo_code in the bristol_zones object.
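The three steps can be checked on a small example. The sketch below uses a hypothetical toy data frame, od_toy, with made-up values rather than census data. It also swaps summarize_if() for summarize(across(where(is.numeric), sum)), the form the dplyr documentation suggests for the superseded scoped verbs; this assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy OD data (hypothetical values, not the census figures)
od_toy = data.frame(
  o = c("A", "A", "B", "B", "B"),
  d = c("A", "B", "A", "B", "C"),
  all = c(10, 5, 3, 20, 2),
  bicycle = c(1, 0, 1, 4, 0),
  stringsAsFactors = FALSE
)

zones_attr_toy = od_toy %>%
  group_by(o) %>%                               # step 1: group by origin
  summarize(across(where(is.numeric), sum)) %>% # step 2: sum numeric columns
  rename(geo_code = o)                          # step 3: match the zone ID name

zones_attr_toy
```

The character column d is dropped because it is not numeric, leaving one row per origin zone with the summed counts.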
The resulting object zones_attr is a data frame with rows representing zones and an ID variable. We can verify that the IDs match those in the bristol_zones dataset using the %in% operator as follows:
summary(zones_attr$geo_code %in% bristol_zones$geo_code)
#> Mode TRUE
#> logical 102
The results show that all 102 zones are present in the new object and that zones_attr is in a form that can be joined onto the zones. This is done using the joining function left_join() (note that inner_join() would produce the same result here):
zones_joined = left_join(bristol_zones, zones_attr, by = "geo_code")
sum(zones_joined$all)
#> [1] 238805
names(zones_joined)
#> [1] "geo_code" "name" "all" "bicycle" "foot"
#> [6] "car_driver" "train" "geometry"
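Why left_join() and inner_join() agree in this case can be seen on a toy example. The sketch below uses hypothetical objects (zones_toy, attr_full, attr_partial) with made-up values: the two joins return the same number of rows when every zone ID has a match, and differ only when some IDs are missing from the attribute table.

```r
library(dplyr)

# Hypothetical zones and attribute tables (made-up values)
zones_toy = data.frame(
  geo_code = c("A", "B", "C"),
  name = c("Zone A", "Zone B", "Zone C"),
  stringsAsFactors = FALSE
)
attr_full = data.frame(
  geo_code = c("A", "B", "C"),
  all = c(15, 25, 7),
  stringsAsFactors = FALSE
)
attr_partial = attr_full[attr_full$geo_code != "C", ]

# Every ID matches: both joins keep all three zones
left_full = left_join(zones_toy, attr_full, by = "geo_code")
inner_full = inner_join(zones_toy, attr_full, by = "geo_code")

# Zone C has no attributes: left_join() keeps it with NA, inner_join() drops it
left_partial = left_join(zones_toy, attr_partial, by = "geo_code")
inner_partial = inner_join(zones_toy, attr_partial, by = "geo_code")

c(nrow(left_full), nrow(inner_full), nrow(left_partial), nrow(inner_partial))
```

In this chapter all 102 IDs match, as the %in% check showed, so the choice between the two functions does not affect the result.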
The result is zones_joined, which contains new columns representing the total number of trips originating in each zone in the study area (almost a quarter of a million) and their mode of travel (by bicycle, foot, car and train). The geographic distribution of trip origins is illustrated in the left-hand map in Figure 12.2. This shows that most zones have between 0 and 4,000 trips originating from them in the study area. More trips are made by people living near the center of Bristol and fewer on the outskirts. Why is this? Remember that we are only dealing with trips within the study region: low trip numbers in the outskirts of the region can be explained by the fact that many people in these peripheral zones will travel to other regions outside of the study area. Trips outside the study region can be included in a regional model by a special destination ID covering any trips that go to a zone not represented in the model (Hollander 2016). The data in bristol_od, however, simply ignores such trips: it is an ‘intra-zonal’ model.
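A special destination ID of this kind could be created by recoding any destination that is not one of the model's zones. The sketch below is a minimal illustration on made-up data: the object names, the zone IDs and the label "external" are all hypothetical, and it does not reflect how bristol_od was prepared.

```r
# Hypothetical IDs of the zones represented in the model
zone_ids = c("A", "B", "C")

# Made-up OD data in which destinations X and Y lie outside the study region
od_raw = data.frame(
  o = c("A", "A", "B", "C"),
  d = c("B", "X", "Y", "C"),
  all = c(10, 4, 6, 8),
  stringsAsFactors = FALSE
)

# Recode destinations outside the model to a single catch-all ID
od_raw$d_model = ifelse(od_raw$d %in% zone_ids, od_raw$d, "external")

# Total trips per (recoded) destination
trips_by_dest = aggregate(all ~ d_model, data = od_raw, FUN = sum)
trips_by_dest
```

With this recoding, trips leaving the region are retained in the totals instead of being dropped, at the cost of losing their exact destination.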
In the same way that OD datasets can be aggregated to the zone of origin, they can also be aggregated to provide information about destination zones. People tend to gravitate towards central places. This explains why the spatial distribution represented in the right panel in Figure 12.2 is relatively uneven, with the most common destination zones concentrated in Bristol city center. The resulting object, zones_od, which contains a new column reporting the number of trip destinations by any mode, is created as follows:
zones_od = bristol_od %>%
  group_by(d) %>%
  summarize_if(is.numeric, sum) %>%
  dplyr::select(geo_code = d, all_dest = all) %>%
  inner_join(zones_joined, ., by = "geo_code")
A simplified version of Figure 12.2 is created with the code below (see 12-zones.R in the code folder of the book’s GitHub repo to reproduce the figure and Section 8.2.6 for details on faceted maps with tmap):
qtm(zones_od, c("all", "all_dest")) +
  tm_layout(panel.labels = c("Origin", "Destination"))
Figure 12.2: Number of trips (commuters) living and working in the region. The left map shows zone of origin of commute trips; the right map shows zone of destination (generated by the script 12-zones.R).