9.6 Other bridges
The focus of this chapter is on R interfaces to Desktop GIS software.We emphasize these bridges because dedicated GIS software is well-known and a common ‘way in’ to understanding geographic data.They also provide access to many geoalgorithms.
Other ‘bridges’ include interfaces to spatial libraries (Section 9.6.1 shows how to access the GDAL CLI from R), spatial databases (see Section 9.6.2) and web mapping services (see Chapter 8).This section provides only a snippet of what is possible.Thanks to R’s flexibility, with its ability to call other programs from the system and integration with other languages (notably via Rcpp and reticulate), many other bridges are possible.The aim is not to be comprehensive, but to demonstrate other ways of accessing the ‘flexibility and power’ in the quote by Sherman (2008) at the beginning of the chapter.
9.6.1 Bridges to GDAL
As discussed in Chapter 7, GDAL is a low-level library that supports many geographic data formats.GDAL is so effective that most GIS programs use GDAL in the background for importing and exporting geographic data, rather than re-inventing the wheel and using bespoke read-write code.But GDAL offers more than data I/O.It has geoprocessing tools for vector and raster data, functionality to create tiles for serving raster data online, and rapid rasterization of vector data, all of which can be accessed via the system of R command line.
The code chunk below demonstrates this functionality:linkGDAL()
searches the computer for a working GDAL installation and adds the location of the executable files to the PATH variable, allowing GDAL to be called.In the example below ogrinfo
provides metadata on a vector dataset:
link2GI::linkGDAL()
cmd = paste("ogrinfo -ro -so -al", system.file("shape/nc.shp", package = "sf"))
system(cmd)
#> INFO: Open of `C:/Users/geocompr/Documents/R/win-library/3.5/sf/shape/nc.shp'
#> using driver `ESRI Shapefile' successful.
#>
#> Layer name: nc
#> Metadata:
#> DBF_DATE_LAST_UPDATE=2016-10-26
#> Geometry: Polygon
#> Feature Count: 100
#> Extent: (-84.323853, 33.881992) - (-75.456978, 36.589649)
#> Layer SRS WKT:
#> ...
This example — which returns the same result as rgdal::ogrInfo()
— may be simple, but it shows how to use GDAL via the system command-line, independently of other packages.The ‘link’ to GDAL provided by link2gi could be used as a foundation for doing more advanced GDAL work from the R or system CLI.45TauDEM (http://hydrology.usu.edu/taudem/taudem5/index.html) and the Orfeo Toolbox (https://www.orfeo-toolbox.org/) are other spatial data processing libraries/programs offering a command line interface.At the time of writing, it appears that there is only a developer version of an R/TauDEM interface on R-Forge (https://r-forge.r-project.org/R/?group_id=956).In any case, the above example shows how to access these libraries from the system command line via R.This in turn could be the starting point for creating a proper interface to these libraries in the form of new R packages.
Before diving into a project to create a new bridge, however, it is important to be aware of the power of existing R packages and that system()
calls may not be platform-independent (they may fail on some computers).Furthermore, sf brings most of the power provided by GDAL, GEOS and PROJ to R via the R/C++ interface provided by Rcpp, which avoids system()
calls.
9.6.2 Bridges to spatial databases
Spatial database management systems (spatial DBMS) store spatial and non-spatial data in a structured way.They can organize large collections of data into related tables (entities) via unique identifiers (primary and foreign keys) and implicitly via space (think for instance of a spatial join).This is useful because geographic datasets tend to become big and messy quite quickly.Databases enable storing and querying large datasets efficiently based on spatial and non-spatial fields, and provide multi-user access and topology support.
The most important open source spatial database is PostGIS (Obe and Hsu 2015).46R bridges to spatial DBMSs such as PostGIS are important, allowing access to huge data stores without loading several gigabytes of geographic data into RAM, and likely crashing the R session.The remainder of this section shows how PostGIS can be called from R, based on “Hello real word” from PostGIS in Action, Second Edition(Obe and Hsu 2015).47
The subsequent code requires a working internet connection, since we are accessing a PostgreSQL/PostGIS database which is living in the QGIS Cloud (https://qgiscloud.com/).48
library(RPostgreSQL)
conn = dbConnect(drv = PostgreSQL(), dbname = "rtafdf_zljbqm",
host = "db.qgiscloud.com",
port = "5432", user = "rtafdf_zljbqm",
password = "d3290ead")
Often the first question is, ‘which tables can be found in the database?’.This can be asked as follows (the answer is 5 tables):
dbListTables(conn)
#> [1] "spatial_ref_sys" "topology" "layer" "restaurants"
#> [5] "highways"
We are only interested in the restaurants
and the highways
tables.The former represents the locations of fast-food restaurants in the US, and the latter are principal US highways.To find out about attributes available in a table, we can run:
dbListFields(conn, "highways")
#> [1] "qc_id" "wkb_geometry" "gid" "feature"
#> [5] "name" "state"
The first query will select US Route 1
in Maryland (MD
).Note that st_read()
allows us to read geographic data from a database if it is provided with an open connection to a database and a query.Additionally, st_read()
needs to know which column represents the geometry (here: wkb_geometry
).
query = paste(
"SELECT *",
"FROM highways",
"WHERE name = 'US Route 1' AND state = 'MD';")
us_route = st_read(conn, query = query, geom = "wkb_geometry")
This results in an sf-object named us_route
of type sfc_MULTILINESTRING
.The next step is to add a 20-mile buffer (corresponds to 1609 meters times 20) around the selected highway (Figure 9.4).
query = paste(
"SELECT ST_Union(ST_Buffer(wkb_geometry, 1609 * 20))::geometry",
"FROM highways",
"WHERE name = 'US Route 1' AND state = 'MD';")
buf = st_read(conn, query = query)
Note that this was a spatial query using functions (ST_Union()
, ST_Buffer()
) you should be already familiar with since you find them also in the sf-package, though here they are written in lowercase characters (st_union()
, st_buffer()
).In fact, function names of the sf package largely follow the PostGIS naming conventions.49The last query will find all Hardee restaurants (HDE
) within the buffer zone (Figure 9.4).
query = paste(
"SELECT r.wkb_geometry",
"FROM restaurants r",
"WHERE EXISTS (",
"SELECT gid",
"FROM highways",
"WHERE",
"ST_DWithin(r.wkb_geometry, wkb_geometry, 1609 * 20) AND",
"name = 'US Route 1' AND",
"state = 'MD' AND",
"r.franchise = 'HDE');"
)
hardees = st_read(conn, query = query)
Please refer to Obe and Hsu (2015) for a detailed explanation of the spatial SQL query.Finally, it is good practice to close the database connection as follows:50
RPostgreSQL::postgresqlCloseConnection(conn)
Figure 9.4: Visualization of the output of previous PostGIS commands showing the highway (black line), a buffer (light yellow) and three restaurants (light blue points) within the buffer.
Unlike PostGIS, sf only supports spatial vector data.To query and manipulate raster data stored in a PostGIS database, use the rpostgis package (Bucklin and Basille 2018) and/or use command-line tools such as rastertopgsql
which comes as part of the PostGIS installation.
This subsection is only a brief introduction to PostgreSQL/PostGIS.Nevertheless, we would like to encourage the practice of storing geographic and non-geographic data in a spatial DBMS while only attaching those subsets to R’s global environment which are needed for further (geo-)statistical analysis.Please refer to Obe and Hsu (2015) for a more detailed description of the SQL queries presented and a more comprehensive introduction to PostgreSQL/PostGIS in general.PostgreSQL/PostGIS is a formidable choice as an open-source spatial database.But the same is true for the lightweight SQLite/SpatiaLite database engine and GRASS which uses SQLite in the background (see Section 9.4).
As a final note, if your data is getting too big for PostgreSQL/PostGIS and you require massive spatial data management and query performance, then the next logical step is to use large-scale geographic querying on distributed computing systems, as for example, provided by GeoMesa (http://www.geomesa.org/) or GeoSpark (http://geospark.datasyslab.org/; Huang et al. 2017).