15.2 Package choice
A characteristic of R is that there are often multiple ways to achieve the same result. The code chunk below illustrates this by using three functions, covered in Chapters 3 and 5, to combine the 16 regions of New Zealand into a single geometry:
library(spData)
nz_u1 = sf::st_union(nz)
nz_u2 = aggregate(nz["Population"], list(rep(1, nrow(nz))), sum)
nz_u3 = dplyr::summarise(nz, t = sum(Population))
identical(nz_u1, nz_u2$geometry)
#> [1] TRUE
identical(nz_u1, nz_u3$geom)
#> [1] TRUE
Although the classes, attributes and column names of the resulting objects nz_u1 to nz_u3 differ, their geometries are identical. This is verified using the base R function identical(). Which to use? It depends: the first only processes the geometry data contained in nz, so is faster, while the other options perform attribute operations, which may be useful for subsequent steps.
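The trade-off described above can be explored with a short sketch, assuming the sf, spData and dplyr packages are installed. It compares the class of the geometry-only result with that of the attribute-aware result and gives rough timings. The loop count of 10 is an arbitrary choice, and absolute timings will vary between machines and package versions.

```r
library(sf)
library(spData) # provides the nz dataset

nz_u1 = st_union(nz)                              # geometry only
nz_u3 = dplyr::summarise(nz, t = sum(Population)) # geometry plus attribute

class(nz_u1) # a geometry column (sfc) with no attribute data
class(nz_u3) # an sf data frame, with the summed population in column t

# Rough timings over 10 repetitions (an arbitrary number of runs)
system.time(for (i in 1:10) st_union(nz))
system.time(for (i in 1:10) dplyr::summarise(nz, t = sum(Population)))
```

If only the outline of the country is needed, the geometry-only object is sufficient. If a later step needs the total population, keeping the sf data frame avoids recomputing it.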
The wider point is that there are often multiple options to choose from when working with geographic data in R, even within a single package. The range of options grows further when more R packages are considered: you could achieve the same result using the older sp package, for example. We recommend using sf and the other packages showcased in this book, for reasons outlined in Chapter 2, but it’s worth being aware of alternatives and being able to justify your choice of software.
A common (and sometimes controversial) choice is between tidyverse and base R approaches. We cover both and encourage you to try both before deciding which is more appropriate for different tasks. The following code chunk, described in Chapter 3, shows how attribute data subsetting works in each approach, using the base R operator [ and the select() function from the tidyverse package dplyr. The syntax differs but the results are (in essence) the same:
library(dplyr) # attach tidyverse package
nz_name1 = nz["Name"] # base R approach
nz_name2 = nz %>% select(Name) # tidyverse approach
identical(nz_name1$Name, nz_name2$Name) # check results
#> [1] TRUE
Again the question arises: which to use? Again the answer is: it depends. Each approach has advantages: the pipe syntax is popular and appealing to some, while base R is more stable, and is well known to others. Choosing between them is therefore largely a matter of preference. However, if you do choose to use tidyverse functions to handle geographic data, beware of a number of pitfalls (see the supplementary article tidyverse-pitfalls on the website that supports this book).
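The same comparison can be made for row subsetting. The following sketch selects regions by population with the base R [ operator and with the dplyr function filter(). The threshold of 500,000 is an arbitrary value chosen for illustration, and the object names nz_big1 and nz_big2 are introduced here rather than taken from earlier chapters.

```r
library(sf)
library(spData) # provides the nz dataset
library(dplyr)

# Regions with more than 500,000 people: base R approach
nz_big1 = nz[nz$Population > 500000, ]
# The same subset: tidyverse approach
nz_big2 = nz %>% filter(Population > 500000)

# Compare the names of the selected regions
identical(nz_big1$Name, nz_big2$Name)
```

One difference to keep in mind is that filter() drops rows where the condition evaluates to NA, whereas logical subsetting with [ returns them as rows of missing values. This matters for datasets with missing attribute data.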
While commonly needed operators/functions were covered in depth (such as the base R [ subsetting operator and the dplyr function filter()), there are many other functions for working with geographic data, from other packages, that have not been mentioned. Chapter 1 mentions 20+ influential packages for working with geographic data, and only a handful of these are demonstrated in subsequent chapters. There are hundreds more. As of early 2019, there are nearly 200 packages mentioned in the Spatial Task View; more packages and countless functions for geographic data are developed each year, making it impractical to cover them all in a single book.
The rate of evolution in R’s spatial ecosystem may seem overwhelming, but there are strategies to deal with the wide range of options. Our advice is to start by learning one approach in depth but to have a general understanding of the breadth of options available. This advice applies equally to solving geographic problems in R (Section 15.4 covers developments in other languages) as it does to other fields of knowledge and application.
Of course, some packages perform much better than others, making package selection an important decision. From this diversity, we have focused on packages that are future-proof (they will work long into the future), high performance (relative to other R packages) and complementary. But there is still overlap in the packages we have used, as illustrated by the diversity of packages for making maps, for example (see Chapter 8).
Package overlap is not necessarily a bad thing. It can increase resilience, performance (partly driven by friendly competition and mutual learning between developers) and choice, a key feature of open source software. In this context the decision to use a particular approach, such as the sf/tidyverse/raster ecosystem advocated in this book, should be made with knowledge of alternatives. The sp/rgdal/rgeos ecosystem that sf is designed to supersede, for example, can do many of the things covered in this book and, due to its age, is built on by many other packages. Although best known for point pattern analysis, the spatstat package also supports raster and other vector geometries (Baddeley and Turner 2005). At the time of writing (October 2018) 69 packages depend on it, making it more than a package: spatstat is an alternative R-spatial ecosystem.
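Dependency counts such as the one quoted above change over time. One way to check them is with the function package_dependencies() from the base R package tools, as in the minimal sketch below. The helper name count_reverse_deps() and the toy package database, with its package names, are hypothetical and introduced only so that the example runs without a network connection.

```r
# Count the packages that list `pkg` under Depends or Imports.
# `db` is a package database in the format returned by available.packages().
count_reverse_deps = function(pkg, db) {
  rev_deps = tools::package_dependencies(
    pkg, db = db, which = c("Depends", "Imports"), reverse = TRUE
  )
  length(rev_deps[[pkg]])
}

# Toy database with hypothetical package names (not real CRAN packages)
toy_db = cbind(
  Package = c("core", "addonA", "addonB", "other"),
  Depends = c(NA, "core (>= 1.0)", NA, NA),
  Imports = c(NA, NA, "core, stats", "stats")
)
n_toy = count_reverse_deps("core", toy_db)
n_toy
```

Passing db = available.packages() instead queries the current CRAN index, which requires an internet connection. The number returned for a real package should be expected to differ from figures recorded in 2018, and it also depends on which dependency types are counted.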
It is also important to be aware of promising alternatives that are under development. The package stars, for example, provides a new class system for working with spatiotemporal data. If you are interested in this topic, you can check for updates on the package’s source code and the broader SpatioTemporal Task View. The same principle applies to other domains: it is important to justify software choices and review software decisions based on up-to-date information.