10.2 Scripts
If functions distributed in packages are the building blocks of R code, scripts are the glue that holds them together, in a logical order, to create reproducible workflows. To programming novices, scripts may sound intimidating, but they are simply plain text files, typically saved with an extension representing the language they contain. R scripts are generally saved with a .R extension and named to reflect what they do. An example is 10-hello.R, a script file stored in the code/ folder of the book’s repository, which contains the following two lines of code:
# Aim: provide a minimal R script
print("Hello geocompr")
The lines of code may not be particularly exciting, but they demonstrate the point: scripts do not need to be complicated. Saved scripts can be called and executed in their entirety with source(), as demonstrated below, where the comment is ignored but the instruction is executed:
source("code/10-hello.R")
#> [1] "Hello geocompr"
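The behaviour of source() can also be tried without a copy of the book's repository. The following is a minimal sketch, assuming a temporary file (script_path, a hypothetical name) as a stand-in for code/10-hello.R:

```r
# Write the two-line script to a temporary file
# (script_path is a hypothetical stand-in for code/10-hello.R)
script_path = tempfile(fileext = ".R")
writeLines(c(
  "# Aim: provide a minimal R script",
  "print(\"Hello geocompr\")"
), script_path)

# Run the whole file: the comment is ignored, print() is executed
source(script_path)

# echo = TRUE also shows each expression before it is evaluated
source(script_path, echo = TRUE)
```

The echo argument is useful when checking which line of a longer script produced a given message.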
There are no strict rules on what can and cannot go into script files and nothing to prevent you from saving broken, non-reproducible code. There are, however, some conventions worth following:
- Write the script in order: just like the script of a film, scripts should have a clear order such as ‘setup’, ‘data processing’ and ‘save results’ (roughly equivalent to ‘beginning’, ‘middle’ and ‘end’ in a film).
- Add comments to the script so other people (and your future self) can understand it. At a minimum, a comment should state the purpose of the script (see Figure 10.1) and (for long scripts) divide it into sections. This can be done in RStudio, for example, with the shortcut Ctrl+Shift+R, which creates ‘foldable’ code section headings.
- Above all, scripts should be reproducible: self-contained scripts that will work on any computer are more useful than scripts that only run on your computer, on a good day. This involves attaching required packages at the beginning, reading in data from persistent sources (such as a reliable website) and ensuring that previous steps have been taken.
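These conventions can be sketched in a short script. The example below is illustrative only and is not taken from the book's repository: the dataset, the summary and the output path (out_file) are assumptions chosen so that the script runs with base R alone. Comments ending in four dashes are the section headings that RStudio treats as foldable.

```r
# Aim: summarise a built-in dataset and save the result (illustrative sketch)

# Setup ----
# Attach or check required packages first, so the script fails early
# on a computer where they are missing (only base R packages are used here)
stopifnot(requireNamespace("utils", quietly = TRUE))
out_file = file.path(tempdir(), "cars-summary.csv") # hypothetical output path

# Data processing ----
# datasets::cars ships with R, so the input is the same on every computer
d = datasets::cars
d_summary = data.frame(
  mean_speed = mean(d$speed),
  mean_dist = mean(d$dist)
)

# Save results ----
write.csv(d_summary, out_file, row.names = FALSE)
```

In a real geocomputation script the setup section would typically attach packages with library() and note any external dependencies in comments.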
It is hard to enforce reproducibility in R scripts, but there are tools that can help. By default, RStudio ‘code-checks’ R scripts and underlines faulty code with a red wavy line, as illustrated below:
Figure 10.1: Code checking in RStudio. This example, from the script 10-centroid-alg.R, highlights an unclosed curly bracket on line 19.
A useful tool for reproducibility is the reprex package. Its main function reprex() tests lines of R code to check if they are reproducible, and provides markdown output to facilitate communication on sites such as GitHub. See the web page reprex.tidyverse.org for details.
The contents of this section apply to any type of R script. A particular consideration with scripts for geocomputation is that they tend to have external dependencies, such as the QGIS dependency to run code in Chapter 9, and require input data in a specific format. Such dependencies should be mentioned as comments in the script or elsewhere in the project of which it is a part, as illustrated in the script 10-centroid-alg.R. The work undertaken by this script is demonstrated in the reproducible example below, which works on a prerequisite object named poly_mat, a square with sides 9 units in length (the meaning of this will become apparent in the next section):
poly_mat = cbind(
  x = c(0, 0, 9, 9, 0),
  y = c(0, 9, 9, 0, 0)
)
source("https://git.io/10-centroid-alg.R") # short url
#> [1] "The area is: 81"
#> [1] "The coordinates of the centroid are: 4.5, 4.5"
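Because the script above is sourced from a URL, it needs a network connection. The printed values can be checked offline with the standard shoelace formula for the area and centroid of a polygon. This is a sketch of an independent check, not necessarily the steps that 10-centroid-alg.R itself takes; the object names other than poly_mat are assumptions.

```r
poly_mat = cbind(
  x = c(0, 0, 9, 9, 0),
  y = c(0, 9, 9, 0, 0)
)
x = poly_mat[, "x"]
y = poly_mat[, "y"]

# Pair each vertex with the next one; the ring is already closed,
# so the last row repeats the first
i = 1:(nrow(poly_mat) - 1)
cross = x[i] * y[i + 1] - x[i + 1] * y[i]

# Signed area is negative for clockwise rings, so take the absolute value
area_signed = sum(cross) / 2
area = abs(area_signed)

# Centroid coordinates from the same cross products
cx = sum((x[i] + x[i + 1]) * cross) / (6 * area_signed)
cy = sum((y[i] + y[i + 1]) * cross) / (6 * area_signed)

print(paste0("The area is: ", area))
print(paste0("The coordinates of the centroid are: ", cx, ", ", cy))
```

For the 9 by 9 square this gives an area of 81 and a centroid at 4.5, 4.5, in agreement with the output of the sourced script.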