15.5 The open source approach

This is a technical book, so it makes sense for the next steps, outlined in the previous section, to also be technical. However, there are wider issues worth considering in this final section, which returns to our definition of geocomputation. One of the elements of the term introduced in Chapter 1 was that geographic methods should have a positive impact. Of course, how to define and measure ‘positive’ is a subjective, philosophical question, beyond the scope of this book. Regardless of your worldview, consideration of the impacts of geocomputational work is a useful exercise: the potential for positive impacts can provide a powerful motivation for future learning and, conversely, new methods can open up many possible fields of application. These considerations lead to the conclusion that geocomputation is part of a wider ‘open source approach’.

Section 1.1 presented other terms that mean roughly the same thing as geocomputation, including geographic data science (GDS) and ‘GIScience’. Both capture the essence of working with geographic data, but geocomputation has advantages: it concisely captures the ‘computational’ way of working with geographic data advocated in this book (implemented in code and therefore encouraging reproducibility) and builds on desirable ingredients of its early definition (Openshaw and Abrahart 2000):

  • The creative use of geographic data.
  • Application to real-world problems.
  • Building ‘scientific’ tools.
  • Reproducibility.

We added the final ingredient: reproducibility was barely mentioned in early work on geocomputation, yet a strong case can be made for it being a vital component of the first two ingredients. Reproducibility:

  • Encourages creativity, by shifting the focus away from the basics (which are readily available through shared code) and towards applications.
  • Discourages people from ‘reinventing the wheel’: there is no need to redo what others have already done if their methods can be reused.
  • Makes academic research more conducive to real-world applications, by enabling methods developed for one purpose (perhaps purely academic) to be used for practical applications.

If reproducibility is the defining feature of geocomputation (or command-line GIS, code-driven geographic data analysis, or any other synonym for the same thing), it is worth considering what makes it reproducible. This brings us to the ‘open source approach’, which has three main components:

  • A command-line interface (CLI), encouraging scripts recording geographic work to be shared and reproduced (a minimal example follows this list).
  • Open source software, which can be inspected and potentially improved by anyone in the world.
  • An active developer community, which collaborates and self-organizes to build complementary and modular tools.
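
As a minimal illustration of the first component, consider the following sketch of a fully scripted geographic workflow. It assumes only that the sf and spData packages, both used throughout this book, are installed; anyone with R can re-run it and obtain the same result:

```r
# A reproducible geographic workflow captured as a script: each step
# is recorded in code rather than in point-and-click operations
library(sf)      # simple features classes and methods
library(spData)  # example datasets, including the 'world' object

# Dissolve all country borders and compute the total area
world_union = st_union(world)
st_area(world_union)
```

Because the script is plain text, it can be shared, version-controlled and inspected, which is precisely what makes the work reproducible.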

Like the term geocomputation, the open source approach is more than a technical entity. It is a community composed of people interacting daily with shared aims: to produce high performance tools, free from commercial or legal restrictions, that are accessible for anyone to use. The open source approach to working with geographic data has advantages that transcend the technicalities of how the software works, encouraging learning, collaboration and an efficient division of labor.

There are many ways to engage in this community, especially with the emergence of code hosting sites, such as GitHub, which encourage communication and collaboration. A good place to start is simply browsing through some of the source code, ‘issues’ and ‘commits’ in a geographic package of interest. A quick glance at the r-spatial/sf GitHub repository, which hosts the code underlying the sf package, shows that 40+ people have contributed to the codebase and documentation. Dozens more people have contributed by asking questions and by contributing to ‘upstream’ packages that sf uses. More than 600 issues have been closed on its issue tracker, representing a huge amount of work to make sf faster, more stable and user-friendly. This example, from just one package out of dozens, shows the scale of the intellectual operation underway to make R a highly effective and continuously evolving language for geocomputation.
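
You do not even need to leave R to find these venues: a package's DESCRIPTION file usually declares where it is developed and where to report problems. A minimal sketch (the fields shown depend on what the package's maintainers have declared):

```r
# Look up a package's development repository and issue tracker
desc = packageDescription("sf")
desc$URL         # development home, including the r-spatial/sf repository
desc$BugReports  # where to raise issues
```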

It is instructive to watch the incessant development activity happen in public fora such as GitHub, but it is even more rewarding to become an active participant. This is one of the greatest features of the open source approach: it encourages people to get involved. This book itself is a result of the open source approach: it was motivated by the amazing developments in R’s geographic capabilities over the last two decades, but made practically possible by dialogue and code sharing on platforms for collaboration. We hope that in addition to disseminating useful methods for working with geographic data, this book inspires you to take a more open source approach. Whether it’s raising a constructive issue alerting developers to problems in their package; making the work done by you and the organizations you work for open; or simply helping other people by passing on the knowledge you’ve learned, getting involved can be a rewarding experience.

References

Baddeley, Adrian, and Rolf Turner. 2005. “Spatstat: An R Package for Analyzing Spatial Point Patterns.” Journal of Statistical Software 12 (6): 1–42.

Baddeley, Adrian, Ege Rubak, and Rolf Turner. 2015. Spatial Point Patterns: Methodology and Applications with R. CRC Press.

Bivand, Roger, Edzer J. Pebesma, and Virgilio Gómez-Rubio. 2013. Applied Spatial Data Analysis with R. Springer.

Blangiardo, Marta, and Michela Cameletti. 2015. Spatial and Spatio-Temporal Bayesian Models with R-INLA. Chichester, UK: John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118950203.

Brus, D. J. 2018. “Sampling for Digital Soil Mapping: A Tutorial Supported by R Scripts.” Geoderma, August. https://doi.org/10.1016/j.geoderma.2018.07.036.

Chambers, John M. 2016. Extending R. CRC Press.

Garrard, Chris. 2016. Geoprocessing with Python. Shelter Island, NY: Manning Publications.

Huang, Zhou, Yiran Chen, Lin Wan, and Xia Peng. 2017. “GeoSpark SQL: An Effective Framework Enabling Spatial Queries on Spark.” ISPRS International Journal of Geo-Information 6 (9): 285. https://doi.org/10.3390/ijgi6090285.

Krainski, Elias, Virgilio Gómez Rubio, Haakon Bakka, Amanda Lenzi, Daniela Castro-Camilo, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2018. Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA.

Openshaw, Stan, and Robert J. Abrahart, eds. 2000. Geocomputation. London; New York: CRC Press.

Wegmann, Martin, Benjamin Leutner, and Stefan Dech, eds. 2016. Remote Sensing and GIS for Ecologists: Using Open Source Software. Data in the Wild. Exeter: Pelagic Publishing.

Wickham, Hadley. 2014. Advanced R. CRC Press.

Zuur, Alain, Elena N. Ieno, Neil Walker, Anatoly A. Saveliev, and Graham M. Smith. 2009. Mixed Effects Models and Extensions in Ecology with R. Statistics for Biology and Health. New York: Springer-Verlag.

Zuur, Alain F., Elena N. Ieno, and Anatoly A. Saveliev. 2017. Beginner’s Guide to Spatial, Temporal and Spatial-Temporal Ecological Data Analysis with R-INLA. Vol. 1. Newburgh, United Kingdom: Highland Statistics Ltd.


  • The first operation, undertaken by the function st_union(), creates an object of class sfc (a simple feature column). The latter two operations create sf objects, each of which contains a simple feature column. Therefore, it is the geometries contained in simple feature columns, not the objects themselves, that are identical.
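
    The distinction can be verified interactively; a hedged sketch using the world dataset from spData for illustration (the operations the note refers to used a different dataset):

```r
library(sf)
library(spData)
g = st_union(world)  # a bare geometry column
class(g)
#> [1] "sfc_MULTIPOLYGON" "sfc"
a = aggregate(world["pop"], by = list(rep(1, nrow(world))), FUN = sum)
class(a)             # an sf object containing a geometry column
#> [1] "sf" "data.frame"
```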

  • At the time of writing, 452 packages Depend on or Import sp, showing that its data structures are widely used and have been extended in many directions. The equivalent number for sf was 69 in October 2018; with the growing popularity of sf, this number is set to grow.
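
    Such counts can be reproduced with base R; a minimal sketch using tools::package_dependencies() (results will reflect CRAN at the time the code is run):

```r
# Count CRAN packages whose Depends or Imports fields reference sp
db = available.packages(repos = "https://cloud.r-project.org")
rev_deps = tools::package_dependencies("sp", db = db,
                                       which = c("Depends", "Imports"),
                                       reverse = TRUE)
length(rev_deps[["sp"]])
```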

  • R’s strengths relevant to our definition of geocomputation include its emphasis on scientific reproducibility, widespread use in academic research and unparalleled support for statistical modeling of geographic data. Furthermore, we advocate learning one language (R) for geocomputation in depth before delving into other languages/frameworks because of the costs associated with context switching. It is preferable to have expertise in one language than basic knowledge of many.