15.3 Gaps and overlaps
There are a number of gaps in, and some overlaps between, the topics covered in this book. We have been selective, emphasizing some topics while omitting others. We have tried to emphasize topics that are most commonly needed in real-world applications, such as geographic data operations, projections, data read/write and visualization. These topics appear repeatedly in the chapters, a substantial area of overlap designed to consolidate these essential skills for geocomputation.
On the other hand, we have omitted topics that are less commonly used, or which are covered in depth elsewhere. Statistical topics including point pattern analysis, spatial interpolation (kriging) and spatial epidemiology, for example, are only mentioned with reference to other topics such as the machine learning techniques covered in Chapter 11 (if at all). There is already excellent material on these methods, including statistically orientated chapters in Bivand, Pebesma, and Gómez-Rubio (2013) and a book on point pattern analysis by Baddeley, Rubak, and Turner (2015). Other topics that received limited attention were remote sensing and using R alongside (rather than as a bridge to) dedicated GIS software. There are many resources on these topics, including Wegmann, Leutner, and Dech (2016) and the GIS-related teaching materials available from Marburg University.
Instead of covering spatial statistical modeling and inference techniques, we focused on machine learning (see Chapters 11 and 14). Again, the reason was that there are already excellent resources on these topics, especially with ecological use cases, including Zuur et al. (2009), Zuur et al. (2017) and freely available teaching material and code on Geostatistics & Open-source Statistical Computing by David Rossiter, hosted at css.cornell.edu/faculty/dgr2. There are also excellent resources on spatial statistics using Bayesian modeling, a powerful framework for modeling and uncertainty estimation (Blangiardo and Cameletti 2015; Krainski et al. 2018).
Finally, we have largely omitted big data analytics. This might seem surprising, since geographic data in particular can become big very quickly. But the prerequisite for doing big data analytics is knowing how to solve a problem on a small dataset. Once you have learned that, you can apply the same techniques to big data questions, though of course you need to expand your toolbox. The first thing to learn is how to handle geographic data queries. This is because big data analytics often boils down to extracting a small amount of data from a database for a specific statistical analysis. For this, we have provided an introduction to spatial databases and how to use a GIS from within R in Chapter 9. If you really have to do the analysis on a big or even the complete dataset, hopefully the problem you are trying to solve is embarrassingly parallel, meaning that it can be split into parts that are processed independently of each other. For this, you need to learn a system that is able to do this parallelization efficiently, such as Hadoop, GeoMesa (http://www.geomesa.org/) or GeoSpark (Huang et al. 2017). Even then, you are applying the same techniques and concepts you have used on small datasets to answer a big data question; the only difference is that you do it in a big data setting.