4.3 Spatial operations on raster data
This section builds on Section 3.3, which highlights various basic methods for manipulating raster datasets, to demonstrate more advanced and explicitly spatial raster operations, and uses the objects elev
and grain
manually created in Section 3.3.For the reader’s convenience, these datasets can be also found in the spData package.
4.3.1 Spatial subsetting
The previous chapter (Section 3.3) demonstrated how to retrieve values associated with specific cell IDs or row and column combinations.Raster objects can also be extracted by location (coordinates) and other spatial objects.To use coordinates for subsetting, one can ‘translate’ the coordinates into a cell ID with the raster function cellFromXY()
.An alternative is to use raster::extract()
(be careful, there is also a function called extract()
in the tidyverse) to extract values.Both methods are demonstrated below to find the value of the cell that covers a point located 0.1 units from the origin.
id = cellFromXY(elev, xy = c(0.1, 0.1))
elev[id]
# the same as
raster::extract(elev, data.frame(x = 0.1, y = 0.1))
It is convenient that both functions also accept objects of class Spatial* Objects
.Raster objects can also be subset with another raster object, as illustrated in Figure 4.7 (left panel) and demonstrated in the code chunk below:
clip = raster(xmn = 0.9, xmx = 1.8, ymn = -0.45, ymx = 0.45,
res = 0.3, vals = rep(1, 9))
elev[clip]
#> [1] 18 24
# we can also use extract
# extract(elev, extent(clip))
Basically, this amounts to retrieving the values of the first raster (here: elev
) falling within the extent of a second raster (here: clip
).
Figure 4.7: Subsetting raster values with the help of another raster (left). Raster mask (middle). Output of masking a raster (right).
So far, the subsetting returned the values of specific cells, however, when doing spatial subsetting, one often also expects a spatial object as an output.To do this, we can use again the [
when we additionally set the drop
parameter to FALSE
.To illustrate this, we retrieve the first two cells of elev
as an individual raster object.As mentioned in Section 3.3, the [
operator accepts various inputs to subset rasters and returns a raster object when drop = FALSE
.The code chunk below subsets the elev
raster by cell ID and row-column index with identical results: the first two cells on the top row (only the first 2 lines of the output is shown):
elev[1:2, drop = FALSE] # spatial subsetting with cell IDs
elev[1, 1:2, drop = FALSE] # spatial subsetting by row,column indices
#> class : RasterLayer
#> dimensions : 1, 2, 2 (nrow, ncol, ncell)
#> ...
Another common use case of spatial subsetting is when a raster with logical
(or NA
) values is used to mask another raster with the same extent and resolution, as illustrated in Figure 4.7, middle and right panel.In this case, the [
, mask()
and overlay()
functions can be used (results not shown):
# create raster mask
rmask = elev
values(rmask) = sample(c(NA, TRUE), 36, replace = TRUE)
# spatial subsetting
elev[rmask, drop = FALSE] # with [ operator
mask(elev, rmask) # with mask()
overlay(elev, rmask, fun = "max") # with overlay
In the code chunk above, we have created a mask object called rmask
with values randomly assigned to NA
and TRUE
.Next, we want to keep those values of elev
which are TRUE
in rmask
.In other words, we want to mask elev
with rmask
.These operations are in fact Boolean local operations since we compare cell-wise two rasters.The next subsection explores these and related operations in more detail.
4.3.2 Map algebra
Map algebra makes raster processing really fast.This is because raster datasets only implicitly store coordinates.To derive the coordinate of a specific cell, we have to calculate it using its matrix position and the raster resolution and origin.For the processing, however, the geographic position of a cell is barely relevant as long as we make sure that the cell position is still the same after the processing (one-to-one locational correspondence).Additionally, if two or more raster datasets share the same extent, projection and resolution, one could treat them as matrices for the processing.This is exactly what map algebra is doing in R.First, the raster package checks the headers of the rasters on which to perform any algebraic operation, and only if they are correspondent to each other, the processing goes on.18And secondly, map algebra retains the so-called one-to-one locational correspondence.This is where it substantially differs from matrix algebra which changes positions when for example multiplying or dividing matrices.
Map algebra (or cartographic modeling) divides raster operations into four subclasses (Tomlin 1990), with each working on one or several grids simultaneously:
- Local or per-cell operations.
- Focal or neighborhood operations.Most often the output cell value is the result of a 3 x 3 input cell block.
- Zonal operations are similar to focal operations, but the surrounding pixel grid on which new values are computed can have irregular sizes and shapes.
- Global or per-raster operations; that means the output cell derives its value potentially from one or several entire rasters.
This typology classifies map algebra operations by the number/shape of cells used for each pixel processing step.For the sake of completeness, we should mention that raster operations can also be classified by discipline such as terrain, hydrological analysis or image classification.The following sections explain how each type of map algebra operations can be used, with reference to worked examples (also seevignette("Raster")
for a technical description of map algebra).
4.3.3 Local operations
Local operations comprise all cell-by-cell operations in one or several layers.A good example is the classification of intervals of numeric values into groups such as grouping a digital elevation model into low (class 1), middle (class 2) and high elevations (class 3).Using the reclassify()
command, we need first to construct a reclassification matrix, where the first column corresponds to the lower and the second column to the upper end of the class.The third column represents the new value for the specified ranges in column one and two.Here, we assign the raster values in the ranges 0–12, 12–24 and 24–36 are reclassified to take values 1, 2 and 3, respectively.
rcl = matrix(c(0, 12, 1, 12, 24, 2, 24, 36, 3), ncol = 3, byrow = TRUE)
recl = reclassify(elev, rcl = rcl)
We will perform several reclassifactions in Chapter 13.
Raster algebra is another classical use case of local operations.This includes adding, subtracting and squaring two rasters.Raster algebra also allows logical operations such as finding all raster cells that are greater than a specific value (5 in our example below).The raster package supports all these operations and more, as described in vignette("Raster")
and demonstrated below (results not show):
elev + elev
elev^2
log(elev)
elev > 5
Instead of arithmetic operators, one can also use the calc()
and overlay()
functions.These functions are more efficient, hence, they are preferable in the presence of large raster datasets.Additionally, they allow you to directly store an output file.
The calculation of the normalized difference vegetation index (NDVI) is a well-known local (pixel-by-pixel) raster operation.It returns a raster with values between -1 and 1; positive values indicate the presence of living plants (mostly > 0.2).NDVI is calculated from red and near-infrared (NIR) bands of remotely sensed imagery, typically from satellite systems such as Landsat or Sentinel.Vegetation absorbs light heavily in the visible light spectrum, and especially in the red channel, while reflecting NIR light, explaining the NVDI formula:
[\begin{split}NDVI&= \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}}\\end{split}]
Predictive mapping is another interesting application of local raster operations.The response variable corresponds to measured or observed points in space, for example, species richness, the presence of landslides, tree disease or crop yield.Consequently, we can easily retrieve space- or airborne predictor variables from various rasters (elevation, pH, precipitation, temperature, landcover, soil class, etc.).Subsequently, we model our response as a function of our predictors using lm
, glm
, gam
or a machine-learning technique.Spatial predictions on raster objects can therefore be made by applying estimated coefficients to the predictor raster vaules, and summing the output raster values (see Chapter 14).
4.3.4 Focal operations
While local functions operate on one cell, though possibly from multiple layers, focal operations take into account a central cell and its neighbors.The neighborhood (also named kernel, filter or moving window) under consideration is typically of size 3-by-3 cells (that is the central cell and its eight surrounding neighbors), but can take on any other (not necessarily rectangular) shape as defined by the user.A focal operation applies an aggregation function to all cells within the specified neighborhood, uses the corresponding output as the new value for the the central cell, and moves on to the next central cell (Figure 4.8).Other names for this operation are spatial filtering and convolution (Burrough, McDonnell, and Lloyd 2015).
In R, we can use the focal()
function to perform spatial filtering.We define the shape of the moving window with a matrix
whose values correspond to weights (see w
parameter in the code chunk below).Secondly, the fun
parameter lets us specify the function we wish to apply to this neighborhood.Here, we choose the minimum, but any other summary function, including sum()
, mean()
, or var()
can be used.
r_focal = focal(elev, w = matrix(1, nrow = 3, ncol = 3), fun = min)
Figure 4.8: Input raster (left) and resulting output raster (right) due to a focal operation - finding the minimum value in 3-by-3 moving windows.
We can quickly check if the output meets our expectations.In our example, the minimum value has to be always the upper left corner of the moving window (remember we have created the input raster by row-wise incrementing the cell values by one starting at the upper left corner).In this example, the weighting matrix consists only of 1s, meaning each cell has the same weight on the output, but this can be changed.
Focal functions or filters play a dominant role in image processing.Low-pass or smoothing filters use the mean function to remove extremes.In the case of categorical data, we can replace the mean with the mode, which is the most common value.By contrast, high-pass filters accentuate features.The line detection Laplace and Sobel filters might serve as an example here.Check the focal()
help page for how to use them in R (this will also be used in the excercises at the end of this chapter).
Terrain processing, the calculation of topographic characteristics such as slope, aspect and flow directions, relies on focal functions.terrain()
can be used to calculate these metrics, although some terrain algorithms, including the Zevenbergen and Thorne method to compute slope, are not implemented in this raster function.Many other algorithms — including curvatures, contributing areas and wetness indices — are implemented in open source desktop geographic information system (GIS) software.Chapter 9 shows how to access such GIS functionality from within R.
4.3.5 Zonal operations
Zonal operations are similar to focal operations.The difference is that zonal filters can take on any shape instead of a predefined rectangular window.Our grain size raster is a good example (Figure 3.2) because the different grain sizes are spread in an irregular fashion throughout the raster.
To find the mean elevation for each grain size class, we can use the zonal()
command.This kind of operation is also known as zonal statistics in the GIS world.
z = zonal(elev, grain, fun = "mean") %>%
as.data.frame()
z
#> zone mean
#> 1 1 17.8
#> 2 2 18.5
#> 3 3 19.2
This returns the statistics for each category, here the mean altitude for each grain size class, and can be added to the attribute table of the ratified raster (see previous chapter).
4.3.6 Global operations and distances
Global operations are a special case of zonal operations with the entire raster dataset representing a single zone.The most common global operations are descriptive statistics for the entire raster dataset such as the minimum or maximum (see Section 3.3.2).Aside from that, global operations are also useful for the computation of distance and weight rasters.In the first case, one can calculate the distance from each cell to a specific target cell.For example, one might want to compute the distance to the nearest coast (see also raster::distance()
).We might also want to consider topography, that means, we are not only interested in the pure distance but would like also to avoid the crossing of mountain ranges when going to the coast.To do so, we can weight the distance with elevation so that each additional altitudinal meter ‘prolongs’ the Euclidean distance.Visibility and viewshed computations also belong to the family of global operations (in the exercises of Chapter 9, you will compute a viewshed raster).
Many map algebra operations have a counterpart in vector processing (Liu and Mason 2009).Computing a distance raster (zonal operation) while only considering a maximum distance (logical focal operation) is the equivalent to a vector buffer operation (Section 5.2.5).Reclassifying raster data (either local or zonal function depending on the input) is equivalent to dissolving vector data (Section 4.2.3).Overlaying two rasters (local operation), where one contains NULL
or NA
values representing a mask, is similar to vector clipping (Section 5.2.5).Quite similar to spatial clipping is intersecting two layers (Section 4.2.1).The difference is that these two layers (vector or raster) simply share an overlapping area (see Figure 5.8 for an example).However, be careful with the wording.Sometimes the same words have slightly different meanings for raster and vector data models.Aggregating in the case of vector data refers to dissolving polygons, while it means increasing the resolution in the case of raster data.In fact, one could see dissolving or aggregating polygons as decreasing the resolution.However, zonal operations might be the better raster equivalent compared to changing the cell resolution.Zonal operations can dissolve the cells of one raster in accordance with the zones (categories) of another raster using an aggregation function (see above).
4.3.7 Merging rasters
Suppose we would like to compute the NDVI (see Section 4.3.3), and additionally want to compute terrain attributes from elevation data for observations within a study area.Such computations rely on remotely sensed information.The corresponding imagery is often divided into scenes covering a specific spatial extent.Frequently, a study area covers more than one scene.In these cases we would like to merge the scenes covered by our study area.In the easiest case, we can just merge these scenes, that is put them side by side.This is possible with digital elevation data (SRTM, ASTER).In the following code chunk we first download the SRTM elevation data for Austria and Switzerland (for the country codes, see the raster function ccodes()
).In a second step, we merge the two rasters into one.
aut = getData("alt", country = "AUT", mask = TRUE)
ch = getData("alt", country = "CHE", mask = TRUE)
aut_ch = merge(aut, ch)
Raster’s merge()
command combines two images, and in case they overlap, it uses the value of the first raster.You can do exactly the same with gdalUtils::mosaic_rasters()
which is faster, and therefore recommended if you have to merge a multitude of large rasters stored on disk.
The merging approach is of little use when the overlapping values do not correspond to each other.This is frequently the case when you want to combine spectral imagery from scenes that were taken on different dates.The merge()
command will still work but you will see a clear border in the resulting image.The mosaic()
command lets you define a function for the overlapping area.For instance, we could compute the mean value.This might smooth the clear border in the merged result but it will most likely not make it disappear.To do so, we need a more advanced approach.Remote sensing scientists frequently apply histogram matching or use regression techniques to align the values of the first image with those of the second image.The packages landsat (histmatch()
, relnorm()
, PIF()
), satellite (calcHistMatch()
) and RStoolbox (histMatch()
, pifMatch()
) provide the corresponding functions.For a more detailed introduction on how to use R for remote sensing, we refer the reader to Wegmann, Leutner, and Dech (2016).