14 Ecology - 14.3 Reducing dimensionality - Geocomputation with R

14.3 Reducing dimensionality

Ordinations are a popular tool in vegetation science to extract the main information, frequently corresponding to ecological gradients, from large species-plot matrices mostly filled with 0s. However, they are also used in remote sensing, the soil sciences, geomarketing and many other fields. If you are unfamiliar with ordination techniques or in need of a refresher, have a look at Michael W. Palmer’s web page for a short introduction to popular ordination techniques in ecology and at Borcard, Gillet, and Legendre (2011) for a deeper look at how to apply these techniques in R. The vegan package documentation is also a very helpful resource (vignette(package = "vegan")).

Principal component analysis (PCA) is probably the most famous ordination technique. It is a great tool to reduce dimensionality if one can expect linear relationships between variables, and if the joint absence of a variable (for example calcium) in two plots (observations) can be considered a similarity. This is rarely the case with vegetation data.

For one, relationships are usually non-linear along environmental gradients. That means the presence of a plant usually follows a unimodal relationship along a gradient (e.g., humidity, temperature or salinity) with a peak at the most favorable conditions and declining ends towards the unfavorable conditions.
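The unimodal response described above can be illustrated with a minimal sketch. It assumes a Gaussian response curve, which is a common textbook simplification and not something fitted to the Mt. Mongón data; the gradient scale, `optimum`, `tolerance` and `max_abund` are hypothetical values chosen for illustration only.

```r
# hypothetical gradient, e.g., humidity on an arbitrary 0-100 scale
gradient = seq(0, 100, by = 1)
optimum = 60     # assumed position of the most favorable conditions
tolerance = 12   # assumed niche width (standard deviation of the curve)
max_abund = 40   # assumed abundance at the optimum

# Gaussian (unimodal) response: peak at the optimum, declining on both sides
abundance = max_abund * exp(-(gradient - optimum)^2 / (2 * tolerance^2))

# position of the peak along the gradient
peak = gradient[which.max(abundance)]
# a straight line describes such a curve poorly
lin_cor = cor(gradient, abundance)
plot(gradient, abundance, type = "l",
     xlab = "gradient (arbitrary units)", ylab = "abundance")
```

Because the curve rises and then falls, a linear method such as PCA cannot represent the full relationship between the species and the gradient with a single straight axis.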

Secondly, the joint absence of a species in two plots is hardly an indication of similarity. Suppose a plant species is absent from the driest (e.g., an extreme desert) and the moistest locations (e.g., a tree savanna) of our sampling. Then we really should refrain from counting this as a similarity because it is very likely that the only thing these two completely different environmental settings have in common in terms of floristic composition is the shared absence of species (except for rare ubiquitous species).
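The effect of shared absences can be made concrete with a small sketch. The two plots and six species below are hypothetical. The simple matching coefficient counts joint absences as agreement, whereas the Jaccard dissimilarity (available in base R as `dist(method = "binary")`) ignores them.

```r
# hypothetical plots (rows) and species (columns); 1 = present, 0 = absent
plots = rbind(
  desert  = c(sp1 = 1, sp2 = 0, sp3 = 0, sp4 = 0, sp5 = 0, sp6 = 0),
  savanna = c(sp1 = 0, sp2 = 1, sp3 = 0, sp4 = 0, sp5 = 0, sp6 = 0)
)

# simple matching coefficient: share of species with the same state,
# so the four joint absences inflate the similarity
smc = mean(plots["desert", ] == plots["savanna", ])

# Jaccard dissimilarity: only species present in at least one plot count
jaccard_dist = as.numeric(dist(plots, method = "binary"))

smc           # 4 of 6 species "agree", although no species is shared
jaccard_dist  # 1, i.e., the plots are maximally dissimilar
```

The two plots share no species at all, yet a measure that rewards joint absences would still rate them as fairly similar.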

Non-metric multidimensional scaling (NMDS) is one popular dimension-reducing technique in ecology (von Wehrden et al. 2009). NMDS reduces the rank-based differences between the distances between objects in the original matrix and distances between the ordinated objects. The difference is expressed as stress. The lower the stress value, the better the ordination, i.e., the low-dimensional representation of the original matrix. Expressed as percentages, stress values lower than 10 represent an excellent fit, stress values of around 15 are still good, and values greater than 20 represent a poor fit (McCune, Grace, and Urban 2002). In R, metaMDS() of the vegan package can execute an NMDS. As input, it expects a community matrix with the sites as rows and the species as columns. Often ordinations using presence-absence data yield better results (in terms of explained variance) though the price is, of course, a less informative input matrix (see also Exercises). decostand() converts numerical observations into presences and absences with 1 indicating the occurrence of a species and 0 the absence of a species. Ordination techniques such as NMDS require at least one observation per site. Hence, we need to dismiss all sites in which no species were found.

library(vegan)  # provides decostand(), metaMDS(), MDSrotate() and scores()
# presence-absence matrix
pa = decostand(comm, "pa")  # 100 rows (sites), 69 columns (species)
# keep only sites in which at least one species was found
pa = pa[rowSums(pa) != 0, ]  # 84 rows, 69 columns
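To see what the two steps above do without loading the real data, here is a minimal base R sketch on a hypothetical 3 x 3 community matrix (`comm_toy` and its site and species names are made up). It mimics the presence-absence conversion with a logical comparison instead of calling decostand().

```r
# hypothetical community matrix: sites as rows, species as columns
comm_toy = matrix(c(3, 0, 12,
                    0, 0,  0,
                    0, 5,  1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("site1", "site2", "site3"),
                                  c("spA", "spB", "spC")))

# any positive value becomes 1 (presence), everything else 0 (absence)
pa_toy = (comm_toy > 0) * 1

# drop sites without any species; drop = FALSE keeps the matrix structure
pa_toy = pa_toy[rowSums(pa_toy) != 0, , drop = FALSE]
pa_toy
```

In this toy case site2 contains no species and is removed, which corresponds to the reduction from 100 to 84 sites in the real data.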

The resulting output matrix serves as input for the NMDS. k specifies the number of output axes, here set to 4. NMDS is an iterative procedure trying to make the ordinated space more similar to the input matrix in each step. To make sure that the algorithm converges, we set the number of random starts to 500 (try parameter).

set.seed(25072018)
nmds = metaMDS(comm = pa, k = 4, try = 500)
nmds$stress
#> ...
#> Run 498 stress 0.08834745 
#> ... Procrustes: rmse 0.004100446  max resid 0.03041186 
#> Run 499 stress 0.08874805 
#> ... Procrustes: rmse 0.01822361  max resid 0.08054538 
#> Run 500 stress 0.08863627 
#> ... Procrustes: rmse 0.01421176  max resid 0.04985418 
#> *** Solution reached
#> 0.08831395
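To make the stress value more tangible, the following sketch computes a Kruskal-type stress (stress-1) by hand in base R. It is an illustration under several assumptions and not vegan's internal implementation: the matrix `pa_toy` is hypothetical, the configuration comes from metric scaling with cmdscale() as a stand-in for an NMDS solution, and the monotone regression uses isoreg().

```r
# hypothetical presence-absence matrix: 6 sites x 5 species
pa_toy = rbind(
  c(1, 1, 0, 0, 0),
  c(1, 1, 1, 0, 0),
  c(0, 1, 1, 0, 0),
  c(0, 1, 1, 1, 0),
  c(0, 0, 1, 1, 1),
  c(0, 0, 0, 1, 1)
)

d_obs = as.numeric(dist(pa_toy, method = "binary"))  # observed dissimilarities
conf = cmdscale(dist(pa_toy, method = "binary"), k = 2)  # stand-in configuration
d_ord = as.numeric(dist(conf))                       # distances in ordination space

# sort by observed dissimilarity so that the isotonic fit aligns with d_ord
o = order(d_obs, d_ord)
d_hat = isoreg(d_obs[o], d_ord[o])$yf  # monotone fit of ordination distances

# stress-1: misfit between ordination distances and their monotone fit
stress = sqrt(sum((d_ord[o] - d_hat)^2) / sum(d_ord^2))
stress
```

A stress of 0 would mean that the rank order of the ordination distances matches the rank order of the observed dissimilarities perfectly. Multiplying the value by 100 gives the percentage scale used for the rules of thumb.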

A stress value of 9 (0.088 on the 0 to 1 scale reported by metaMDS()) represents a very good result, which means that the reduced ordination space represents the large majority of the variance of the input matrix. Overall, NMDS puts objects that are more similar (in terms of species composition) closer together in ordination space. However, as opposed to most other ordination techniques, the axes are arbitrary and not necessarily ordered by importance (Borcard, Gillet, and Legendre 2011). Fortunately, we already know that humidity represents the main gradient in the study area (Muenchow, Bräuning, et al. 2013; Muenchow, Schratz, and Brenning 2017). Since humidity is highly correlated with elevation, we rotate the NMDS axes in accordance with elevation (see also ?MDSrotate for more details on rotating NMDS axes). Plotting the result reveals that the first axis is, as intended, clearly associated with altitude (Figure 14.3).

elev = dplyr::filter(random_points, id %in% rownames(pa)) %>% 
  dplyr::pull(dem)
# rotating NMDS in accordance with altitude (proxy for humidity)
rotnmds = MDSrotate(nmds, elev)
# extracting the first two axes
sc = scores(rotnmds, choices = 1:2)
# plotting the first axis against altitude
plot(y = sc[, 1], x = elev, xlab = "elevation in m", 
     ylab = "First NMDS axis", cex.lab = 0.8, cex.axis = 0.8)

Figure 14.3: Plotting the first NMDS axis against altitude.
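The idea behind rotating an ordination towards an environmental variable can be sketched in two dimensions with base R. This is a toy illustration and not how MDSrotate() is implemented: `elev_toy` and `conf` are simulated, and the rotation angle is taken from a linear model of elevation on the two axes.

```r
set.seed(42)
n = 30
elev_toy = runif(n, 200, 1000)  # hypothetical elevations in m

# hypothetical 2D ordination in which the elevation gradient runs diagonally
theta_true = pi / 3
grad = as.numeric(scale(elev_toy))
noise = rnorm(n, sd = 0.3)
conf = cbind(grad * cos(theta_true) - noise * sin(theta_true),
             grad * sin(theta_true) + noise * cos(theta_true))

# direction in ordination space along which elevation increases fastest
b = coef(lm(elev_toy ~ conf))[-1]
theta = atan2(b[2], b[1])

# rotation matrix: the fitted direction becomes the first axis
rot = matrix(c(cos(theta), sin(theta),
               -sin(theta), cos(theta)), nrow = 2)
conf_rot = conf %*% rot

# after rotation, elevation is fitted by the first axis alone
b_rot = coef(lm(elev_toy ~ conf_rot))[-1]
cor(conf[, 1], elev_toy)      # before rotation
cor(conf_rot[, 1], elev_toy)  # after rotation
```

The rotation only changes the orientation of the axes. Distances between sites in ordination space, and therefore the stress, stay the same.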

The scores of the first NMDS axis represent the different vegetation formations, i.e., the floristic gradient, appearing along the slope of Mt. Mongón. To spatially visualize them, we can model the NMDS scores with the previously created predictors (Section 14.2), and use the resulting model for predictive mapping (see next section).