Geospatial data is increasingly used to solve numerous ‘real-life’ problems (check out some examples here). In turn, R is becoming a powerful open-source solution for handling this type of data, currently providing an exceptional range of functions and tools for GIS and Remote Sensing data analysis.
In particular, raster data provides support for representing spatial phenomena by dividing the surface into a grid (or matrix) composed of cells of regular size. Each raster data-set has a certain number of columns and rows, and each cell contains a value with information for the variable of interest. Stored data can be either (i) thematic, representing a discrete variable (e.g., a land cover classification map), or (ii) continuous (e.g., elevation).
The raster package currently provides an extensive set of functions to create, read, export, manipulate and process raster data-sets. It also provides low-level functionalities for creating more advanced processing chains, as well as the ability to manage large data-sets. For more information, see vignette("functions", package = "raster"). You can also read more about raster data in the tutorial series on this topic here.
In this exercise set, we will explore the following topics in raster data processing and geostatistical analysis (previously discussed in this tutorial series):
- Unsupervised classification/clustering of satellite data
- Regression-kriging (RK)
We will also address how to use the package RStoolbox (link) to calculate the:
- Tasseled Cap Transformation (TCT)
- PCA rotation/transformation
Both data compression techniques examined here will use spectral data from satellite imagery.
Answers to these exercises are available here.
Use the data in this link (Landsat-8 surface reflectance data bands 1-7, for Peneda-Geres National Park – PGNP, NW Portugal) to answer the next exercises (1 to 6). Download the data, uncompress and create a raster brick. How many pixels and layers does the data have?
Make an RGB plot with bands 5, 1, and 3 with linear stretching.
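As a minimal sketch of the two steps above, the block below builds a raster brick and draws a false-colour composite with linear stretching. The exercise data is not bundled here, so the 30 x 40 x 7 array of random values and the object name ls8 are purely hypothetical stand-ins. With real GeoTIFF files, one would typically pass the file paths to stack() and convert the result with brick().

```r
library(raster)

set.seed(1)

# Hypothetical stand-in for the 7 Landsat-8 bands: random reflectances in [0, 1]
n_row  <- 30
n_col  <- 40
n_band <- 7
vals <- array(runif(n_row * n_col * n_band), dim = c(n_row, n_col, n_band))

# Build a RasterBrick from the 3-D array (rows x columns x layers)
ls8 <- brick(vals)
names(ls8) <- paste0("b", 1:n_band)

# Number of pixels (cells per layer) and number of layers
n_pixels <- ncell(ls8)
n_layers <- nlayers(ls8)
print(c(pixels = n_pixels, layers = n_layers))

# RGB composite with bands 5, 1 and 3, using a linear contrast stretch
plotRGB(ls8, r = 5, g = 1, b = 3, stretch = "lin")
```

The same ncell() and nlayers() calls apply unchanged to a brick read from disk.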
Using the k-means algorithm, perform an unsupervised classification/clustering of the data with 5 clusters.
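One possible way to approach this is sketched below using only base R. The image is a hypothetical random array rather than the exercise data. Each pixel becomes one row of a matrix with one column per band, kmeans() assigns a cluster to every row, and the labels are then folded back into the grid shape.

```r
set.seed(2024)

# Hypothetical 40 x 50 image with 7 bands (random values, not the exercise data)
n_row  <- 40
n_col  <- 50
n_band <- 7
img <- array(runif(n_row * n_col * n_band), dim = c(n_row, n_col, n_band))

# Reshape to a pixels x bands matrix (one row per pixel, one column per band)
pix <- matrix(img, nrow = n_row * n_col, ncol = n_band)

# k-means with 5 clusters; several random starts reduce the risk of a poor local optimum
km <- kmeans(pix, centers = 5, nstart = 5, iter.max = 50)

# Fold the cluster labels back into the original grid layout
cluster_map <- matrix(km$cluster, nrow = n_row, ncol = n_col)

# Number of pixels assigned to each cluster
print(table(km$cluster))
```

With a raster object, the pixel matrix would usually come from getValues(), and the labels could be written back into an empty layer with setValues().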
Use the CLARA algorithm (package cluster) to perform an unsupervised classification/clustering of the data with 5 clusters and Euclidean distance.
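A hedged sketch of the CLARA call follows, again on a hypothetical pixels x bands matrix of random values. CLARA works on sub-samples of the data, so the samples and sampsize values used here are illustrative choices, not recommendations from the exercise.

```r
library(cluster)

set.seed(99)

# Hypothetical pixels x bands matrix: 3000 pixels, 7 bands of random values
pix <- matrix(runif(3000 * 7), ncol = 7,
              dimnames = list(NULL, paste0("b", 1:7)))

# CLARA with 5 clusters and Euclidean distance;
# 20 sub-samples of 200 pixels each are drawn to search for good medoids
cl <- clara(pix, k = 5, metric = "euclidean", samples = 20, sampsize = 200)

# One cluster label per pixel, and the 5 medoids (representative pixels)
print(table(cl$clustering))
print(cl$medoids)
```

Because only sub-samples enter the medoid search, this approach tends to scale better to full satellite scenes than clustering methods that need all pairwise distances.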
Using RStoolbox, calculate the Tasseled Cap Transformation of the data (remember it is Landsat-8 data with bands 1-7).
Using RStoolbox, calculate the standardized PCA transform. What is the cumulative % of explained variance in the first three components?
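To illustrate what a standardized PCA and its cumulative explained variance look like, here is a sketch that swaps in base R's prcomp() for RStoolbox, applied to a hypothetical matrix of correlated band values. In RStoolbox, the corresponding call is, to the best of my knowledge, rasterPCA() with spca = TRUE, but that is not what runs below.

```r
set.seed(123)

# Hypothetical pixels x bands matrix: 7 correlated "bands" sharing a common signal
n_pix  <- 2000
signal <- rnorm(n_pix)
bands  <- sapply(1:7, function(i) signal * runif(1, 0.5, 1.5) + rnorm(n_pix, sd = 0.3))
colnames(bands) <- paste0("b", 1:7)

# Standardized PCA: centre each band and scale it to unit variance first
pca <- prcomp(bands, center = TRUE, scale. = TRUE)

# Share of variance per component and its cumulative percentage
explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_pct   <- 100 * cumsum(explained)

# Cumulative % of variance explained by the first three components
print(round(cum_pct[1:3], 2))
```

Scaling matters when bands have very different value ranges, since otherwise the band with the largest variance dominates the first component.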
- Use the data in this link to answer the next exercises (annual average temperature for weather stations in Portugal; col AvgTemp). Using Lat and Lon columns from the clim_data_pt.csv table, create a SpatialPointsDataFrame object with CRS WGS 1984.
- Using Ordinary Kriging from package gstat, interpolate temperature values employing a Spherical model fitted to the empirical variogram. Calculate the RMSE from 5-fold cross-validation (see function krige.cv) and use the
Following the rationale of the previous question, now experiment with an Exponential model. Calculate the RMSE, again from 5-fold CV. Which model was the best according to RMSE?
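The comparison between the two variogram models could be sketched as follows. This is not the exercise solution: it assumes the meuse data-set shipped with package sp as a stand-in for the temperature table, uses log(zinc) as the target instead of AvgTemp, and the starting values passed to vgm() and the seed are illustrative. For the exercise data, the points object would instead be built from the Lon and Lat columns, with the CRS set through proj4string() and CRS("+proj=longlat +datum=WGS84").

```r
library(sp)
library(gstat)

# Stand-in data: meuse (from sp), promoted to a SpatialPointsDataFrame
data(meuse)
coordinates(meuse) <- ~x + y

# Empirical variogram of the target variable
v_emp <- variogram(log(zinc) ~ 1, meuse)

# Fit a Spherical and an Exponential model (starting values are assumptions)
m_sph <- fit.variogram(v_emp, vgm(1, "Sph", 900, 1))
m_exp <- fit.variogram(v_emp, vgm(1, "Exp", 300, 1))

# Root mean squared error of cross-validation residuals
rmse <- function(res) sqrt(mean(res^2, na.rm = TRUE))

# 5-fold cross-validation of ordinary kriging; the same seed gives both models the same folds
set.seed(42)
cv_sph <- krige.cv(log(zinc) ~ 1, meuse, model = m_sph, nfold = 5)
set.seed(42)
cv_exp <- krige.cv(log(zinc) ~ 1, meuse, model = m_exp, nfold = 5)

rmse_sph <- rmse(cv_sph$residual)
rmse_exp <- rmse(cv_exp$residual)
print(c(Spherical = rmse_sph, Exponential = rmse_exp))
```

Resetting the seed before each krige.cv() call keeps the fold assignment identical, so any difference in RMSE reflects the variogram model rather than a different random split.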
Using the cubist regression algorithm (package Cubist), predict AvgTemp based on latitude (Lat), elevation (column Elev) and distance to the coastline (column distCoast). Calculate the RMSE for a random test set of 15 observations. Use the
From the previous exercise, extract the training residuals and interpolate them. Following a Regression-kriging approach, add the interpolated residuals and the regression results. Calculate the RMSE for the test set (defined in E9) and check if this improves the modeling performance any further.
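The regression-kriging logic can be sketched as below, under several loudly stated assumptions: the meuse data-set from package sp replaces the temperature table, a plain linear model (lm) is swapped in for the cubist regression, log(zinc) predicted from sqrt(dist) replaces AvgTemp and its predictors, and the seed and variogram starting values are illustrative. Only the overall structure (regression, residual kriging, sum of both) mirrors the exercise.

```r
library(sp)
library(gstat)

data(meuse)
rmse <- function(res) sqrt(mean(res^2, na.rm = TRUE))

# Random test set of 15 observations; the rest is used for training
set.seed(1)
test_idx <- sample(nrow(meuse), 15)
train <- meuse[-test_idx, ]
test  <- meuse[test_idx, ]
obs   <- log(test$zinc)

# Step 1: regression part (lm used here as a stand-in for cubist)
fit       <- lm(log(zinc) ~ sqrt(dist), data = train)
train$res <- residuals(fit)
pred_reg  <- predict(fit, newdata = test)

# Step 2: interpolate the training residuals with ordinary kriging
coordinates(train) <- ~x + y
coordinates(test)  <- ~x + y
v_res    <- variogram(res ~ 1, train)
m_res    <- fit.variogram(v_res, vgm(1, "Exp", 300, 1))
res_krig <- krige(res ~ 1, train, newdata = test, model = m_res)

# Step 3: regression-kriging prediction = regression + interpolated residuals
pred_rk <- pred_reg + res_krig$var1.pred

rmse_reg <- rmse(obs - pred_reg)
rmse_rk  <- rmse(obs - pred_rk)
print(c(regression = rmse_reg, regression_kriging = rmse_rk))
```

Whether the residual step helps depends on how much spatial structure the regression leaves unexplained, so the two RMSE values should be compared rather than assumed to favour one approach.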