Geospatial data is increasingly used to solve numerous 'real-life' problems (check out some examples here). In turn, R is becoming a powerful open-source solution for handling this type of data, currently providing an exceptional range of functions and tools for GIS and Remote Sensing data analysis.

In particular, **raster data** provides support for representing spatial phenomena by dividing the surface into a grid (or matrix) composed of cells of regular size. Each raster data-set has a certain number of columns and rows, and each cell contains a value with information for the variable of interest. Stored data can be either: (i) thematic, representing a **discrete** variable (e.g., a land cover classification map), or (ii) **continuous** (e.g., elevation).
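
To make the grid idea concrete, here is a minimal base R sketch that stores a toy raster as a plain matrix. The grid size, the variable names (`elevation`, `land_cover`) and all values are made up for illustration; a real raster object also carries an extent and a coordinate reference system, which a bare matrix does not.

```r
# A toy "raster": a 4 x 5 grid of cells stored as a base R matrix.
# All values are hypothetical and generated at random.
set.seed(1)
n_rows <- 4
n_cols <- 5

# Continuous variable, e.g. elevation in metres
elevation <- matrix(runif(n_rows * n_cols, min = 0, max = 1500),
                    nrow = n_rows, ncol = n_cols)

# Discrete (thematic) variable, e.g. land cover class codes 1 to 3
land_cover <- matrix(sample(1:3, n_rows * n_cols, replace = TRUE),
                     nrow = n_rows, ncol = n_cols)

dim(elevation)      # number of rows and columns
length(elevation)   # total number of cells
elevation[2, 3]     # value stored in the cell at row 2, column 3
table(land_cover)   # number of cells in each thematic class
```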

The `raster` package currently provides an extensive set of functions to create, read, export, manipulate and process raster data-sets. It also provides low-level functionalities for creating more advanced processing chains, as well as the ability to manage large data-sets. For more information, see: `vignette("functions", package = "raster")`. You can also read more about raster data in the tutorial series on this topic here.

In this exercise set, we will explore the following topics in raster data processing and geostatistical analysis (previously discussed in this tutorial series):

- Unsupervised classification/clustering of satellite data
- Regression-kriging (RK)

We will also address how to use the package `RStoolbox` to calculate the:

- Tasseled Cap Transformation (TCT)
- PCA rotation/transformation

Both data compression techniques examined here will use spectral data from satellite imagery.

Answers to these exercises are available here.

**Exercise 1**

Use the data in this link (Landsat-8 surface reflectance data, bands 1-7, for Peneda-Geres National Park, PGNP, NW Portugal) to answer exercises 1 to 6. Download the data, uncompress it and create a raster brick. How many pixels and layers does the data have?

**Exercise 2**

Make an RGB plot with bands 5, 1, and 3 with linear stretching.

**Exercise 3**

Using the k-means algorithm, perform an unsupervised classification/clustering of the data with 5 clusters.
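
As a hint on the general workflow (not the answer for the Landsat data), the sketch below runs base R's `kmeans` on a synthetic pixel-by-band matrix. The matrix `pixels`, its dimensions and the 15 x 20 grid are assumptions made up for illustration. With a real raster brick the matrix would typically come from the cell values (for example via `getValues()`), and cells with `NA` would need to be dropped before clustering.

```r
# Synthetic stand-in for satellite data: rows are pixels, columns are bands.
set.seed(12345)
n_pixels <- 300
n_bands  <- 7
pixels <- matrix(runif(n_pixels * n_bands), nrow = n_pixels, ncol = n_bands)
colnames(pixels) <- paste0("b", 1:n_bands)

# k-means with 5 clusters; several random starts make the result less
# dependent on the initial centres
km <- kmeans(pixels, centers = 5, iter.max = 100, nstart = 5)

# One cluster label per pixel. Here the labels are simply reshaped into a
# hypothetical 15 x 20 grid; with a real raster they would be written back
# into a layer with the same geometry as the input.
cluster_map <- matrix(km$cluster, nrow = 15, ncol = 20)

table(km$cluster)   # number of pixels assigned to each cluster
dim(km$centers)     # one centre per cluster, one column per band
```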

**Exercise 4**

Use the CLARA algorithm (package `cluster`) to perform an unsupervised classification/clustering of the data with 5 clusters and Euclidean distance.

**Exercise 5**

Using package `RStoolbox`, calculate the Tasseled Cap Transformation of the data (remember it is Landsat-8 data with bands 1-7).

**Exercise 6**

Using package `RStoolbox`, calculate the standardized PCA transform. What is the cumulative % of explained variance in the first three components?

**Exercise 7**

- Use the data in this link to answer the next exercises (annual average temperature for weather stations in Portugal; column `AvgTemp`). Using the `Lat` and `Lon` columns from the `clim_data_pt.csv` table, create a `SpatialPointsDataFrame` object with CRS WGS 1984.
- Using Ordinary Kriging from package `gstat`, interpolate temperature values employing a *Spherical* variogram model. Calculate the RMSE from 5-fold cross-validation (see function `krige.cv`) and use `set.seed(12345)`.

**Exercise 8**

Following the rationale of the previous exercise, now experiment with an *Exponential* model. Again, calculate the RMSE from 5-fold CV. Which model was the best according to RMSE?
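
Exercises 7 and 8 both compare models by the RMSE obtained from 5-fold cross-validation. The base R sketch below shows how that number is computed in general, using a simple linear model on synthetic data as a stand-in for the kriging model. The data frame `d` and its columns are made up for illustration. With `krige.cv`, the fold loop is handled by the function, and the same final step can be applied to the cross-validation residuals it returns.

```r
# Synthetic data: y depends linearly on x plus noise (hypothetical values).
set.seed(12345)
n <- 50
d <- data.frame(x = runif(n))
d$y <- 2 + 3 * d$x + rnorm(n, sd = 0.5)

# Randomly assign each observation to one of k = 5 folds of equal size
k <- 5
fold <- sample(rep(1:k, length.out = n))

# For each fold: fit on the other four folds, predict the held-out fold,
# and store the prediction errors (observed minus predicted)
cv_resid <- numeric(n)
for (i in 1:k) {
  test <- fold == i
  fit <- lm(y ~ x, data = d[!test, ])
  cv_resid[test] <- d$y[test] - predict(fit, newdata = d[test, ])
}

# Root mean squared error over all held-out predictions
rmse <- sqrt(mean(cv_resid^2))
rmse
```

The model with the lower cross-validated RMSE is the one that predicted the held-out observations more accurately.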

**Exercise 9**

Using the cubist regression algorithm (package `Cubist`), predict `AvgTemp` based on latitude (`Lat`), elevation (column `Elev`) and distance to the coastline (column `distCoast`). Calculate the RMSE for a random test set of 15 observations. Use `set.seed(12345)`.

**Exercise 10**

From the previous exercise, extract the training residuals and interpolate them. Following a Regression-kriging approach, add the interpolated residuals to the regression predictions. Calculate the RMSE for the test set (defined in E9) and check whether this improves the modeling performance any further.
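
The Regression-kriging step described above can be written compactly. The notation here is an assumption introduced for illustration: $z(s_i)$ is the observed `AvgTemp` at station $s_i$, $\hat{m}$ is the regression prediction (the Cubist model from Exercise 9), $\hat{e}$ is the kriged residual surface, and $n_{\text{test}}$ is the number of test observations.

```latex
% Training residuals: observed value minus regression prediction
e(s_i) = z(s_i) - \hat{m}(s_i)

% Regression-kriging prediction at a new location s_0:
% regression prediction plus the interpolated (kriged) residual
\hat{z}_{RK}(s_0) = \hat{m}(s_0) + \hat{e}(s_0)

% RMSE over the test set
\mathrm{RMSE} = \sqrt{\frac{1}{n_{\text{test}}} \sum_{j=1}^{n_{\text{test}}}
  \left( z(s_j) - \hat{z}_{RK}(s_j) \right)^2}
```

If the residuals still show spatial structure, the kriged term should pull the test RMSE below the regression-only value from Exercise 9. If they look like pure noise, little or no gain is expected.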
