Background
In supervised classification, contrary to the unsupervised version, a priori defined reference classes are used as additional information. This initial step determines which classes the classification will produce. Usually, a statistical or machine-learning algorithm is used to obtain or “learn” a classification function from a set of training examples, which is then used to map every instance (pixel or object, depending on the approach used) to its corresponding class. The following workflow is commonly used for deploying a supervised classifier:
- Definition of the thematic classes of land cover/use (ex. coniferous forest, deciduous forest, water, agriculture, urban)
- Selection of suitable training areas (reference areas/pixels for each class)
- Calibration of a classification algorithm (for the training set)
- Classification of the entire image (outside the ‘training space’)
- Classifier performance evaluation, verification, and inspection of the results (for the testing set)
Usually, in supervised classification, spectral data from each of the sensor bands are used to obtain a statistical or rule-based spectral signature for each class. Besides “direct” spectral data, other kinds of information or features can be used for classifier training. These include band ratios, spectral indices, texture features, temporal variation features (ex. green-up and senescence changes), as well as ancillary data (ex. elevation, slope, built-up masks, roads.)
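As a quick illustration of such derived features, a spectral index like the NDVI can be computed directly from the band rasters with simple raster algebra and later stacked with the original bands. This is just a hedged sketch: the file names and object names (redBand, nirBand) are hypothetical and not part of the data used below.
library(raster)
# Hypothetical red and NIR layers (ex. Sentinel-2 bands 4 and 8)
redBand <- raster("red_band.tif")
nirBand <- raster("nir_band.tif")
# NDVI = (NIR - Red) / (NIR + Red)
ndvi <- (nirBand - redBand) / (nirBand + redBand)
# Stack the derived index with the original bands to enrich the feature set
# predictors <- stack(redBand, nirBand, ndvi)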
Combining the training data and the spectral (or other) features in a classifier algorithm allows you to classify the entire image outside the training space. Usually, a form of train/test split set strategy (holdout cross-validation, k-fold CV, etc.) is used. The training set is used for classifier calibration, while the testing set is used for evaluating the classification performance. This process is usually repeated a few times and then an average value of validation indices is calculated.
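In its simplest form, one holdout round boils down to randomly splitting the row indices of the calibration table. Below is a minimal sketch (assuming a hypothetical data frame df with one row per pixel and a class column); the full, working version used in this post appears further below.
set.seed(42)
n <- nrow(df)
# ex. 70% of the rows for training, the rest held out for testing
trainIdx <- sample(1:n, size = round(0.7 * n))
trainSet <- df[trainIdx, ]
testSet  <- df[-trainIdx, ]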
Because R currently provides a very large set of classification algorithms (the caret package is a good way to access them), it is particularly well-equipped to handle this kind of problem. For the examples, the Random Forest (RF) algorithm will be used. RF is implemented in the (conveniently named 😉) randomForest package. Although packages such as caret provide many useful functions for handling classification (training, tuning, and evaluation), I will not use them here. My objective in this post is to explore and show the basic, “under the hood” workflow of pixel-based classification of raster data.
In a nutshell, RF is an ensemble learning method for classification, regression, and other tasks that operates by constructing multiple decision trees during the training stage (‘bagging’) and outputting the class that is the mode of the classes (classification) or the average prediction (regression) of the individual trees. In this way, RF corrects for decision trees’ tendency to over-fit their training set. See more here and here.
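To make the “mode of the individual trees” idea more concrete, here is a tiny sketch (using the built-in iris data, not the satellite data below) showing that the class returned by randomForest is the majority vote of its trees; predict.all = TRUE returns each tree’s individual prediction.
library(randomForest)
set.seed(1)
rfToy <- randomForest(Species ~ ., data = iris, ntree = 50)
# Per-tree predictions for the first observation
treeVotes <- predict(rfToy, iris[1, ], predict.all = TRUE)$individual
# The ensemble class is the most frequent vote among the trees
names(which.max(table(treeVotes)))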
Sample data from the optical Sentinel-2a (S2) satellite platform will be used in the examples below (see here for more details.) This data was made available in the 2017 IEEE GRSS Data Fusion Contest and provided by the GRSS Data and Algorithm Standard Evaluation (DASE) website (you have to register to access the sample data-sets currently available.) More specifically, we will use one Sentinel-2 scene for Berlin containing 10 spectral bands, originally at 10m and 20m spatial resolution but re-sampled to 100m in DASE.
Along with S2 spectral data, DASE also provides training samples for calibrating classifiers. The legend encompasses a total of 17 land cover/use classes, presented in the table below (NOTE: only 12 out of the 17 classes actually appear in the Berlin area.)
legBerlin <- read.csv(url("https://raw.githubusercontent.com/joaofgoncalves/R_exercises_raster_tutorial/master/data/legend_berlin.csv"))
knitr::kable(legBerlin)
Type | Code | Land.cover.class.description |
---|---|---|
Artificial | 1 | Compact high-rise |
Artificial | 2 | Compact midrise |
Artificial | 3 | Compact low-rise |
Artificial | 4 | Open high-rise |
Artificial | 5 | Open midrise |
Artificial | 6 | Open low-rise |
Artificial | 7 | Lightweight low-rise |
Artificial | 8 | Large low-rise |
Artificial | 9 | Sparsely built |
Artificial | 10 | Heavy industry |
Vegetated | 11 | Dense trees |
Vegetated | 12 | Scattered trees |
Vegetated | 13 | Bush and scrub |
Vegetated | 14 | Low plants |
Vegetated | 15 | Bare rock or paved |
Vegetated | 16 | Bare soil or sand |
Vegetated | 17 | Water |
For more info on raster data processing, check out the complete tutorial series here.
Data Loading and Visualization
Now that we have defined some useful concepts, the workflow, and the data, we can start coding! 👍 👍 The first step is to download and uncompress the spectral data for Sentinel-2. These will later be used as training input features for the classification algorithm to identify the spectral signatures for each class.
## Create a folder named data-raw inside the working directory to place downloaded data
if(!dir.exists("./data-raw")) dir.create("./data-raw")
## If you run into download problems try changing: method = "wget"
download.file("https://raw.githubusercontent.com/joaofgoncalves/R_exercises_raster_tutorial/master/data/berlin.zip", "./data-raw/berlin.zip", method = "auto")
## Uncompress the zip file
unzip("./data-raw/berlin.zip", exdir = "./data-raw")
Load the required packages for the post:
library(raster)
library(randomForest)
Now that we have the data files available, let’s create a RasterStack object from them. We will also rename the layers to more convenient names.
fl <- list.files("./data-raw/berlin/S2", pattern = ".tif$", full.names = TRUE)
rst <- stack(fl)
names(rst) <- c(paste("b",2:8,sep=""),"b8a","b11","b12")
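Optionally, before plotting, it is worth inspecting the stack to confirm that everything loaded as expected (layer count, resolution, extent):
rst           # summary of the RasterStack
nlayers(rst)  # expected: 10 spectral bands
res(rst)      # pixel size (100m x 100m after the DASE re-sampling)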
Now, let’s use the plotRGB function to visually explore the spectral data from Sentinel-2. RGB composites made from different band combinations allow us to highlight different aspects of land cover and see different layers of the Earth’s surface. Note: band numbers in the Sentinel-2 satellite differ from their positions (integer indices) in the RasterStack object.
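Since S2 band numbers and stack indices do not line up one-to-one, a small lookup table helps avoid mistakes when building composites (a convenience sketch, not part of the original workflow):
# Map each stack index to its layer (S2 band) name
data.frame(stackIndex = 1:nlayers(rst), s2Band = names(rst))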
Let’s start by making a Natural Color RGB composite from S2 bands 4, 3, and 2.
plotRGB(rst, r=3, g=2, b=1, scale=1E5, stretch="lin")

Next, let’s see a healthy vegetation composite from S2 bands 8, 11, and 2.
plotRGB(rst, r=7, g=9, b=1, scale=1E5, stretch="lin")

Finally, a false color urban composite using S2 bands 12, 11, and 4.
plotRGB(rst, r=10, g=9, b=3, scale=1E5, stretch="lin")

Now, let’s load the training samples used in calibration. This data serves as a reference or example from which the classifier algorithm will “learn” the spectral signatures of each class.
rstTrain <- raster("./data-raw/berlin/train/berlin_lcz_GT.tif")
# Remove zeros from the train set (background NA)
rstTrain[rstTrain==0] <- NA
# Convert the data to factor/discrete representation
rstTrain <- ratify(rstTrain)
# Change the layer name
names(rstTrain) <- "trainClass"
# Visualize the data
plot(rstTrain, main="Train areas by class for Berlin")

Let’s see how many training pixels we have for each of the 12 classes (\(N_{total} = 24537\)):
tab <- table(values(rstTrain))
print(tab)
##
## 2 4 5 6 8 9 11 12 13 14 16 17
## 1534 577 2448 4010 1654 761 4960 1028 1050 4424 359 1732
Although perhaps not the best approach in some cases, we will convert our raster data-set into a data.frame object so we can use the RF classifier. Keep in mind that, depending on the size of your RasterStack and the available memory, this approach may not be feasible. One simple way to overcome this would be to convert the training raster into a SpatialPoints object and then run the extract function, so that only specific pixels from the stack are retrieved (see the sketch a bit further below). In any case, let’s proceed to get the pixel values into our calibration data frame:
rstDF <- na.omit(values(stack(rstTrain, rst)))
rstDF[,"trainClass"] <- as.factor(as.character(rstDF[,"trainClass"]))
As you probably noticed from the code above, NA’s were removed and the reference class column was converted to a categorical/factor variable. In practice, “removing NA’s” means that we restrict the data to the set of training pixels in rstTrain (reducing from 428,238 rows to 24,537 rows 👍.)
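As a side note, a minimal sketch of the memory-friendlier alternative mentioned above (extracting values only at the training pixels instead of loading the whole stack with values()) could look like the following; it should yield an equivalent calibration table (rstDF_alt is a hypothetical name).
# Convert training pixels to points (NA background cells are dropped by default)
trainPts <- rasterToPoints(rstTrain, spatial = TRUE)
# Extract the band values only at the training pixel locations
predVals <- extract(rst, trainPts)
rstDF_alt <- data.frame(trainClass = as.factor(trainPts$trainClass), predVals)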
Next up, setting some parameters. In this example, we will use holdout cross-validation (HOCV) to evaluate the RF classifier’s performance. This means that we will use an iterative split-set approach with a training and a testing set. For this purpose, we need to define the proportion of instances used for training (pTrain); the remainder will be set aside for evaluation. Here I took into consideration the fact that RF tends to take some time to calibrate with large numbers of observations (~ >10,000), hence the relatively ‘large’ training proportion. We also need to define the number of repetitions in HOCV (nEvalRounds.)
# Number of holdout evaluation rounds
nEvalRounds <- 20
# Proportion of the data used for training the classifier
pTrain <- 0.5
Now, let’s initialize some objects that will allow us to store some information on the classification performance and validation:
n <- nrow(rstDF)
nClasses <- length(unique(rstDF[,"trainClass"]))
# Initialize objects
confMats <- array(NA, dim = c(nClasses,nClasses,nEvalRounds))
evalMatrix <- matrix(NA, nrow = nEvalRounds, ncol = 3,
                     dimnames = list(paste("R_", 1:nEvalRounds, sep=""),
                                     c("Accuracy", "Kappa", "PSS")))
pb <- txtProgressBar(1, nEvalRounds, style = 3)
Now, with all set, let’s calibrate and evaluate our RF classifier (use comments to guide you through the code):
# Evaluation function
source(url("https://raw.githubusercontent.com/joaofgoncalves/Evaluation/master/eval.R"))
# Run the classifier
for(i in 1:nEvalRounds){

  # Create the random index for row selection at each round
  sampIdx <- sample(1:n, size = round(n*pTrain))

  # Calibrate the RF classifier
  rf <- randomForest(y = rstDF[sampIdx, "trainClass"],
                     x = rstDF[sampIdx, -1],
                     ntree = 200)

  # Predict the class for the test set
  testSetPred <- predict(rf, newdata = rstDF[-sampIdx,], type = "response")

  # Get the observed class vector
  testSetObs <- rstDF[-sampIdx,"trainClass"]

  # Evaluate
  evalData <- Evaluate(testSetObs, testSetPred)

  evalMatrix[i,] <- c(evalData$Metrics["Accuracy",1],
                      evalData$Metrics["Kappa",1],
                      evalData$Metrics["PSS",1])

  # Store the confusion matrices by eval round
  confMats[,,i] <- evalData$ConfusionMatrix

  # Classify the whole image with raster::predict function
  rstPredClassTMP <- predict(rst, model = rf,
                             factors = levels(rstDF[,"trainClass"]))

  if(i==1){
    # Initiate the predicted raster
    rstPredClass <- rstPredClassTMP

    # Get precision and recall for each class
    Precision <- evalData$Metrics["Precision",,drop=FALSE]
    Recall <- evalData$Metrics["Recall",,drop=FALSE]

  }else{
    # Stack the predicted rasters
    rstPredClass <- stack(rstPredClass, rstPredClassTMP)

    # Get precision and recall for each class
    Precision <- rbind(Precision, evalData$Metrics["Precision",,drop=FALSE])
    Recall <- rbind(Recall, evalData$Metrics["Recall",,drop=FALSE])
  }

  setTxtProgressBar(pb,i)
}
# save.image(file = "./data-raw/P8-session.RData")
Three classification evaluation measures will be used: (i) overall accuracy, (ii) Kappa, and (iii) Peirce-skill score (PSS; aka true-skill statistic.) Let’s print out the results by round:
knitr::kable(evalMatrix, digits = 3)
Accuracy | Kappa | PSS | |
---|---|---|---|
R_1 | 0.788 | 0.755 | 0.749 |
R_2 | 0.789 | 0.756 | 0.751 |
R_3 | 0.782 | 0.748 | 0.741 |
R_4 | 0.795 | 0.763 | 0.757 |
R_5 | 0.789 | 0.755 | 0.749 |
R_6 | 0.789 | 0.755 | 0.750 |
R_7 | 0.784 | 0.751 | 0.744 |
R_8 | 0.785 | 0.751 | 0.745 |
R_9 | 0.785 | 0.751 | 0.745 |
R_10 | 0.789 | 0.756 | 0.750 |
R_11 | 0.785 | 0.752 | 0.746 |
R_12 | 0.786 | 0.752 | 0.746 |
R_13 | 0.786 | 0.753 | 0.747 |
R_14 | 0.785 | 0.751 | 0.745 |
R_15 | 0.794 | 0.762 | 0.756 |
R_16 | 0.785 | 0.752 | 0.745 |
R_17 | 0.791 | 0.759 | 0.753 |
R_18 | 0.782 | 0.748 | 0.741 |
R_19 | 0.781 | 0.746 | 0.739 |
R_20 | 0.785 | 0.751 | 0.744 |
Next, let’s calculate the average and standard deviation of these measures across all rounds:
round(apply(evalMatrix,2,FUN = function(x,...) c(mean(x,...), sd(x,...))), 3)
## Accuracy Kappa PSS
## [1,] 0.787 0.753 0.747
## [2,] 0.004 0.004 0.005
Overall measures seem to indicate that results are acceptable with very low variation between calibration rounds. Let’s check out some average precision (aka positive predictive value, PPV), recall (aka true positive rate, TPR) and F1 measures by class:
avgPrecision <- apply(Precision,2,mean)
print(round(avgPrecision, 3))
## 11 12 13 14 16 17 2 4 5 6 8 9
## 0.963 0.705 0.717 0.906 0.789 0.998 0.657 0.345 0.509 0.698 0.664 0.484
avgRecall <- apply(Recall,2,mean)
print(round(avgRecall, 3))
## 11 12 13 14 16 17 2 4 5 6 8 9
## 0.979 0.685 0.648 0.960 0.434 1.000 0.611 0.093 0.481 0.869 0.647 0.268
avgF1 <- (2 * avgPrecision * avgRecall) / (avgPrecision+avgRecall)
print(round(avgF1, 3))
## 11 12 13 14 16 17 2 4 5 6 8 9
## 0.971 0.695 0.681 0.932 0.560 0.999 0.633 0.147 0.495 0.774 0.655 0.345
Well, things are not so great here… 🙍♂️ Some classes, such as 4, 9, 5 (different artificial/urban types) and 16 (bare soil/sand) have relatively low precision/recall/F1 values. This may be a consequence of the loss of information detail due to the 100m re-sampling or some class intermixing, and/or even train data generalization… 🤔 … this subject definitely requires more investigation. 😉
Now, let’s check the confusion matrix for the best round:
# Best round for Kappa
evalMatrix[which.max(evalMatrix[,"Kappa"]), , drop=FALSE]
## Accuracy Kappa PSS
## R_4 0.7946858 0.7626567 0.7569426
# Show confusion matrix for the best kappa
cm <- as.data.frame(confMats[,,which.max(evalMatrix[,"Kappa"])])
# Change row/col names
colnames(cm) <- rownames(cm) <- paste("c",levels(rstDF[,"trainClass"]),sep="_")
knitr::kable(cm)
c_11 | c_12 | c_13 | c_14 | c_16 | c_17 | c_2 | c_4 | c_5 | c_6 | c_8 | c_9 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
c_11 | 2442 | 30 | 2 | 2 | 0 | 0 | 0 | 1 | 1 | 8 | 0 | 3 |
c_12 | 48 | 330 | 57 | 23 | 0 | 0 | 0 | 1 | 5 | 24 | 1 | 7 |
c_13 | 7 | 43 | 358 | 97 | 1 | 1 | 0 | 0 | 2 | 6 | 4 | 3 |
c_14 | 0 | 20 | 48 | 2096 | 1 | 0 | 0 | 1 | 1 | 11 | 0 | 5 |
c_16 | 0 | 3 | 4 | 29 | 80 | 0 | 0 | 0 | 4 | 4 | 51 | 5 |
c_17 | 0 | 0 | 0 | 0 | 0 | 914 | 0 | 0 | 0 | 0 | 0 | 0 |
c_2 | 0 | 0 | 0 | 4 | 1 | 0 | 507 | 6 | 199 | 23 | 73 | 1 |
c_4 | 2 | 4 | 6 | 5 | 2 | 0 | 2 | 31 | 114 | 105 | 16 | 9 |
c_5 | 5 | 3 | 3 | 10 | 1 | 0 | 171 | 29 | 619 | 262 | 87 | 14 |
c_6 | 14 | 19 | 5 | 4 | 0 | 0 | 1 | 6 | 142 | 1753 | 13 | 38 |
c_8 | 2 | 7 | 6 | 15 | 7 | 3 | 78 | 7 | 136 | 35 | 516 | 13 |
c_9 | 3 | 9 | 10 | 17 | 1 | 0 | 0 | 2 | 10 | 190 | 5 | 104 |
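As an aside, the three round-level measures reported earlier come from the custom Evaluate helper, but they can also be computed directly from a confusion matrix like the one above, along these lines (a sketch assuming rows correspond to observed classes and columns to predicted classes; check the orientation returned by Evaluate before relying on it):
classMetrics <- function(cm){
  N  <- sum(cm)
  po <- sum(diag(cm)) / N                     # observed agreement (overall accuracy)
  pe <- sum(rowSums(cm) * colSums(cm)) / N^2  # chance agreement
  kappa <- (po - pe) / (1 - pe)
  # Peirce skill score (true-skill statistic), multi-class form
  pss <- (po - pe) / (1 - sum(rowSums(cm)^2) / N^2)
  c(Accuracy = po, Kappa = kappa, PSS = pss)
}
# ex. applied to the confusion matrix of the best round
classMetrics(as.matrix(cm))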
Since we obtained one classified map for each round, we can pool all that information into an ensemble through a majority vote (ex. calculating the modal class.) The raster package’s modal function makes it really easy to calculate this:
rstModalClass <- modal(rstPredClass)
rstModalClassFreq <- modal(rstPredClass, freq=TRUE)
medFreq <- zonal(rstModalClassFreq, rstTrain, fun=median)
Using the modal frequency of the 20 classification rounds, let’s check out which classes obtained the highest ‘uncertainty’:
colnames(medFreq) <- c("ClassCode","MedianModalFrequency")
medFreq[order(medFreq[,2],decreasing = TRUE),]
## ClassCode MedianModalFrequency
## [1,] 6 20
## [2,] 11 20
## [3,] 12 20
## [4,] 14 20
## [5,] 17 20
## [6,] 2 19
## [7,] 8 19
## [8,] 13 19
## [9,] 5 15
## [10,] 16 14
## [11,] 9 13
## [12,] 4 11
These results somewhat confirm those of the class-wise precision/recall. Classes most often shifting (lower frequency) are those with lower values for these performance measures.
Finally, let’s plot our results of the final modal classification map (and modal frequency):
par(mfrow=c(1,2), cex.main=0.8, cex.axis=0.8)
plot(rstModalClass, main = "RF modal land cover class")
plot(rstModalClassFreq, main = "Modal frequency")

The map on the right provides some insight into which areas are more problematic for the classification process. However, as you can see, the results are acceptable but could be improved in many ways.
See more stuff about geospatial and raster data processing here.
This concludes our exploration of the raster package and supervised classification for this post. Hope you find it useful! 😄 👍 👍