Atmospheric air pollution is one of the most important environmental concerns in many countries around the world, and it is strongly affected by meteorological conditions. In this set of exercises, we will use the openair package to work with and analyze air quality and meteorological data. This package provides tools to directly import data from air quality measurement networks across the UK, as well as tools to analyze the data and produce reports.
In the previous exercise sets, we used some functions in the openair package along with some geospatial packages to spatially analyze and visualize air quality data. In this exercise set, we will use some tools in openair to statistically compare measured air quality data with data obtained from modeling. This is important for evaluating how well a model predicts air pollutant concentrations in time and space.
Answers to the exercises are available here.
For other parts of this exercise set, follow the tag openair.
For simplicity, in this exercise set we will first produce synthetic observation and model data using the following commands:
set.seed(10)
obs <- 100 * runif(100)
mod1 <- data.frame(obs, mod = obs + 10, model = "model 1")
mod2 <- data.frame(obs, mod = obs + 20 * rnorm(100), model = "model 2")
mod3 <- data.frame(obs, mod = obs - 10 * rnorm(100), model = "model 3")
mod4 <- data.frame(obs, mod = obs / 2 + 10 * rnorm(100), model = "model 4")
mod5 <- data.frame(obs, mod = obs * 1.5 + 3 * rnorm(100), model = "model 5")
modData <- rbind(mod1, mod2, mod3, mod4, mod5)
Exercise 1
The modStats function can be used to statistically evaluate and compare model results against observations. It gives a wide range of statistics such as mean bias, mean error, FAC2 (the fraction of predictions within a factor of two of the observations), normalised mean bias and error, correlation coefficient, and index of agreement.
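To make these statistics concrete, here is a small base-R sketch that computes a few of them by hand. The function name eval_stats is hypothetical, and the formulas are the commonly used definitions, assumed here rather than taken from the openair source, so modStats may differ in detail (for example, in how it treats missing values).

```r
# Sketch: hand-computed versions of a few evaluation statistics.
# eval_stats is a hypothetical helper; the formulas are assumed definitions.
eval_stats <- function(obs, mod) {
  diff  <- mod - obs
  ratio <- mod / obs
  c(
    MB   = mean(diff),                      # mean bias
    MGE  = mean(abs(diff)),                 # mean gross error
    NMB  = sum(diff) / sum(obs),            # normalised mean bias
    NMGE = sum(abs(diff)) / sum(obs),       # normalised mean gross error
    RMSE = sqrt(mean(diff^2)),              # root mean squared error
    r    = cor(obs, mod),                   # Pearson correlation coefficient
    FAC2 = mean(ratio >= 0.5 & ratio <= 2)  # fraction within a factor of two
  )
}

set.seed(10)
obs <- 100 * runif(100)
mod <- obs + 10   # same construction as "model 1" above

stats1 <- eval_stats(obs, mod)
print(round(stats1, 3))
```

Because this model is just the observations shifted by a constant, the mean bias and mean gross error both come out as the offset, while the correlation is perfect. A high correlation alone therefore does not imply an unbiased model.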
Use modStats to statistically compare the model data produced above against the observation data.
Exercise 2
It is also possible to print the rank of each model based on the Coefficient of Efficiency (COE), which is a good indicator of model performance.
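As a rough illustration of what ranking by the Coefficient of Efficiency means, the sketch below computes COE by hand for three toy models and ranks them. The definition used, COE = 1 - sum(|mod - obs|) / sum(|obs - mean(obs)|), is assumed here. The helper coe and the toy models are hypothetical and deliberately different from the models in this exercise set.

```r
# Sketch: Coefficient of Efficiency (COE) by hand, then rank models by it.
# Assumed definition: COE = 1 - sum(|mod - obs|) / sum(|obs - mean(obs)|)
coe <- function(obs, mod) {
  1 - sum(abs(mod - obs)) / sum(abs(obs - mean(obs)))
}

set.seed(10)
obs <- 100 * runif(100)

# Three toy models (hypothetical, for illustration only)
toy <- rbind(
  data.frame(obs, mod = obs,       model = "perfect"),
  data.frame(obs, mod = obs + 10,  model = "offset"),
  data.frame(obs, mod = mean(obs), model = "mean only")
)

# COE per model, then rank so that the highest COE gets rank 1
coe_by_model <- sapply(split(toy, toy$model), function(d) coe(d$obs, d$mod))
ranking <- rank(-coe_by_model)
print(data.frame(COE = round(coe_by_model, 3), rank = ranking))
```

Under this definition, a perfect model scores 1 and a model that always predicts the observed mean scores 0, so values at or below zero suggest the model is no better than the mean of the observations.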
Use modStats to identify the best model in terms of performance.
Exercise 3
Another useful tool for statistically comparing modeling results with observations, or with each other, is the Taylor diagram. This diagram shows three statistics, as well as the relationship between them, at the same time. These statistics are the correlation coefficient (R), the standard deviation (sigma), and the centred root-mean-square error. The TaylorDiagram function can be used to plot such diagrams.
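The three statistics can share one diagram because they are linked by a law-of-cosines relation: the squared centred RMS error equals sigma_obs^2 + sigma_mod^2 - 2 * sigma_obs * sigma_mod * R. The base-R sketch below checks this numerically. The helper pop_sd is hypothetical, and the divide-by-n standard deviation is an assumption chosen so that the identity holds exactly.

```r
# Sketch: the three Taylor-diagram statistics and the identity linking them.
# Population (divide-by-n) standard deviations are an assumption of this sketch.
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

set.seed(10)
obs <- 100 * runif(100)
mod <- obs + 20 * rnorm(100)   # same construction as "model 2" above

R       <- cor(obs, mod)   # correlation coefficient
sigma_o <- pop_sd(obs)     # standard deviation of the observations
sigma_m <- pop_sd(mod)     # standard deviation of the model values

# Centred RMS error: RMS difference after removing each series' mean
crmse <- sqrt(mean(((mod - mean(mod)) - (obs - mean(obs)))^2))

# The same quantity from the law-of-cosines relation
crmse_from_identity <- sqrt(sigma_o^2 + sigma_m^2 - 2 * sigma_o * sigma_m * R)

print(c(direct = crmse, identity = crmse_from_identity))
```

Because the mean of each series is removed first, the centred RMS error is blind to a constant offset, so a Taylor diagram says nothing about overall bias.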
Use TaylorDiagram to plot a Taylor diagram and compare the model results against the observations for the data.frame used in the previous exercises.
Exercise 4
conditionalQuantile is another useful function for evaluating modeling results. The conditional quantile plot splits the predicted values into evenly spaced bins. For each bin of predicted values, the corresponding observations are collected, and their median, 25th/75th percentiles, and 10th/90th percentiles are calculated. The results are plotted to show how these values vary across all bins. For a time series of observations and predictions that agree precisely, the median value of the predictions will equal that of the observations in each bin.
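The binning step described above can be sketched in base R as follows. This only illustrates the idea on synthetic data and is not how conditionalQuantile is implemented. The number of bins and all variable names are arbitrary choices made for the example.

```r
# Sketch: bin the predictions, then summarise the observations in each bin.
set.seed(1)
obs  <- 100 * runif(500)
pred <- obs + 10 * rnorm(500)   # synthetic predictions with random error

n_bins <- 10   # number of evenly spaced prediction bins (arbitrary choice)
breaks <- seq(min(pred), max(pred), length.out = n_bins + 1)
bin    <- cut(pred, breaks = breaks, include.lowest = TRUE)

# Quantiles of the observations that fall into each prediction bin
probs  <- c(0.10, 0.25, 0.50, 0.75, 0.90)
cond_q <- t(sapply(split(obs, bin), quantile, probs = probs))
print(round(cond_q, 1))
```

Each row of cond_q corresponds to one prediction bin. For a well-calibrated model, the 50% column should sit close to the centre of its bin.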
For this exercise, we first need to load a sample of real model and observation data:
load(url("http://www.erg.kcl.ac.uk/downloads/Policy_Reports/AQdata/condDat.RData"))
Now, use conditionalQuantile to evaluate the modeled O3 concentrations against the observed ones.
Exercise 5
It is more interesting to see how the model performance varies by season. Accordingly, use conditionalQuantile to evaluate the modeled O3 concentrations against the observed ones in each season.
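To show what splitting an evaluation by season involves, here is a minimal base-R sketch that assigns a season to each date and summarises the model error per season. The helper season_of, the synthetic data, and the northern-hemisphere meteorological seasons (December to February as winter) are all assumptions made for the example, and openair has its own handling for this.

```r
# Sketch: assign a meteorological season to each date (northern hemisphere
# convention assumed), then summarise model error per season.
season_of <- function(date) {
  m <- as.integer(format(date, "%m"))   # month number, 1 to 12
  labels <- c("winter", "winter", "spring", "spring", "spring", "summer",
              "summer", "summer", "autumn", "autumn", "autumn", "winter")
  factor(labels[m], levels = c("spring", "summer", "autumn", "winter"))
}

# Synthetic daily series with a seasonal cycle (illustration only)
dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
set.seed(1)
obs <- 50 + 20 * sin(2 * pi * seq_along(dates) / 366) + 5 * rnorm(length(dates))
mod <- obs + 5 * rnorm(length(dates))

# Mean bias of the model within each season
bias_by_season <- tapply(mod - obs, season_of(dates), mean)
print(round(bias_by_season, 2))
```

The same grouping factor could be used to repeat any of the earlier statistics season by season.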