Atmospheric air pollution is one of the most important environmental concerns in many countries around the world, and it is strongly affected by meteorological conditions. In this set of exercises, we will use the openair package to work with and analyze air quality and meteorological data. This package provides tools to directly import data from air quality measurement networks across the UK, as well as tools to analyze the data and produce reports.
In the previous exercise sets, we used some functions in the openair package along with some geospatial packages to spatially analyze and visualize air quality data. In this exercise set, we will use some tools in openair to statistically compare measured air quality data with those obtained from modeling. This is important for evaluating how well a model predicts air pollutant concentrations in time and space.
Answers to the exercises are available here.
For other parts of this exercise set, follow the tag openair.
For simplicity, in this exercise set we will first generate synthetic observed and modeled data using the following commands:
obs <- 100 * runif(100)
mod1 <- data.frame(obs, mod = obs + 10, model = "model 1")
mod2 <- data.frame(obs, mod = obs + 20 * rnorm(100), model = "model 2")
mod3 <- data.frame(obs, mod = obs - 10 * rnorm(100), model = "model 3")
mod4 <- data.frame(obs, mod = obs / 2 + 10 * rnorm(100), model = "model 4")
mod5 <- data.frame(obs, mod = obs * 1.5 + 3 * rnorm(100), model = "model 5")
modData <- rbind(mod1, mod2, mod3, mod4, mod5)
The modStats function can be used to statistically evaluate and compare model results against observations. It reports a wide range of statistics, such as mean bias, mean gross error, FAC2 (the fraction of predictions within a factor of two of the observations), normalised mean bias and error, correlation coefficient, and index of agreement.
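To make these statistics concrete, the sketch below computes several of them by hand in base R for a single synthetic model with a constant +10 offset. The formulas follow the usual definitions of these measures; the fixed seed and the variable names are assumptions added for illustration, and this is not the package's own implementation.

```r
# Synthetic data in the style of "model 1": a constant +10 offset.
set.seed(1)                 # assumption: fixed seed so the run is repeatable
obs <- 100 * runif(100)
mod <- obs + 10

err <- mod - obs            # model error for each observation

stats <- c(
  MB   = mean(err),                                    # mean bias
  MGE  = mean(abs(err)),                               # mean gross error
  NMB  = sum(err) / sum(obs),                          # normalised mean bias
  NMGE = sum(abs(err)) / sum(obs),                     # normalised mean gross error
  RMSE = sqrt(mean(err^2)),                            # root mean squared error
  FAC2 = mean(mod / obs >= 0.5 & mod / obs <= 2),      # fraction within a factor of 2
  r    = cor(mod, obs),                                # Pearson correlation
  COE  = 1 - sum(abs(err)) / sum(abs(obs - mean(obs))) # Coefficient of Efficiency
)

print(round(stats, 3))
```

Because the offset is constant, the mean bias and mean gross error both equal 10 and the correlation is 1, which shows why a single statistic such as correlation is not enough to judge a model.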
Use modStats to statistically compare the produced model data against the observation data.
It is also possible to print the rank of each model based on the Coefficient of Efficiency (COE), which is a good indicator of model performance.
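As a rough illustration of what such a ranking involves, the following base R sketch computes the COE for each synthetic model and ranks the models from best (highest COE) to worst. The helper `coe()`, the fixed seed and the `ranking` data frame are hypothetical names introduced here; the snippet only mimics the idea and does not call openair.

```r
# Rebuild the synthetic data set used in this exercise set.
set.seed(1)                 # assumption: fixed seed so the run is repeatable
obs <- 100 * runif(100)
modData <- rbind(
  data.frame(obs, mod = obs + 10,                    model = "model 1"),
  data.frame(obs, mod = obs + 20 * rnorm(100),       model = "model 2"),
  data.frame(obs, mod = obs - 10 * rnorm(100),       model = "model 3"),
  data.frame(obs, mod = obs / 2 + 10 * rnorm(100),   model = "model 4"),
  data.frame(obs, mod = obs * 1.5 + 3 * rnorm(100),  model = "model 5")
)

# Hypothetical helper: Coefficient of Efficiency for one model's rows.
coe <- function(d) {
  1 - sum(abs(d$mod - d$obs)) / sum(abs(d$obs - mean(d$obs)))
}

# One COE value per model, then rank so that the highest COE gets rank 1.
coe_by_model <- sapply(split(modData, modData$model), coe)
ranking <- data.frame(
  model = names(coe_by_model),
  COE   = unname(coe_by_model),
  rank  = unname(rank(-coe_by_model, ties.method = "min"))
)
ranking <- ranking[order(ranking$rank), ]

print(ranking, row.names = FALSE)
```

A COE of 1 indicates a perfect model, while values at or below 0 mean the model predicts no better than the mean of the observations.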
Use modStats to identify the best model in terms of performance.
Another useful tool for statistically comparing modeling results with observations, or with each other, is the Taylor diagram. This diagram shows three statistics and the relationship between them at the same time: the correlation coefficient (R), the standard deviation (sigma) and the centred root-mean-square error. The TaylorDiagram function can be used to plot such diagrams.
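The three statistics can share one diagram because they are linked by a law-of-cosines relation: the squared centred RMS error equals sigma_obs^2 + sigma_mod^2 - 2 * sigma_obs * sigma_mod * R. The base R sketch below checks this numerically on synthetic data; the helper `pop_sd()` and the variable names are assumptions made for illustration.

```r
set.seed(1)                 # assumption: fixed seed so the run is repeatable
obs <- 100 * runif(100)
mod <- obs + 20 * rnorm(100)

# Population standard deviation (divides by n), used consistently below.
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

sd_obs <- pop_sd(obs)
sd_mod <- pop_sd(mod)
R      <- cor(obs, mod)

# Centred RMS error: RMS difference after removing each series' mean.
crmse <- sqrt(mean(((mod - mean(mod)) - (obs - mean(obs)))^2))

# Law-of-cosines relation that underlies the Taylor diagram geometry.
crmse_from_identity <- sqrt(sd_obs^2 + sd_mod^2 - 2 * sd_obs * sd_mod * R)

print(c(direct = crmse, identity = crmse_from_identity))
```

The two printed values agree up to floating-point error, so a point placed by its standard deviation and correlation automatically sits at the right distance from the observation point.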
Use TaylorDiagram to plot Taylor diagrams and compare the modeled values against the observations for the data.frame used in the previous exercises.
conditionalQuantile is another useful function for evaluating modeling results. The conditional quantile plot splits the predicted values into evenly spaced bins. For each bin of predicted values, the corresponding observations are identified, and their median, 25th/75th and 10th/90th percentiles (quantiles) are calculated. The results are plotted to show how these values vary across all bins. For a time series of observations and predictions that agree precisely, the median of the observations in each bin equals the predicted value for that bin.
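The binning step described above can be sketched in base R as follows. This is only an approximation of the idea, not the function's actual implementation: the number of bins (10), the synthetic data and all variable names are assumptions chosen for illustration.

```r
set.seed(1)                 # assumption: fixed seed so the run is repeatable
obs <- 100 * runif(1000)
mod <- obs + 10 * rnorm(1000)

# Split the predicted values into 10 evenly spaced bins.
breaks <- seq(min(mod), max(mod), length.out = 11)
bin    <- cut(mod, breaks = breaks, include.lowest = TRUE)

# For each prediction bin, summarise the matching observations.
probs    <- c(0.10, 0.25, 0.50, 0.75, 0.90)
q_by_bin <- t(sapply(split(obs, bin), quantile, probs = probs, names = FALSE))
colnames(q_by_bin) <- paste0("p", probs * 100)

# Bin mid-points, useful as the x-axis when plotting the quantiles.
bin_mid <- head(breaks, -1) + diff(breaks) / 2

print(round(cbind(bin_mid, q_by_bin), 1))
```

For a well-calibrated model, the p50 column tracks the bin mid-points closely, and the spread between p10 and p90 indicates how uncertain the predictions are in each range.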
For this exercise we first need to load sample real modeling and observation data:
Use conditionalQuantile to evaluate the modeled O3 concentration against the observed values.
It is more interesting to see how the model performance varies in each season. Accordingly, use conditionalQuantile to evaluate the modeled O3 concentration against the observations in each season.
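To show what a seasonal split involves, the base R sketch below assigns each date to a season and summarises the model error per season. It assumes northern-hemisphere meteorological seasons (winter = December to February, and so on); the helper `season_of()` and the synthetic daily data are hypothetical and stand in for the real O3 data.

```r
# Hypothetical helper: map dates to northern-hemisphere meteorological seasons.
season_of <- function(date) {
  month  <- as.integer(format(date, "%m"))
  lookup <- c("winter", "winter", "spring", "spring", "spring", "summer",
              "summer", "summer", "autumn", "autumn", "autumn", "winter")
  factor(lookup[month], levels = c("spring", "summer", "autumn", "winter"))
}

# Synthetic daily series for one year (assumption: stands in for real data).
set.seed(1)
dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
obs   <- 100 * runif(length(dates))
mod   <- obs + 10 * rnorm(length(dates))

season <- season_of(dates)

# Mean bias and mean gross error of the model within each season.
bias_by_season <- tapply(mod - obs, season, mean)
mge_by_season  <- tapply(abs(mod - obs), season, mean)

print(round(rbind(MB = bias_by_season, MGE = mge_by_season), 2))
```

The same grouping variable could then be used to repeat any of the evaluation steps separately for each season.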