
SEASONALITY AND TREND DECOMPOSITION

If you don’t have the data, please first get it from the first tutorial here. You also need to install and load the `ggplot2` package.

if(!require(ggplot2)){ install.packages("ggplot2", dep = TRUE) }

Answers to these exercises are available here.

Time series decomposition is a mathematical procedure that transforms a time series into multiple component series; it is most often split into 3 components:

Seasonal: Patterns that repeat with a fixed period of time. For example, we can see that every year the river has a high-level regime in the middle of the year and low levels at the beginning and end of the year. That is a yearly seasonality.

Trend: The underlying long-term direction of the series; for example, whether the river level has constantly increased or decreased through the years.

Random: Also called “noise”, these are the residuals of the original time series after the seasonal and trend components are removed.

**Exercise 1**

A centered moving average can be used to smooth the time series and detect the underlying trend. To perform the decomposition, it is vital to use a moving window of the exact size of the seasonality. In this case, we know from the last tutorial that the time series has a yearly seasonality.

Please create a time series object of the `LEVEL` values with the function `ts`. Hint: `frequency = 365`.
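A minimal sketch of this step (using a simulated stand-in for the `LEVEL` column, since the river data is loaded in the first tutorial):

```r
# Simulated stand-in for river_data$LEVEL: yearly cycle plus noise
set.seed(1)
level <- 10 + 3 * sin(2 * pi * (1:1460) / 365) + rnorm(1460, sd = 0.5)

# frequency = 365 tells ts() that one seasonal cycle spans 365 observations
level_ts <- ts(level, frequency = 365)
frequency(level_ts)
```

With the real data you would pass `river_data$LEVEL` instead of the simulated vector.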

**Exercise 2**

The decomposition can be calculated in two different ways: an additive decomposition, used when the values increase as a trend while the amplitude of the seasonality stays relatively constant, and a multiplicative decomposition, used when the amplitude of the seasonality increases along with the trend. Please run an additive decomposition of the `LEVEL` values with the function `decompose` and plot it.
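A sketch of the additive decomposition (again on a simulated series; `decompose` needs a `ts` object with the seasonal frequency set):

```r
# Simulated stand-in for the LEVEL time series
set.seed(1)
level_ts <- ts(10 + 3 * sin(2 * pi * (1:1460) / 365) + rnorm(1460, sd = 0.5),
               frequency = 365)

# Additive model: observed = seasonal + trend + random
dec_add <- decompose(level_ts, type = "additive")
plot(dec_add)  # four panels: observed, trend, seasonal, random
```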

**Exercise 3**

Please run a multiplicative decomposition of the `LEVEL` values with the function `decompose` and plot it.

**Exercise 4**

Use the additive formula Time series = Seasonal + Trend + Random to reconstruct the series and plot it.
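The additive reconstruction can be sketched like this (simulated series again; with the real data, use your decomposition of `LEVEL`):

```r
# Simulated stand-in for the LEVEL time series and its decomposition
set.seed(1)
level_ts <- ts(10 + 3 * sin(2 * pi * (1:1460) / 365) + rnorm(1460, sd = 0.5),
               frequency = 365)
dec_add <- decompose(level_ts, type = "additive")

# Seasonal + Trend + Random gives back the observed series
rebuilt <- dec_add$seasonal + dec_add$trend + dec_add$random
plot(rebuilt)
```

Note that `trend` and `random` are NA at both ends of the series, where the centered moving average is undefined, so the reconstruction is NA there too.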

**Exercise 5**

Use the multiplicative formula Time series = Seasonal * Trend * Random to reconstruct the series and plot it.

**Exercise 6**

As you can see, the trend is the same in the two decompositions, but the seasonal and random components are different. Please calculate and plot the difference between the multiplicative and additive models.

**Exercise 7**

In order to compare the components with each other, we can estimate their power spectra and convert each spectrum into a probability distribution. The trick is very easy; you just have to normalize it so that it sums to one. Please write a function for it.
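One way to write such a function, as a sketch using `spec.pgram` from base R's stats package (the name `spectral_pdf` is just illustrative):

```r
# Estimate the periodogram of a series and normalize it so that the
# spectral values sum to one, i.e. turn it into a probability distribution
spectral_pdf <- function(x) {
  sp <- spec.pgram(x, plot = FALSE)        # raw periodogram
  list(freq = sp$freq, prob = sp$spec / sum(sp$spec))
}

out <- spectral_pdf(sin(2 * pi * (1:365) / 12) + rnorm(365, sd = 0.1))
sum(out$prob)  # 1
```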

**Exercise 8**

Now, please use your function to calculate the periodogram for the seasonal and random components. Hint: remember to first deal with the NA values.

**Exercise 9**

Create an overlapped plot of the seasonal and random spectra. Hint: you will see that the random component is the only one that differs.


**R FOR HYDROLOGISTS**

CORRELATION AND INFORMATION THEORY MEASUREMENTS – PART 3

Before we begin, if you don’t have the data, first get it from the first tutorial here. You will also need to install and load the `ggplot2` and `reshape2` packages.

if(!require(ggplot2)){ install.packages("ggplot2", dep = TRUE) }

if(!require(reshape2)){ install.packages("reshape2", dep = TRUE) }

Answers to these exercises are available here.

The mutual information quantifies the “amount of information” shared between two variables, in bits. To transform it into a metric, several variants of the MI have been proposed; one of those is a normalization that treats MI as an analog of covariance and computes it like a Pearson correlation coefficient: `NMI = MI/(Hx*Hy)^(1/2)`.

**Exercise 1**

Please write a function to calculate the normalized mutual information, with two input parameters `x, y` as vectors and NMI as the return value. Hint: reuse the code from the last tutorial.
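A possible shape for that function, as a sketch that estimates the joint distribution with a simple 10-bin 2D histogram built from `cut` and `table` (the bin count and function name are assumptions, not the tutorial's exact code):

```r
nmi <- function(x, y, bins = 10) {
  # Joint probabilities from a 2D histogram of the two vectors
  pxy <- table(cut(x, bins), cut(y, bins)) / length(x)
  px <- rowSums(pxy); py <- colSums(pxy)      # marginal probabilities
  Hx  <- -sum(px[px > 0] * log2(px[px > 0]))
  Hy  <- -sum(py[py > 0] * log2(py[py > 0]))
  Hxy <- -sum(pxy[pxy > 0] * log2(pxy[pxy > 0]))
  MI  <- Hx + Hy - Hxy                        # mutual information in bits
  MI / sqrt(Hx * Hy)                          # normalized coefficient
}
```

For a variable compared with itself, `nmi(x, x)` returns 1, as a correlation-like coefficient should.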

**Exercise 2**

Similar to before, where we estimated the linear auto-correlation function, it is also possible to estimate a nonlinear auto-correlation function, using the NMI as the correlation coefficient of the lags of the time series. Please load the function `createLags(x, numberOfLags, VarName)` and create the embedded space for the first 400 lags of the `LEVEL` and the `RAIN`.

**Exercise 3**

To calculate the nonlinear auto-correlation function (NACF), you can estimate the NMI of the first column of `lags_level` compared with all the other lags. Do the same for `lags_rain`.

**Exercise 4**

To calculate the nonlinear cross-correlation function (NCCF), you can estimate the NMI of the first column of `lags_level` compared with all the lags of `lags_rain`. Do the same for the first column of `lags_rain` compared with all the lags of `lags_level`.

**Exercise 5**

Another very useful measurement tool is the Kullback–Leibler divergence, or relative entropy. It measures how one probability distribution `q` diverges from a second, expected probability distribution `p`. It can be estimated as the cross entropy of `q` with respect to `p` minus the entropy of `p`.

To estimate the probability distributions `p` and `q`, this time we will change our approach and use a `geom_histogram`. Please create a histogram of 10 bins from the level and a histogram from the first lag, then assign them to `p` and `q`.

Hints: 1) Remember to always use the interval from `p` for both histograms. 2) After grabbing the first layer of data from the plot with `layer_data`, you can use the column `$count`.

**Exercise 6**

Now, please calculate the entropy of `p`.

**Exercise 7**

Now calculate the cross entropy `Hp_q` with the formula `-sum(p*log2(q))`. Hint: remember to avoid values of `q` equal to zero.

**Exercise 8**

Finally, please calculate and print the Kullback–Leibler divergence.
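Exercises 5 through 8 can be sketched end to end with base R's `hist` standing in for `geom_histogram` (the bin counts are the same idea, without the plotting layer), using a simulated stand-in for the level series:

```r
# Simulated stand-in for the level series and its first lag
set.seed(1)
level <- 10 + 3 * sin(2 * pi * (1:1000) / 365) + rnorm(1000, sd = 0.5)
lag1  <- c(NA, level)[1:length(level)]

# 10 bins taken from the interval of p, reused for q (hint 1)
breaks <- seq(min(level), max(level), length.out = 11)
p <- hist(level, breaks = breaks, plot = FALSE)$counts
q <- hist(lag1[!is.na(lag1)], breaks = breaks, plot = FALSE)$counts
p <- p / sum(p); q <- q / sum(q)                 # normalize to probabilities

Hp   <- -sum(p[p > 0] * log2(p[p > 0]))          # entropy of p
keep <- p > 0 & q > 0                            # avoid log2(0)
Hp_q <- -sum(p[keep] * log2(q[keep]))            # cross entropy
KL   <- Hp_q - Hp                                # Kullback-Leibler divergence
KL   # near 0: the series and its first lag are distributed almost alike
```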


**R FOR HYDROLOGISTS**

CORRELATION AND INFORMATION THEORY MEASUREMENTS (Part 2)

Proposed back in the 40’s by Shannon, information theory provides a framework for the analysis of randomness in time series and of the information gain obtained when comparing statistical models of inference. Information theory is based on probability theory and statistics, and it often concerns itself with measures of information of the distributions associated with random variables. Important quantities of information are entropy, a measure of the information in a single random variable; mutual information, a measure of the information in common between two random variables; and relative entropy, which measures how one probability distribution diverges from a second, expected probability distribution.

In this tutorial, we will estimate these measurements in order to characterize the river dynamics. If you don’t have the data, please first see the first part of the tutorial here, then install and load the `ggplot2` and `reshape2` packages.

if(!require(ggplot2)){ install.packages("ggplot2", dep = TRUE) }

if(!require(reshape2)){ install.packages("reshape2", dep = TRUE) }

Answers to the exercises are available here.

All information measurements are derived from the joint and marginal distributions of two variables. To estimate these empirical distributions we will use histograms; this time we will use `geom_bin2d`. Let’s do it step by step.

**Exercise 1**

First, please create a `geom_point` plot of the `LEVEL` against the `RAIN`.

**Exercise 2**

Now, please overlap a 2D histogram with the function `geom_bin2d()`.

**Exercise 3**

We have to get the joint probability matrix, so please set the number of bins to 10 (`bins = 10`), plot the joint probability distribution of the `LEVEL` and the `RAIN`, and then assign it to an object `p`.

**Exercise 4**

Extract the data of the first layer from the object `p` with the function `layer_data` and assign it to `pxy_m`.

**Exercise 5**

As you can see, `ggplot` returns a column-based data frame with the `x`, the `y`, and the value of the density index as columns. Please convert it to a rectangular matrix with the function `acast` and assign it to `pxy`.

**Exercise 6**

Please guarantee the natural restriction of a probability distribution: `sum(pxy) == 1`.

**Exercise 7**

Estimate the marginal probabilities `px` and `py`.

**Exercise 8**

Great, now we have everything we need. Please estimate the entropy in bits (log2) for each variable: `Hx` and `Hy`.

**Exercise 9**

Estimate the joint entropy in bits (log2) with the formula `Hxy = -sum(pxy*log2(pxy))`. Remember that, in order to avoid numerical errors, you have to keep only the positive probabilities (`pxy > 0`) before applying the formula.

**Exercise 10**

Last step: please calculate the mutual information. Hint: `MI = Hx + Hy - Hxy`.
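The whole pipeline of this part can be sketched in base R, with `table` over `cut` bins standing in for the `geom_bin2d` counts; `LEVEL` and `RAIN` are simulated stand-ins here:

```r
# Simulated, correlated stand-ins for the LEVEL and RAIN columns
set.seed(1)
level <- sin(2 * pi * (1:1000) / 365) + rnorm(1000, sd = 0.3)
rain  <- level + rnorm(1000, sd = 0.5)

pxy <- table(cut(level, 10), cut(rain, 10)) / 1000  # joint distribution, 10x10
stopifnot(abs(sum(pxy) - 1) < 1e-9)                 # restriction of Exercise 6
px <- rowSums(pxy); py <- colSums(pxy)              # marginals (Exercise 7)

Hx  <- -sum(px[px > 0] * log2(px[px > 0]))          # entropies (Exercise 8)
Hy  <- -sum(py[py > 0] * log2(py[py > 0]))
Hxy <- -sum(pxy[pxy > 0] * log2(pxy[pxy > 0]))      # joint entropy (Exercise 9)
MI  <- Hx + Hy - Hxy                                # mutual information in bits
```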


**R FOR HYDROLOGISTS**

CORRELATION AND INFORMATION THEORY MEASUREMENTS (Part 1)

In this tutorial, we will show you how to apply tools, such as the correlation, auto-correlation, entropy, and mutual information as an introductory exercise in the analysis of time series dynamics. The first measurement that we will calculate will be the linear correlation. This statistic quantifies the linear correlation between two variables and represents how much these two data sets resemble a straight line. If there is a positive relationship between the variables, the value of ρ approaches 1. If there is a negative relation, ρ approaches -1. Finally, if there is no relation, ρ approaches 0.

If you don’t have the data, please first see the first part of the tutorial here. Install and load the `ggplot2`, `GGally`, and `forecast` packages.

if(!require(ggplot2)){ install.packages("ggplot2", dep = TRUE) }

if(!require(GGally)){ install.packages("GGally", dep = TRUE) }

if(!require(forecast)){ install.packages("forecast", dep = TRUE) }

Answers to these exercises are available here.

**Exercise 1**

Please calculate the correlation coefficient between the `LEVEL` and the `RAIN` with the function `cor`.
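A minimal sketch of this call (the `river_data` columns come from the first tutorial; a small simulated frame is used here so the snippet runs on its own):

```r
# Simulated stand-in for river_data with weakly related columns
set.seed(1)
river_data <- data.frame(LEVEL = rnorm(200))
river_data$RAIN <- 0.3 * river_data$LEVEL + rnorm(200)

# Pearson correlation coefficient; use = "complete.obs" drops NA rows
cor(river_data$LEVEL, river_data$RAIN, use = "complete.obs")
```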

**Exercise 2**

One nice way to summarize information is through `ggpairs`. This plot contains a scatter plot of `x[,i]` plotted against `x[,j]` for a matrix of data; the diagonal shows a kernel density estimate, and the upper triangle shows the correlation values. Please use `ggpairs` for columns 2 and 3 of `river_data`.

**Exercise 3**

The scatterplot can also be customized. Please change the size of the text on the upper triangle to 8. Hint: pass a `list(continuous = wrap("cor", par1 = value1))` to the parameter `upper`.

**Exercise 4**

Now, please add a tendency line to the scatterplots and change the color of the dots to blue. Hint: `.smooth`.

**Exercise 5**

Good, the plot looks nice; but, as you can see, the correlation between the precipitation and the level of the river is very low. That is reasonable, given the delay between the moment precipitation occurs and the moment the level of the river increases due to the runoff over the basin. That is why we also have to estimate the correlation between the variables and their lags. A good example of this is the auto-correlation function, which indicates how much linear correlation exists between the values of the series at a time “t” and the values at t-i.

Please use the `ggAcf` function and plot the auto-correlation function of the `LEVEL` and the `RAIN`.

**Exercise 6**

Another common operation on time series is to take a difference of the series, `x[t] - x[t-k]` (in this case k = 1), and estimate the auto-correlation function of that. Please use the `diff` function and plot the auto-correlation function for the difference of the `LEVEL` and the difference of the `RAIN`.
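The differencing step can be sketched as follows (base `acf` is shown as the plain-graphics equivalent of `ggAcf`; the series is a simulated stand-in for `LEVEL`):

```r
# Simulated stand-in: a slow random trend plus a yearly cycle
set.seed(1)
level <- cumsum(rnorm(730)) + 3 * sin(2 * pi * (1:730) / 365)

# diff() computes x[t] - x[t-1]; differencing removes most of the slow trend
d <- diff(level)
acf(d)   # auto-correlation function of the differenced series
```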

**Exercise 7**

Please use the `ggCcf` function and plot the cross-correlation function of the `LEVEL` and the `RAIN`.

**Exercise 8**

Another interesting way to explore system properties is creating our own lags decompositions of the time series. To do it, please use this function:

```
# Generate a lagged variable
createLags = function(x, numberOfLags, VarName) {
    if (!is.vector(x))
        stop('x must be a vector')
    if (is.null(VarName))
        VarName = "x"

    lags = as.data.frame(x)
    names = paste0(VarName, '(t)')
    for (i in 1:numberOfLags) {
        # Generate the lag
        lag = c(rep(NA, i), x)[1:length(x)]
        # Stack the lag
        lags = cbind(lags, lag)
        # Stack the name of the lag
        names = cbind(names, paste0(VarName, '(t-', toString(i), ")"))
    }
    # Assign names to the columns
    colnames(lags) = names
    # Trim the first rows, which contain NA values
    return(lags[(numberOfLags + 1):length(x), ])
}
```

Please generate a lag decomposition of the `RAIN` and the `LEVEL` for the first 5 lags.
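Assuming the `createLags` function above has been run, the call looks like this (shown on a simulated vector; with the real data you would pass `river_data$RAIN` and `river_data$LEVEL`):

```r
# Simulated stand-in for the level series
level <- sin(2 * pi * (1:400) / 365)
lags_level <- createLags(level, 5, "level")
dim(lags_level)   # 395 rows, 6 columns: level(t), level(t-1), ..., level(t-5)
head(lags_level)
```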

**Exercise 9**

Please create one data frame with all the lags, `lags_all = cbind(lags_level, lags_rain)`, and generate a pairs plot with `ggpairs`.