R FOR HYDROLOGISTS
CORRELATION AND INFORMATION THEORY MEASUREMENTS – PART 3
Before we begin, if you don’t have the data, first get it from the first tutorial here. You will also need to install and load the ggplot2 package, since we will use its geom_histogram later on.
Answers to these exercises are available here.
Mutual information (MI) quantifies the “amount of information” shared between two variables, in bits. To turn it into a metric, several variants of the MI have been proposed; one of them is a normalization that treats MI as an analog of the covariance and normalizes it the way a Pearson correlation coefficient normalizes covariance: NMI(x, y) = MI(x, y) / sqrt(H(x) * H(y)), where H denotes the entropy.
Please write a function to calculate the normalized mutual information, with two input parameters x and y as vectors and NMI as the return value. Hint: reuse the code from the last tutorial.
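A minimal sketch of such a function, assuming a simple binned (histogram) estimate of the entropies; the name calculateNMI and the bins parameter are illustrative choices, not the official answer:

entropy_from_probs <- function(p) {
    p <- p[p > 0]                             # drop empty bins so log2(0) never occurs
    -sum(p * log2(p))
}

calculateNMI <- function(x, y, bins = 10) {
    # joint distribution from a 2-D histogram of the two vectors
    pxy <- table(cut(x, bins), cut(y, bins))
    pxy <- pxy / sum(pxy)
    Hx  <- entropy_from_probs(rowSums(pxy))   # marginal entropy H(x)
    Hy  <- entropy_from_probs(colSums(pxy))   # marginal entropy H(y)
    Hxy <- entropy_from_probs(pxy)            # joint entropy H(x, y)
    MI  <- Hx + Hy - Hxy                      # MI(x, y) = H(x) + H(y) - H(x, y)
    MI / sqrt(Hx * Hy)                        # normalization, analogous to Pearson's r
}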
Similar to before, we will estimate the linear auto-correlation function. It is also possible to estimate a nonlinear auto-correlation function, using the NMI as the correlation coefficient between lags of the time series. Please load the function createLags(x, numberOfLags, VarName) and create the embedded spaces for the first 400 lags of the LEVEL and the RAIN series, assigning them to lags_level and lags_rain.
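createLags() comes from the previous tutorial, so its exact implementation may differ; a hypothetical sketch of this step, assuming the data frame is named river_data with columns LEVEL and RAIN:

createLags <- function(x, numberOfLags, VarName) {
    n <- length(x) - numberOfLags
    # column k + 1 holds the series shifted by k steps (column 1 is lag 0)
    lags <- sapply(0:numberOfLags, function(k) x[(1 + k):(n + k)])
    colnames(lags) <- paste0(VarName, "_", 0:numberOfLags)
    as.data.frame(lags)
}

lags_level <- createLags(river_data$LEVEL, 400, "LEVEL")
lags_rain  <- createLags(river_data$RAIN, 400, "RAIN")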
To calculate the nonlinear auto-correlation function (NACF), you can estimate the NMI between the first column of lags_level and all the other lags. Do it also for lags_rain.
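With the sketched calculateNMI() from above, this could look as follows (one NMI value per lag):

NACF_level <- sapply(seq_len(ncol(lags_level)),
                     function(k) calculateNMI(lags_level[[1]], lags_level[[k]]))
NACF_rain  <- sapply(seq_len(ncol(lags_rain)),
                     function(k) calculateNMI(lags_rain[[1]], lags_rain[[k]]))
plot(NACF_level, type = "l", xlab = "Lag", ylab = "NMI")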
To calculate the nonlinear cross-correlation function (NCCF), you can estimate the NMI between the first column of lags_level and all the lags of lags_rain. Do it also for the first column of lags_rain compared with all the lags of lags_level.
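Again a sketch under the same assumptions, computing the NCCF in both directions:

NCCF_level_rain <- sapply(seq_len(ncol(lags_rain)),
                          function(k) calculateNMI(lags_level[[1]], lags_rain[[k]]))
NCCF_rain_level <- sapply(seq_len(ncol(lags_level)),
                          function(k) calculateNMI(lags_rain[[1]], lags_level[[k]]))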
Another very useful measurement tool is the Kullback–Leibler divergence, or relative entropy. It measures how one probability distribution q diverges from a second, expected probability distribution p. It can be estimated as the cross entropy of q with respect to p minus the entropy of p: DKL(p || q) = Hp_q - Hp.
To estimate the probability distributions, this time we will change our approach and use a geom_histogram. Please create a histogram of 10 bins from the level and another from its first lag, then assign them to p and q.
Hints: 1) Remember to always use the interval (breaks) from p for both histograms, so the bins match. 2) After grabbing the first layer of data from the plot with layer_data, you can make use of the count column; dividing it by its sum yields the bin probabilities.
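A sketch of this step; the breaks are derived from the LEVEL series (the distribution p) and reused for the lagged series, and lags_level[[2]] is assumed to hold the first lag:

library(ggplot2)

breaks <- seq(min(river_data$LEVEL), max(river_data$LEVEL), length.out = 11)

hist_p <- ggplot(data.frame(v = river_data$LEVEL), aes(x = v)) +
    geom_histogram(breaks = breaks)
hist_q <- ggplot(data.frame(v = lags_level[[2]]), aes(x = v)) +
    geom_histogram(breaks = breaks)

p <- layer_data(hist_p, 1)$count    # bin counts computed by geom_histogram
q <- layer_data(hist_q, 1)$count
p <- p / sum(p)                     # normalize counts into probabilities
q <- q / sum(q)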
Now, please calculate the entropy of p with the formula -sum(p*log2(p)). Hint: skip zero values of p, since 0*log2(0) evaluates to NaN in R.
Now calculate the cross entropy Hp_q with the formula -sum(p*log2(q)). Hint: remember to avoid zero values of q, which would make log2(q) infinite.
Finally, please calculate and print the Kullback–Leibler divergence.
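Putting the last three steps together, a sketch using the p and q obtained above:

pp   <- p[p > 0]                       # 0 * log2(0) is taken as 0
Hp   <- -sum(pp * log2(pp))            # entropy of p
keep <- p > 0 & q > 0                  # bins where log2(q) is finite
Hp_q <- -sum(p[keep] * log2(q[keep]))  # cross entropy of q with respect to p
DKL  <- Hp_q - Hp                      # Kullback-Leibler divergence
print(DKL)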