The biggest advantage of modules is the ability to efficiently reuse Shiny code, which can save a great deal of time. In addition, modules can help you standardize and scale your Shiny operations. Lastly, even if not reused, Shiny modules can help with organizing code and breaking it into smaller pieces – something very much needed in many complex Shiny apps. Some more information on Shiny modules can be found here.

In the following exercise set, you will practice the not-so-straightforward use of Shiny modules. The first four exercises are a warm-up, and will help you “refresh” on how to build each part of a Shiny module. In each of the last six exercises you will build a complete end-to-end module and run a minimal Shiny app to test it. Answers to the exercises are available here.

Two reminders before we begin:

* A typical UI function takes the argument `id` and starts with the line `ns <- NS(id)`.

* All input and output IDs within a UI function should be wrapped with `ns()`.
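Putting these two reminders together, a minimal module skeleton might look like the sketch below (the names `letterUI` and `letterServer` are illustrative, not taken from the exercises):

```r
library(shiny)

# Module UI: takes an id and namespaces every input/output ID with ns()
letterUI <- function(id) {
  ns <- NS(id)
  selectInput(ns("letter"), label = "Select a letter", choices = LETTERS)
}

# Module server: the classic callModule() style, where Shiny supplies
# input, output and session scoped to the module's namespace
letterServer <- function(input, output, session) {
  observe(print(input$letter))
}

ui <- fluidPage(letterUI("mod1"))
server <- function(input, output, session) {
  callModule(letterServer, "mod1")
}

# shinyApp(ui = ui, server = server)  # uncomment to run the app
```

The `callModule()` call wires the server logic to the namespaced UI, so `input$letter` inside the module transparently refers to the input whose full ID is `"mod1-letter"`.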

**Exercise 1**

Build a module-UI-function that provides a `selectInput` control, where the choices are `LETTERS`.

**Exercise 2**

Build the corresponding module-server-function, that prints the selected letter to the console.

**Exercise 3**

Build a regular UI object that contains the module-UI-function.

**Exercise 4**

Build a regular server function that calls the module you built in exercises 1 and 2.

Put together a minimal Shiny app that runs everything (e.g., `shinyApp(ui = ui, server = server)`).

**Exercise 5**

Adjust the module you built to show the selected letter as a UI `textOutput` instead of printing it to the console.

**Exercise 6**

In some cases you’ll need the same module twice in a single Shiny app.

Build a minimal Shiny app that uses the module from exercise 5 twice.

**Exercise 7**

Still with the same “letters” module, add an option for the app developer to select the label of the `selectInput`, with the default value being “Select a letter”. Call the module with a different value for the label than the default.

**Exercise 8**

Build a “contact form” module that contains a name (`textInput`), a subject (`selectInput` with dynamic choices, namely the person who uses the module can choose which choices to display), a message (`textAreaInput`) and a button (`actionButton` with a dynamic label). Upon clicking the button, all of the form information should be saved to a text file.

**Exercise 9**

Build a module that takes a number *n* using `numericInput`, and generates *n* elements of type `textInput`.

**Exercise 10**

Modules can also be nested within each other. Namely, one module can call another module.

To practice that, create two modules, an “inner” one and an “outer” one.

Your minimal Shiny app should call the “outer” module, that in turn will call the “inner” module.

The module-server-functions should take an argument `text`, and render it as a `textOutput`.

Leaflet is a JavaScript library for interactive maps. It is widely used across many platforms and fortunately, it is also implemented as a very user-friendly R package! With leaflet, you can create amazing maps within minutes that are customized exactly to your needs and embed them within your Shiny apps, markdowns or just view them in your RStudio viewer.

In the following set of exercises, we will use the geo-spatial data of the Atlantic Ocean storms in 2005. The data is readily available in the leaflet package under the variable `leaflet::atlStorms2005`. Each exercise adds some more features/functionalities to the code of the previous exercise, so be sure not to discard the code until after you’re done with all of the exercises. Answers to these exercises are available here.
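As a warm-up before exercise 1, a bare-bones map of the storm data can be drawn in a couple of lines (a sketch only; the exercises add the view, traces, colors and interactivity on top of this):

```r
library(leaflet)

# Atlantic Ocean storms of 2005, shipped with the leaflet package
storms <- leaflet::atlStorms2005

m <- leaflet(storms) %>%
  addTiles()  # default OpenStreetMap background tiles
m             # view the map in the RStudio viewer
```

Every exercise below extends this pipeline by chaining further `add*()` and option calls onto `m`.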

For other parts of the series, follow the tag leaflet.

**Exercise 1**

Set the default view to longitude -47.4, latitude 39.75 and zoom level 3.

In addition, add a button that will scale back to the default view upon clicking it.

Hint: use the `icon()` function from the shiny package to easily render an icon of your choice for the button.

**Exercise 2**

Add the lines that represent the storms’ traces to the map.

**Exercise 3**

Color each line according to the storm’s maximum wind.

**Exercise 4**

Add a legend to the colors you just added.

**Exercise 5**

Upon hovering over a line, change its weight to 10.

**Exercise 6**

Upon hovering over a line, show a label with the storm name.

**Exercise 7**

Upon clicking a line, show a pop-up with the storm minimum pressure.

**Exercise 8**

Save your leaflet map as an HTML file.

Hint: use `htmlwidgets::saveWidget()`.

This exercise is going to be the last exercise on Basic Generalized Linear Modeling (GLM). Please click here to find the other parts of the Basic GLM Exercises that you may have missed.

In this exercise, we will discuss logistic regression models as one of the GLM methods. The model is used where the response data is binary (ex. male or female, present or absent) or proportional (ex. percentages and ratios).

`M1 <- glm(response ~ Predictor1 + Predictor2, family = binomial)`

The data-set is based on Polis et al. (1998), which recorded island characteristics in the Gulf of California. Following the analysis of Quinn and Keough (2002), we model presences/absences of a spider predator against the perimeter-to-area ratio of the islands.

Answers to these exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page. Load the data-set here, call it ‘spider’ and load all the required packages before running the exercise.
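For orientation, the model template above expands to something like the following sketch. The data here is simulated stand-in data, since the actual column names of the ‘spider’ data-set may differ:

```r
# Simulated stand-in for the spider data: presence/absence of the predator
# versus the islands' perimeter-to-area ratio (column names are illustrative)
set.seed(42)
ratio    <- runif(100, 0, 5)
presence <- rbinom(100, 1, plogis(2 - 1.5 * ratio))
spider   <- data.frame(presence, ratio)

# Logistic regression: binomial family with the default logit link
M1 <- glm(presence ~ ratio, family = binomial, data = spider)
summary(M1)

# Predicted probability of presence at a perimeter-to-area ratio of 1
predict(M1, newdata = data.frame(ratio = 1), type = "response")
```

`type = "response"` back-transforms the linear predictor through the logistic function, giving probabilities rather than log-odds.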

**Exercise 1**

Visualize the data.

**Exercise 2**

Run the model.

**Exercise 3**

Check for over-dispersion.

**Exercise 4**

Use component+residual plots (`crPlots`) for further checks on dispersion.

**Exercise 5**

Check influential values.

**Exercise 6**

Check the Cook’s distance and the model summary.

**Exercise 7**

Check residuals.

**Exercise 8**

Plot and predict. Calculate the predicted values based on the fitted model.

**Exercise 9**

Produce a final plot, including the base plot, the fitted model and 95% CI bands.

**Exercise 10**

Check the odds ratio to estimate the probability of presence, given a unit increase in the perimeter-to-area ratio.

**Exercise 11**

Estimate the R2 value. What can be inferred?

In this exercise, we will continue to solve problems from the last exercise about GLM here. Therefore, the exercise number will start at 9. Please make sure you read and follow the previous exercise before you continue practicing.

In the last exercise, we saw that the model was over-dispersed, so we tried Quasi-Poisson regression along with step-wise variable selection algorithms. Please note that here we assume there is no influence from background theory or knowledge about the data. Obviously, this is not how it works in the real world, but we use this step as a general exercise.

Answers to these exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page. Load the data-set and required package before running the exercise.

**Exercise 9**

Load the package called “MASS” to execute the negative binomial model. Run the model, considering all the explanatory variables.

**Exercise 10**

Check the summary of the model.

**Exercise 11**

Set options in base R, considering missing values.

**Exercise 12**

The previous exercise gave insight that variables 1,3,4,6 or 1,4,6 produce the best model performance. Therefore, refit the model using those variables.

**Exercise 13**

Check the diagnostic plot and draw a conclusion on whether the model gives the best performance.

In this exercise, we will try to handle a model that has been over-dispersed using the quasi-Poisson model. Over-dispersion simply means that the variance is greater than the mean. It’s important because it leads to underestimated standard errors and increases the possibility of Type I errors. We will use a data-set on amphibian road kill (Zuur et al., 2009). It has 17 explanatory variables. We’re going to focus on nine of them, using the total number of kills (TOT.N) as the response variable.

Please download the data-set here and name it “Road.” Answers to these exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page. Load the data-set and required package before running the exercise.
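The standard quick check for over-dispersion divides the sum of squared Pearson residuals by the residual degrees of freedom; a value well above 1 signals over-dispersion. A sketch with simulated counts (not the “Road” data):

```r
# Simulated over-dispersed counts: negative binomial draws fitted as Poisson,
# so the variance greatly exceeds the mean
set.seed(1)
x <- runif(200)
y <- rnbinom(200, mu = exp(1 + 2 * x), size = 1)

m_pois <- glm(y ~ x, family = poisson)

# Dispersion statistic: ~1 for a well-specified Poisson model
dispersion <- sum(residuals(m_pois, type = "pearson")^2) / m_pois$df.residual
dispersion

# Quasi-Poisson refit: identical coefficients, but standard errors are
# scaled by the estimated dispersion
m_quasi <- glm(y ~ x, family = quasipoisson)
summary(m_quasi)$dispersion
```

Note that quasi-Poisson does not change the point estimates, only the uncertainty around them — which is exactly what the Type I error problem calls for.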

**Exercise 1**

Do some plotting. We can see decreasing variability of kills with distance.

**Exercise 2**

Run the GLM model with distance as the explanatory variable.

**Exercise 3**

Add more co-variables to the model and see what’s happening by checking the model summary.

**Exercise 4**

Check the collinearity using VIFs. Set options in base R concerning missing values.

**Exercise 5**

Check the summary again and set base R options. See why we do this on the previous related post exercise.

**Exercise 6**

Check for over-dispersion (rule of thumb: the value needs to be around 1). If it is still greater or less than 1, then we need to check the diagnostic plots and re-run the GLM with another model structure.

**Exercise 7**

Restructure the model by throwing out the least significant terms, and refit repeatedly until only significant terms remain.

**Exercise 8**

Check the diagnostic plots. If there are still some problems, then we might need to use other types of regression, like Negative Binomial regression. We’ll discuss it in the next exercise post.

A generalized linear model (GLM) is a flexible generalization of an ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution.

The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function. It also allows the magnitude of the variance of each measurement to be a function of its predicted value.

GLMs can be split into three groups:

• **Poisson Regression** – for count data with no over/under-dispersion issues.

• **Quasi-Poisson** or **Negative Binomial** models – where the counts are over-dispersed.

• **Logistic Regression** models – where the response data is binary (ex. present or absent, male or female) or proportional (ex. percentages).
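In R, these three groups map directly onto the `family` argument of `glm()` (with negative binomial models living in `MASS::glm.nb()`). A schematic sketch on simulated data:

```r
# Simulated data with a count response and a binary response
set.seed(7)
d <- data.frame(x = runif(100))
d$counts  <- rpois(100, exp(1 + d$x))       # count response
d$present <- rbinom(100, 1, plogis(d$x))    # binary response

m_pois  <- glm(counts ~ x,  family = poisson,      data = d)  # Poisson regression
m_quasi <- glm(counts ~ x,  family = quasipoisson, data = d)  # over-dispersed counts
m_logit <- glm(present ~ x, family = binomial,     data = d)  # logistic regression
# m_nb  <- MASS::glm.nb(counts ~ x, data = d)                 # negative binomial
```

The family determines both the assumed error distribution and the default link function (log for Poisson, logit for binomial).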

In this exercise, we will focus on GLMs that use Poisson regression. Please download the data-set for this exercise here. The data-set investigates the biogeographical determinants of species richness at a regional scale (Gotelli and Everson, 2002). The main purpose of this exercise is to replicate the Poisson regression of ant species richness against latitude, elevation and habitat type in their paper.

Answers to these exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page. Load the data-set and required package before running the exercise.

**Exercise 1**

Load the data and check the data structure using the `scatterplotMatrix` function. Assess its co-variation and data patterning.

**Exercise 2**

Run a GLM model and run a VIF analysis to check for inflation. Pay attention to the collinearity.

**Exercise 3**

If there are any issues with the co-variation, try to center the predictor variables.

**Exercise 4**

Re-run VIF with the new variables.

**Exercise 5**

Check for any influential data point outliers using influence measures (Cook’s distance) and create the plot. If the value is less than 1, then we are OK to proceed.

**Exercise 6**

Check for over-dispersion. It needs to be around 1 to go to the next step.

**Exercise 7**

Check the model summary. What can we infer?

**Exercise 8**

Since we have lots of variables, we will do model averaging. The first step is to set options in base R regarding missing values. Then, try to assess which variables have a significant influence on the response variable. Here we include the latitude, elevation and habitat variables to produce the best model.

**Exercise 9**

Check validation plots.

**Exercise 10**

Produce a base-plot and the points of predicted values.

If knowledge is power, then knowledge of data.table is something of a super power, at least in the realm of data manipulation in R.

In this exercise set, we will use some of the more obscure functions from the data.table package. The solutions will use set(), inrange(), chmatch(), uniqueN(), tstrsplit(), rowid(), shift(), copy(), address(), setnames() and last(). You are free to use more, as long as they are part of data.table. The objective is to get (more) familiar with these functions and be able to call on them in real life, giving us fewer reasons to leave the fast and neat data.table universe.

Solutions are available here.

PS. If you are unfamiliar with data.table, we recommend you start with the exercises covering the basics of data.table.
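Several of the listed functions can be previewed in a few lines (a sketch on a toy table, not the gapminder data used below):

```r
library(data.table)

dt <- data.table(country = c("A", "A", "B", "B"),
                 year    = c(2000, 2005, 2000, 2005),
                 gdp     = c(10, 12, 20, 26))

dt[, n_countries := uniqueN(country)]      # number of distinct countries (2)
dt[, gdp_lag := shift(gdp), by = country]  # lagged GDP within each country
set(dt, i = 1L, j = "gdp", value = 11)     # fast in-place assignment by reference
last(dt)                                   # the last row of the table
```

Note that both `:=` and `set()` modify `dt` by reference — no copy is made, which is why `copy()` and `address()` appear in the exercises below.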

**Exercise 1**

Load the gapminder data-set from the gapminder package. Save it to an object called “gp” and convert it to a data.table. How many different countries are covered by the data?

**Exercise 2**

Create a lag term for GDP per capita; that is, the value of GDP per capita at the previous observation (observations are 5 years apart) for each country.

**Exercise 3**

Using the data.table syntax, calculate the GDP per capita growth from 2002 to 2007 for each country. Extract the one with the highest value for each continent.

**Exercise 4**

Save the column names in a vector named “temp” and change the name of the year column in “gp” to “anno” (just because); print temp. Oh my, what just happened? Check the memory addresses of temp and names(gp), respectively.

**Exercise 5**

Overwrite “gp” with the original data again. Now make a *copy passed by value* into temp (before you change the year to anno) so you can keep the original variable names. Check the addresses again. Also, change factors to characters and don’t forget to convert to data.table again.

**Exercise 6**

A data.table of the number of goals each team in group A made in the FIFA world championship is given below. Import this into R and add a column with the countries’ population in 2017 to the data.table, rounded to the nearest million.

```
gA_2014 <- data.table(
  country   = c("Brazil", "Mexico", "Croatia", "Cameroon"),
  goals2014 = c(7, 4, 6, 1)
)
gA_2014
##     country goals2014
## 1:   Brazil         7
## 2:   Mexico         4
## 3:  Croatia         6
## 4: Cameroon         1
```

**Exercise 7**

Calculate the number of years since the country reached $8k in GDP per capita at each relevant observation as accurately as the data allows.

**Exercise 8**

Add a subtly different variable using rowid(). That is, the number of observations where GDP is below 8k, up to and including the given observation. Which country in each continent has the most observations above 8k? If there are ties, then list all of those tied at the top.

**Exercise 9**

Use inrange() to extract countries that have their life expectancy either below 40 or above 80 in 2002.

**Exercise 10**

Now, the soccer/football data from exercise 6 came with goals made and goals made against each team as the following:

```
gA_2014b <- data.table(
  country   = c("Brazil", "Mexico", "Croatia", "Cameroon"),
  goals2014 = c("7-2", "4-1", "6-6", "1-9")
)
```

How can you split the goals column into two relevant columns?

In the age of Rmarkdown and Shiny, or when making any custom output from your data, you want your output to look consistent and neat. Also, when writing your output, you often want it to follow a specific (decorative) format defined by the HTML or LaTeX engine. These exercises are an opportunity to refresh our memory of functions such as paste, sprintf, formatC and others that are convenient tools to achieve these ends. All of the solutions rely partly on the ultra-flexible sprintf(), but there are no doubt many ways to solve the exercises with other functions. Feel free to share your solutions in the comment section.

Example solutions are available here.
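Before exercise 1, it may help to recall the C-style format strings that `sprintf()` and `formatC()` understand; these four lines cover most of what the exercises need:

```r
sprintf("$%.2f", 14.3409087337707)  # "$14.34"       - round to two decimal places
sprintf("file_%03d.txt", 25)        # "file_025.txt" - zero-pad the integer to width 3
sprintf("%5.1f%%", 92.13)           # " 92.1%"       - right-align in 5 chars, literal %
formatC("hi", width = 10)           # "        hi"   - pad a string on the left
```

The format string grammar is `%[flags][width][.precision]letter`, where `d` means integer, `f` fixed-point, `s` string, and a `0` flag requests zero padding.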

**Exercise 1**

Print out the following vector as prices in dollars (to the nearest cent): `c(14.3409087337707, 13.0648270623048, 3.58504267621646, 18.5077076398145, 16.8279241011882)`. Example: `$14.34`.

**Exercise 2**

Using these numbers, `c(25, 7, 90, 16)`, make a vector of filenames in the following format: `file_025.txt`. Left-pad the numbers so they are all three digits.

**Exercise 3**

Actually, if we are only dealing with numbers less than one hundred, `file_25.txt` would have been enough. Change the code from the last exercise so that the padding is programmatically decided by the biggest number in the vector.

**Exercise 4**

Print out the following haiku on three lines, right aligned, with the help of cat: `c("Stay the patient course.", "Of little worth is your ire.", "The network is down.")`.

**Exercise 5**

Write a function that converts a number to its hexadecimal representation. This is a useful skill when converting bmp colors from one representation to another. Example output:

```
tohex(12)
[1] "12 is c in hexadecimal"
```

**Exercise 6**

Take a string and programmatically surround it with the HTML header tag `h1`.

**Exercise 7**

Back to the poem from exercise 4: let R convert it to an HTML unordered list, so that it would appear like the following in a browser:

- Stay the patient course
- Of little worth is your ire
- The network is down

**Exercise 8**

Here is a list of the current top-rated movies on imdb.com: `c("The Shawshank Redemption", "The Godfather", "The Godfather: Part II", "The Dark Knight", "12 Angry Men", "Schindler's List")`. Convert them into a list compatible with written text.

Example output:

`[1] "The top ranked films on imdb.com are The Shawshank Redemption, The Godfather, The Godfather: Part II, The Dark Knight, 12 Angry Men and Schindler's List"`

**Exercise 9**

Now, you should be able to solve this quickly: write a function that converts a proportion to a percentage and takes as input the number of decimal places. An input of 0.921313 with 2 decimal places should return `"92.13%"`.

**Exercise 10**

Improve the function from the last exercise so that the percentage consistently takes 10 characters by doing some left padding. Raise an error if the percentage already happens to be longer than 10.

Geospatial data is becoming increasingly used to solve numerous ‘real-life’ problems (check out some examples here.) In turn, R is becoming a powerful open-source solution to handle this type of data, currently providing an exceptional range of functions and tools for GIS and Remote Sensing data analysis.

In particular, **raster data** provides support for representing spatial phenomena by dividing the surface into a grid (or matrix) composed of cells of regular size. Each raster data-set has a certain number of columns and rows, and each cell contains a value with information for the variable of interest. Stored data can be either thematic – representing a **discrete** variable (ex. a land cover classification map) – or **continuous** (ex. elevation).

The `raster` package currently provides an extensive set of functions to create, read, export, manipulate and process raster data-sets. It also provides low-level functionalities for creating more advanced processing chains, as well as the ability to manage large data-sets. For more information, see: `vignette("functions", package = "raster")`. You can also check out more about raster data in the tutorial series about this topic here.
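To make the vocabulary concrete, here is a tiny self-contained sketch of the raster objects the exercises use (built from scratch rather than from the Landsat data):

```r
library(raster)

# A tiny 10x10 single-layer raster with random cell values
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- runif(ncell(r))

ncell(r)    # number of cells: 100
nlayers(r)  # number of layers: 1

# Several layers stacked into a multi-band brick,
# analogous to the Landsat bands in exercise 1
b <- brick(r, r * 2, r * 3)
nlayers(b)  # 3
```

A `RasterBrick` is simply a multi-layer raster stored in one object; with satellite data, each layer corresponds to one spectral band.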

In this exercise set, we will explore the following topics in raster data processing and geostatistical analysis (previously discussed in this tutorial series):

- Unsupervised classification/clustering of satellite data
- Regression-kriging (RK)

We will also address how to use the package `RStoolbox` (link) to calculate the:

- Tasseled Cap Transformation (TCT)
- PCA rotation/transformation

Both data compression techniques examined here will use spectral data from satellite imagery.

Answers to these exercises are available here.

**Exercise 1**

Use the data in this link (Landsat-8 surface reflectance data bands 1-7, for Peneda-Geres National Park – PGNP, NW Portugal) to answer the next exercises (1 to 6). Download the data, uncompress and create a raster brick. How many pixels and layers does the data have?

**Exercise 2**

Make an RGB plot with bands 5, 1, and 3 with linear stretching.

**Exercise 3**

Using the k-means algorithm, perform an unsupervised classification/clustering of the data with 5 clusters.

**Exercise 4**

Use the CLARA algorithm (package `cluster`) to perform an unsupervised classification/clustering of the data with 5 clusters and Euclidean distance.

**Exercise 5**

Using package `RStoolbox`, calculate the Tasseled Cap Transformation of the data (remember, it is Landsat-8 data with bands 1-7).

**Exercise 6**

Using package `RStoolbox`, calculate the standardized PCA transform. What is the cumulative % of explained variance in the first three components?

**Exercise 7**

- Use the data in this link to answer the next exercises (annual average temperature for weather stations in Portugal; column `AvgTemp`). Using the Lat and Lon columns from the `clim_data_pt.csv` table, create a `SpatialPointsDataFrame` object with CRS WGS 1984.
- Using Ordinary Kriging from package `gstat`, interpolate temperature values employing a *Spherical* empirical variogram. Calculate the RMSE from 5-fold cross-validation (see function `krige.cv`) and use `set.seed(12345)`.

**Exercise 8**

Using the previous question rationale, experiment now with an *Exponential* model. Calculate RMSE also from 5-fold CV. Which one was the best model according to RMSE?

**Exercise 9**

Using the cubist regression algorithm (package `Cubist`), predict `AvgTemp` based on latitude (`Lat`), elevation (column `Elev`) and distance to the coastline (column `distCoast`). Calculate the RMSE for a random test set of 15 observations. Use `set.seed(12345)`.

**Exercise 10**

From the previous exercise, extract the train residuals and interpolate them. Following a Regression-kriging approach, add the interpolated residuals and the regression results. Calculate the RMSE for the test set (defined in E9) and check if this improves the modeling performance any further.

Deep learning is under active development. Papers with new approaches are being published every day. In this set of exercises, we will go through some of the newer methods that boost a neural network’s performance. By the end of this post, you will be able to train neural networks with adaptive learning rates and apply methods to avoid [overfitting](https://en.wikipedia.org/wiki/Overfitting). It is recommended to check out the following tutorials before you start solving the exercises: basics part 1, basics part 2, regression analysis, classification analysis.

Moreover, a great overview of the algorithms that we will go through in this tutorial can be found here; it is highly recommended to read that post. It is very likely one of the best guides to adaptive learning methods out there.

We will use the ‘mtcars’ built-in dataset for this post. The data set is easy to train, so we will not use a formal evaluation metric (accuracy); instead, we will plot the logistic regression fit so that you can really see the impact of each individual optimization technique.
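Before turning to TensorFlow, the first two update rules are compact enough to sketch in plain base R. Below is vanilla gradient descent versus the classical momentum update on a logistic regression for `mtcars` (predicting `am` from `wt`); this illustrates the algorithms themselves, not the TensorFlow code the exercises ask for:

```r
data(mtcars)
X <- cbind(1, scale(mtcars$wt))  # design matrix: intercept + standardized weight
y <- mtcars$am                   # binary response: transmission type

sigmoid <- function(z) 1 / (1 + exp(-z))
# Gradient of the mean cross-entropy loss for logistic regression
grad <- function(w) t(X) %*% (sigmoid(X %*% w) - y) / nrow(X)

lr <- 0.1

# Vanilla gradient descent: step directly down the gradient
w <- c(0, 0)
for (i in 1:500) w <- w - lr * grad(w)

# Momentum update: a velocity term accumulates past gradients,
# smoothing the trajectory and speeding up consistent directions
w_m <- c(0, 0); v <- c(0, 0); mom <- 0.9
for (i in 1:500) {
  v   <- mom * v - lr * grad(w_m)
  w_m <- w_m + v
}

cbind(gd = w, momentum = w_m)  # both find a negative coefficient on weight
```

The adaptive methods in the later exercises (Adagrad, Adadelta, RMSprop, Adam) extend this scheme by additionally scaling the step size per parameter using running statistics of past gradients.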

Before proceeding, it might be helpful to look over the help pages for `tf$placeholder`, `tf$zeros`, `tf$multiply`, `tf$add`, `tf$global_variables_initializer`, `tf$Variable`, `tf$sigmoid_cross_entropy_with_logits`, `tf$reduce_mean`, `tf$train$GradientDescentOptimizer`, `tf$train$MomentumOptimizer`, `tf$train$AdamOptimizer`, `tf$train$AdadeltaOptimizer` and `tf$train$RMSPropOptimizer`.

Answers to the exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

**Exercise 1**

Split the data set into a training and a testing set: 80% should be the training data and the remaining 20% the test data.

Create the placeholders, the parameters (initialized at 0), the initialization operation, the logit, and evaluate the model using mean cross-entropy.

**Exercise 2**

Train the model using the gradient descent algorithm with learning rate 0.01. Plot the results and see how it performs.

**Exercise 3**

Train the model using the momentum update algorithm with learning rate 0.01 and momentum of 0.1. Plot the results and see how it performs.

**Exercise 4**

Train the model using the momentum update algorithm with learning rate 0.01 and momentum of 0.9. Plot the results and see how it performs.

**Exercise 5**

Train the network using the Nesterov momentum update. Does it perform better?

**Exercise 6**

Use the Adagrad algorithm with learning rate 0.01, beta1 term 0.9, beta2 term 0.999 and epsilon 1e-08 (recommended hyperparameters). Bear in mind that Adagrad is considered to be a very aggressive algorithm.

**Exercise 7**

Use the Adadelta algorithm with learning rate 0.01, decay term 0.1 and epsilon 1e-08.

**Exercise 8**

Use the Adadelta algorithm with learning rate 0.01, decay term 0.9 and epsilon 1e-08.

**Exercise 9**

Use the RMSprop algorithm with learning rate 0.01, decay 0.9, momentum term 0.1 and epsilon 1e-10.

**Exercise 10**

Use the Adam algorithm with learning rate 0.01, decay 0.9, momentum term 0.1 and epsilon 1e-10.

**Disclaimer:** Plotting is not a best practice for testing goodness of fit, but we do it because it is a very simple way to see the difference between the models and helps you understand it.
