The first thing to do when you start working with new data is to explore it and learn what it contains. The easiest way to do this is by visualization: distributions, point plots, and so on. These are very helpful, but plotting all of them for each variable or pair of variables can be time-consuming. That’s […]

## Regression Model Assumptions Exercises

You might fit a statistical model to a set of data and obtain parameter estimates. However, you are not done at this point. You need to make sure the assumptions of the particular model you used were met. One tool is to examine the model residuals. We previously discussed this in a tutorial. The residuals […]
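As a rough illustration of examining residuals, here is a minimal sketch in base R. The built-in `mtcars` data and the `mpg ~ wt` model are assumptions chosen for the example, not taken from the exercise set itself.

```r
# Fit a simple linear model (illustrative choice of data and formula)
fit <- lm(mpg ~ wt, data = mtcars)

# Extract the residuals and fitted values
res <- residuals(fit)
fv  <- fitted(fit)

# Residuals vs. fitted values: look for curvature or a funnel shape
plot(fv, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line suggest non-normal residuals
qqnorm(res)
qqline(res)
```

A pattern in the first plot hints at non-linearity or non-constant variance, while the Q-Q plot gives a visual check of the normality assumption.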

## How to use basic dplyr functions

INTRODUCTION dplyr is an R package used for transforming and summarizing tabular data with rows and columns. It includes a set of functions that filter rows, select specific columns, re-order rows, add new columns, and summarize data. Moreover, dplyr contains a useful function to perform another common task, which is the “split-apply-combine” […]
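The verbs named in this teaser can be sketched in a single pipeline. This is a minimal example, assuming the dplyr package is installed and using the built-in `mtcars` data; the column choices and the `wt_kg` name are illustrative, not from the original post.

```r
library(dplyr)

result <- mtcars %>%
  filter(mpg > 20) %>%                 # filter rows
  select(mpg, cyl, wt) %>%             # select specific columns
  arrange(desc(mpg)) %>%               # re-order rows
  mutate(wt_kg = wt * 1000 * 0.4536) %>%  # add a new column (wt is in 1000 lbs)
  group_by(cyl) %>%                    # split ...
  summarise(mean_mpg = mean(mpg),      # ... apply and combine
            n = n())

print(result)
```

The `group_by()` plus `summarise()` pair at the end is one common way to express the split-apply-combine pattern: the data are split by `cyl`, a summary is computed per group, and the results are combined into one table.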

## Working With Air Quality and Meteorological Data Exercises (Part 5)

Atmospheric air pollution is one of the most important environmental concerns in many countries around the world; it is strongly affected by meteorological conditions. In this set of exercises, we will use the openair package to work with and analyze air quality and meteorological data. This package provides tools to directly import data from air quality measurement […]

## dplyr Non-Standard Evaluation Exercises

dplyr is a great package for interactive data wrangling and exploration. One of the key aspects that makes it so great is that it uses non-standard evaluation, so a user does not have to repeat the data frame name and quote names all the time. On the other hand, this feature makes programming with dplyr a non-trivial […]

## Plotly basic charts – exercises

INTRODUCTION Plotly’s R graphing library makes interactive, publication-quality web graphs. More specifically, it gives us the ability to make line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, and 3D charts. In this tutorial, we are going to take a first step into plotly’s world by learning to […]

## dplyr basics: More smooth data exploration

Wherever you look at R code these days, dplyr seems to be there – indeed, data indicate that its popularity is growing relative to many common R packages. Influential data scientists have recommended that beginners start “from scratch with the dplyr package for manipulating a data frame”, leaving standard R subsetting and loops for later. […]

## eXtremely Boost your machine learning Exercises (Part-2)

eXtreme Gradient Boosting is a machine learning model that became really popular a few years ago after winning several Kaggle competitions. It is a very powerful algorithm that uses an ensemble of weak learners to obtain a strong learner. Its R implementation is available in the xgboost package, and it is really worth including in anyone’s machine learning […]

## Regression Model Assumptions Tutorial

Regression is used to explore the relationship between one variable (often termed the response) and one or more other variables (termed explanatory). Several exercises are already available on simple linear regression or multiple regression. These are fantastic tools that are used frequently. However, each has a number of assumptions that need to be met. Unfortunately, […]

## Udemy Sale Ends Tomorrow: Any R Course for $12

Until tomorrow, all courses offered by Udemy are on sale for just $12 (regular prices for their 63 R courses vary between $20 and $200). Udemy R courses are very popular: a course like R Programming A-Z™: R For Data Science With Real Exercises! has already been taken by more than 19,000 online students. Still, perhaps a […]