INTRODUCTION Plotly’s R graphing library makes interactive, publication-quality web graphs. More specifically, it gives us the ability to make line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple axes, and 3D charts. In this tutorial we are going to take a first step into plotly’s world by learning to […]

## Data wrangling : Cleansing – Date

Data wrangling is the process of importing, cleaning, and transforming raw data into actionable information for analysis. It is a time-consuming process that is estimated to take about 60-80% of analysts’ time. In this series, we will go through this process. It will be a brief series with the goal of crafting the reader’s skills […]

## Working with air quality and meteorological data Exercises (Part-4)

Atmospheric air pollution is one of the most important environmental concerns in many countries around the world, and it is strongly affected by meteorological conditions. Accordingly, in this set of exercises we use the openair package to work with and analyze air quality and meteorological data. This package provides tools to directly import data from air quality […]

## eXtremely Boost your machine learning Exercises (Part-1)

eXtreme Gradient Boosting is a machine learning model that became really popular a few years ago after winning several Kaggle competitions. It is a very powerful algorithm that uses an ensemble of weak learners to obtain a strong learner. Its R implementation is available in the xgboost package, and it is really worth including in anyone’s machine learning […]

## Hacking statistics or: How I Learned to Stop Worrying About Calculus and Love Stats Exercises (Part-9)

Statistics is often taught in school by and for people who like mathematics. As a consequence, in those classes emphasis is put on learning equations, solving calculus problems and creating mathematical models instead of building an intuition for probabilistic problems. But if you are reading this, you know a bit of R programming and have access […]

## Big Data Analytics with H2O in R Exercises - Part 1

We have dabbled with RevoScaleR before. In this exercise we will work with H2O, another high-performance R library that can handle big data very effectively. It will be a series of exercises with an increasing degree of difficulty, so please do them in sequence. H2O requires you to have Java installed […]

## Answer probability questions with simulation (part-2)

This is the second exercise set on answering probability questions with simulation. Finishing the first exercise set is not a prerequisite. The difficulty level is about the same; thus, if you are looking for a challenge, aim at writing faster, more elegant algorithms. As always, it pays off to read the instructions carefully […]

## Bonus: Comparing Vector Exercises

We just added this week’s set of bonus exercises! Bonus exercises are weekly exercise sets, available to subscribers to our weekly newsletter. Please sign up (for free!) and receive further details by email on how to get access to the bonus exercises (and solutions, of course). This week’s bonus exercise set has a focus on comparing […]

## R with remote databases Exercises (Part-2)

When working with data, it is common for your source to be a remote database. The usual ways to cope with this when using R are either to load all the data into R or to perform the heaviest joins and aggregations with SQL before loading the data. Both of them have cons: the former one is […]

## Generalized linear functions (Beginners)

In this set of exercises, we are going to use the lm and glm functions to fit several generalized linear models on one dataset. Since this is a basic set of exercises, we will take a closer look at the arguments of these functions and how to take advantage of the output of each function […]