eXtreme Gradient Boosting is a machine learning model that became really popular a few years ago after winning several Kaggle competitions. It is a very powerful algorithm that uses an ensemble of weak learners to obtain a strong learner. Its R implementation is available in the xgboost package, and it is really worth including in anyone’s machine learning […]

## Hacking statistics or: How I Learned to Stop Worrying About Calculus and Love Stats Exercises (Part-9)

Statistics is often taught in school by and for people who like mathematics. As a consequence, in those classes the emphasis is put on learning equations, solving calculus problems and creating mathematical models instead of building an intuition for probabilistic problems. But if you are reading this, you know a bit of R programming and have access […]

## Big Data Analytics with H2O in R Exercises - Part 1

We have dabbled with RevoScaleR before. In this exercise we will work with H2O, another high-performance R library that can handle big data very effectively. This will be a series of exercises with an increasing degree of difficulty, so please do them in sequence. H2O requires you to have Java installed […]

## Answer probability questions with simulation (part-2)

This is the second exercise set on answering probability questions with simulation. Finishing the first exercise set is not a prerequisite. The difficulty level is about the same, so if you are looking for a challenge, aim at writing faster, more elegant algorithms. As always, it pays off to read the instructions carefully […]

## Bonus: Comparing Vector Exercises

We just added this week’s set of bonus exercises! Bonus exercises are weekly exercise sets available to subscribers of our weekly newsletter. Please sign up (for free!) and receive further details by email on how to get access to the bonus exercises (and solutions, of course). This week’s bonus exercise set has a focus on comparing […]

## R with remote databases Exercises (Part-2)

It is a common case when working with data that your source is a remote database. The usual ways to cope with this when using R are either to load all the data into R or to perform the heaviest joins and aggregations with SQL before loading the data. Both have cons: the former is […]

## Generalized linear functions (Beginners)

In this set of exercises, we are going to use the lm and glm functions to fit several generalized linear models on one dataset. Since this is a basic set of exercises, we will take a closer look at the arguments of these functions and how to take advantage of the output of each function […]

## Applying machine learning algorithms – exercises

INTRODUCTION Dear reader, if you are a newbie in the world of machine learning, then this tutorial is exactly what you need to introduce yourself to this exciting new part of the data science world. This post includes a full machine learning project that will guide you step by step to create a […]

## Probability functions advanced

In this set of exercises, we are going to explore some applications of probability functions and how to plot some density functions. The package MASS will be used in this set. Note: We are going to use random number functions and random process functions in R, such as runif. A problem with these functions is […]

## Data wrangling: Cleansing – Regular expressions (3/3)

Data wrangling is the process of importing, cleaning, and transforming raw data into actionable information for analysis. It is a time-consuming process that is estimated to take about 60-80% of analysts’ time. In this series, we will go through this process. It will be a brief series with the goal of crafting the reader’s skills […]