The Internet is full of interesting data; there’s no doubt about it. Some sites, such as Twitter, provide users with structured access through an API, around which some neat R packages have been built. In this exercise set, we practice much more general techniques of extracting/scraping data from the web directly, using the rvest package. Note […]

## Easy Web Scraping With Rvest: Solutions

Below are the solutions to these exercises on easy web scraping with Rvest. Note that some of the websites scraped in this exercise set might change over time, which may affect the validity of these solutions. #################### # # # Exercise 1 # # # #################### library(rvest) url <- paste0("https://ocw.mit.edu/courses/", "electrical-engineering-and-computer-science/", "6-006-introduction-to-algorithms-fall-2011/lecture-notes/") ln_page <- read_html(url) #################### # […]
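As a rough illustration of the kind of scraping this exercise set practices, the sketch below parses a small inline HTML snippet instead of a live site, so it runs without network access. The page content, the `li a` selector, and the `notes` data frame are all hypothetical and are not taken from the original solutions; it assumes rvest 1.0.0 or later for `html_elements()` and `html_text2()`.

```r
library(rvest)

# Hypothetical stand-in page; a real exercise would call read_html(url) instead.
page <- read_html('
  <html><body>
    <h1>Lecture Notes</h1>
    <ul>
      <li><a href="/notes/lec1.pdf">Lecture 1</a></li>
      <li><a href="/notes/lec2.pdf">Lecture 2</a></li>
    </ul>
  </body></html>')

# Select every link inside a list item using a CSS selector.
links <- html_elements(page, "li a")

# Pull out the visible text and the href attribute of each link.
link_text <- html_text2(links)
link_href <- html_attr(links, "href")

# Collect the results in a data frame for further processing.
notes <- data.frame(title = link_text, href = link_href)
print(notes)
```

Against a live page, the selector would need to be adapted to that page's actual markup, which can change over time.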

## Text Data Wrangling Exercises: Solutions

Below are the solutions to these exercises on Text Data Wrangling. #################### # # # Exercise 1 # # # #################### hs <- readLines("https://textfiles.com/stories/hansgrtl.txt") #################### # # # Exercise 2 # # # #################### hs <- unlist(strsplit(hs, " ")) #################### # # # Exercise 3 # # # #################### hs <- tolower(hs) #################### # # […]
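The first three steps shown in the excerpt (read lines, split into words, lower-case) can be tried offline with a minimal sketch like the one below. The two input lines are a hypothetical stand-in for the downloaded story file, read through a `textConnection` so that `readLines` behaves as it would on a URL.

```r
# Hypothetical stand-in text; the original solution reads a story from a URL.
con <- textConnection(c("Hansel and Gretel", "lived near a GREAT forest"))
hs <- readLines(con)
close(con)

# Split each line on single spaces and flatten the result into one vector of words.
hs <- unlist(strsplit(hs, " "))

# Normalise case so that "GREAT" and "great" count as the same word.
hs <- tolower(hs)

print(hs)
```

Splitting on a single space is the simplest choice; real text with punctuation or repeated whitespace would likely need a regular expression instead.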

## Text Data Wrangling: Exercises

In a previous exercise set, we practiced retrieving data from Twitter. In this exercise, we start getting comfortable with manipulating text data. We will start by refreshing our memory on how to use some base-R functions, then we start using the tm package. Answers to the exercises are available here. Exercise 1 Use readLines to […]

## Protected: Bonus: Obtaining Twitter Data Solutions

There is no excerpt because this is a protected post.

## Protected: Bonus: Obtaining Twitter Data Exercises

There is no excerpt because this is a protected post.

## Logistic regression in R solutions

Below are the solutions to these exercises on Logistic regression in R. #################### # # # Exercise 1 # # # #################### library(MASS) library(ggplot2) train <- rbind(Pima.tr, Pima.tr2) test <- Pima.te train$type <- as.integer(train$type) - 1L test$type <- as.integer(test$type) - 1L # Missing values? sapply(train, function(x) sum(is.na(x))) ## npreg glu bp skin bmi ped age […]

## Logistic regression in R

Logistic regression is a modelling approach for a binary dependent variable (think yes/no or 1/0 instead of continuous). It is used in machine learning for prediction and is a building block for more complicated algorithms such as neural networks. In the social sciences and medicine, logistic regression is widely used to model causal mechanisms. We will use a […]
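To make the idea concrete, here is a minimal sketch of fitting a logistic regression with base R's `glm()`. It assumes the `Pima.tr` and `Pima.te` data sets from the MASS package; the choice of `glu` and `bmi` as predictors and the 0.5 cut-off are illustrative assumptions, not the model used in the exercises.

```r
library(MASS)  # provides the Pima.tr (training) and Pima.te (test) data sets

# Recode the binary outcome as 0/1 (No = 0, Yes = 1).
train <- Pima.tr
test <- Pima.te
train$type <- as.integer(train$type) - 1L
test$type <- as.integer(test$type) - 1L

# Fit a logistic regression; the predictors here are chosen only for illustration.
fit <- glm(type ~ glu + bmi, data = train, family = binomial)

# Predicted probabilities of the positive class on the test set.
pred <- predict(fit, newdata = test, type = "response")

# Classify with an assumed 0.5 threshold and compute the share of correct predictions.
pred_class <- as.integer(pred > 0.5)
accuracy <- mean(pred_class == test$type)
print(accuracy)
```

The coefficients returned by `coef(fit)` are on the log-odds scale, so `exp(coef(fit))` gives odds ratios.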

## Protected: Bonus: Basic Data Exploration in R Solutions

There is no excerpt because this is a protected post.

## Protected: Bonus: Basic Data Exploration in Base R Exercises

There is no excerpt because this is a protected post.