The Ultimate Fighting Championship has come a long way since its shaky (and bloody) beginnings in 1993. In this exercise set, we will use R’s web scraping capabilities to retrieve some UFC-related data from Wikipedia. More specifically, we will use the rvest package to scrape the information. We will then replicate the graph at the bottom of the Wikipedia page, which shows the evolution of the number of UFC events per year.
For a nice introduction to the rvest package, see here.
Finally, here are the suggested packages for this exercise set: rvest, dplyr, magrittr, stringr, anytime, lubridate, ggplot2.
Answers to the exercises are available here.
Exercise 1
Using rvest, read the html page https://en.wikipedia.org/wiki/List_of_UFC_events.
Exercise 2
From the html page, extract the ‘Past events’ table and turn it into a data frame.
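As a hint for Exercises 1 and 2, here is a minimal sketch of the read-then-extract pattern. It runs on a small made-up HTML snippet rather than the live page, so the table id, column names, and rows are all assumptions for illustration; for the exercise itself, pass the Wikipedia URL to read_html() and inspect the page to find a suitable selector. It also assumes rvest 1.0 or later, where html_element() is available.

```r
library(rvest)

# Hypothetical stand-in for the Wikipedia page so the sketch runs offline.
# read_html() also accepts a URL, which is what the exercise asks for.
page <- read_html('
  <html><body>
    <table id="Past_events">
      <tr><th>#</th><th>Event</th><th>Date</th></tr>
      <tr><td>2</td><td>UFC 2</td><td>1994-03-11 Mar 11, 1994</td></tr>
      <tr><td>1</td><td>UFC 1</td><td>1993-11-12 Nov 12, 1993</td></tr>
    </table>
  </body></html>
')

# Select one table with a CSS selector, then convert it to a data frame.
# The id "Past_events" belongs to the toy snippet above; it is an assumption.
past_events <- page %>%
  html_element("table#Past_events") %>%
  html_table()

print(past_events)
```

If the page holds several tables and no convenient id, html_elements("table") returns all of them, and the one we need can be picked by position.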
Exercise 3
Looking at the Date column, we see that the format is quite messy, but the YYYY-MM-DD information is present.
Using regex, mutate the previous table so that the Date column contains only the date in YYYY-MM-DD format.
Note: I like to use stringr::str_extract, but base R regex functions (for example, regmatches combined with regexpr) will work as well.
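One way to approach Exercise 3 is sketched below. The sample Date strings are made up to mimic a messy column, so the real values may look different and the pattern may need adjusting.

```r
library(dplyr)
library(stringr)

# Hypothetical sample rows; the real Date strings may differ.
events <- data.frame(
  Event = c("UFC 2", "UFC 1"),
  Date  = c("1994-03-11 Mar 11, 1994", "1993-11-12 Nov 12, 1993"),
  stringsAsFactors = FALSE
)

# Keep the first match of: four digits, dash, two digits, dash, two digits.
events <- events %>%
  mutate(Date = str_extract(Date, "\\d{4}-\\d{2}-\\d{2}"))

print(events)
```

str_extract() returns NA for rows where the pattern does not match, which makes unexpected formats easy to spot.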
Exercise 4
Now, transform the character column Date into a proper date format using the anytime package and create a Year column using the lubridate package.
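A minimal sketch of the conversion in Exercise 4, again on made-up rows: anydate() from anytime parses the character dates, and year() from lubridate pulls out the year.

```r
library(dplyr)
library(anytime)
library(lubridate)

# Hypothetical sample rows with dates already cleaned to YYYY-MM-DD.
events <- data.frame(
  Date = c("1994-03-11", "1993-11-12"),
  stringsAsFactors = FALSE
)

events <- events %>%
  mutate(
    Date = anydate(Date),  # character -> Date
    Year = year(Date)      # numeric year taken from the parsed Date
  )

str(events)
```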
Exercise 5
Use the dplyr package to count the number of events per year. Exclude the year 2018.
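The counting step in Exercise 5 could look roughly like this. The Year values are invented for illustration; count() names its result column n by default.

```r
library(dplyr)

# Hypothetical Year column; the real one comes from the scraped table.
events <- data.frame(Year = c(1993, 1994, 1994, 2017, 2018, 2018))

events_per_year <- events %>%
  filter(Year != 2018) %>%  # drop the excluded year
  count(Year)               # one row per year, count stored in column n

print(events_per_year)
```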
Exercise 6
Finally, plot a bar chart to show the result using the ggplot2 package. For extra points, try to make the graph nicer by:
– Setting the bar color to blue
– Setting the y-axis label to “# Events”
– Showing every year on the x-axis and rotating the labels 90 degrees
– Writing the event count inside the chart bars in white
– Adding a title to the chart
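The styling options listed above can be sketched as follows. The counts and the chart title are made up, and the text offset (vjust) is just one value that happens to place the labels inside the bars; it may need tuning on the real data.

```r
library(ggplot2)

# Hypothetical yearly counts; the real ones come from Exercise 5.
events_per_year <- data.frame(Year = c(1993, 1994, 1995), n = c(1, 3, 4))

p <- ggplot(events_per_year, aes(x = Year, y = n)) +
  geom_col(fill = "blue") +                                  # blue bars
  geom_text(aes(label = n), colour = "white", vjust = 1.5) + # counts inside bars
  scale_x_continuous(breaks = events_per_year$Year) +        # a tick for every year
  labs(title = "UFC events per year", x = "Year", y = "# Events") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) # rotated x labels

print(p)
```

A vjust greater than 1 pushes each label below the top edge of its bar, which is what keeps the white text on the blue fill.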
The answer link doesn’t work.
@Kevin: You can use this link until the official link gets updated.