The Ultimate Fighting Championship has come a long way since its shaky (and bloody) beginnings in 1993. In this exercise set, we will use R’s web scraping capabilities to retrieve some UFC-related data from Wikipedia. More specifically, we will use the rvest package to scrape the information from Wikipedia. We will then replicate the graph at the bottom of the Wikipedia page, which shows the evolution of the number of UFC events per year.
For a nice introduction to the rvest package, see here.
Finally, here are the suggested packages for this exercise set: rvest, stringr, dplyr, anytime and ggplot2.
Answers to the exercises are available here.
Using rvest, read the HTML page at https://en.wikipedia.org/wiki/List_of_UFC_events.
From the HTML page, extract the ‘Past events’ table and turn it into a data frame.
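One possible way to handle these first two steps is sketched below. It assumes the ‘Past events’ table is the first table following the element with id `Past_events` (the section heading), which is an assumption about Wikipedia's markup that may change over time; `extract_past_events` is a hypothetical helper name. The network call is left as a comment so the helper can also be tried offline on any parsed page.

```r
library(rvest)

# URL given in the exercise
ufc_url <- "https://en.wikipedia.org/wiki/List_of_UFC_events"

# Hypothetical helper. Assumption: the 'Past events' table is the first
# <table> after the element whose id is "Past_events" (the section heading).
# If Wikipedia's layout changes, this XPath may need adjusting.
extract_past_events <- function(page) {
  node <- html_element(page, xpath = "//*[@id='Past_events']/following::table[1]")
  if (inherits(node, "xml_missing")) {
    stop("Could not find a table after the 'Past_events' heading")
  }
  # html_table() on a single node returns a tibble; convert to a plain data frame
  as.data.frame(html_table(node))
}

# Usage (requires an internet connection):
# ufc_page <- read_html(ufc_url)
# past_events <- extract_past_events(ufc_page)
```

Selecting the table by its heading rather than by position (e.g. "the second table on the page") should be somewhat less fragile if other tables are added above it.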
Looking at the Date column, we see that the format is quite messy, but the YYYY-MM-DD information is present.
Using a regular expression, mutate the previous table so that the Date column contains only the date in YYYY-MM-DD format.
Note: I like to use stringr::str_extract, but the base R grep family of functions (for example, regexpr() combined with regmatches()) will work as well.
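A minimal sketch of the regex step, assuming the ISO date is the first four-digit, two-digit, two-digit substring separated by dashes in each cell. `clean_dates` and `extract_iso_date_base` are hypothetical names; the second function shows a base R route via `regexpr()` and `regmatches()`.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: keep only the first YYYY-MM-DD substring of each Date.
# str_extract() returns NA where no such substring exists.
clean_dates <- function(events) {
  events %>%
    mutate(Date = str_extract(Date, "\\d{4}-\\d{2}-\\d{2}"))
}

# Hypothetical base R equivalent of the extraction above.
extract_iso_date_base <- function(x) {
  m <- regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", x)  # -1 where there is no match
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the successful matches, so fill those positions
  out[m != -1] <- regmatches(x, m)
  out
}
```

For example, a made-up string in a similarly messy style, `"000000002018-03-03-0000Mar 3, 2018"`, becomes `"2018-03-03"` with either function.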
Now, transform the character column Date into a proper date format using the anytime package and create a Year column. Then, use dplyr to count the number of events per year, excluding the year 2018.
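A sketch of the date conversion and counting step. It uses `anytime::anydate()` to parse the cleaned strings and, as an assumption, base R's `format(Date, "%Y")` to derive the year; `count_events_per_year` and the `n_events` column name are hypothetical.

```r
library(dplyr)
library(anytime)

# Hypothetical helper: parse Date, add Year, drop the excluded year and any
# rows whose date could not be parsed, then count events per year.
count_events_per_year <- function(events, exclude_year = 2018) {
  events %>%
    mutate(Date = anydate(Date),
           Year = as.integer(format(Date, "%Y"))) %>%
    filter(!is.na(Year), Year != exclude_year) %>%
    count(Year, name = "n_events")  # one row per year, sorted by Year
}
```

Passing the excluded year as an argument keeps the 2018 cut-off visible at the call site rather than buried in the filter.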
Finally, plot a bar chart of the result using the ggplot2 package. For extra points, try to make the graph nicer by:
– Setting the bar color to blue
– Setting the y-axis label to “# Events”
– Showing every year on the x-axis and rotating the labels 90 degrees
– Writing the event count inside the bars in white
– Adding a title to the chart
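A sketch of the chart with the optional styling applied. It assumes a data frame of per-year counts with hypothetical column names `Year` and `n_events`; `plot_events_per_year` and the title text are illustrative choices, not taken from the answers.

```r
library(ggplot2)

# Hypothetical helper: bar chart of events per year.
# Expects columns Year and n_events.
plot_events_per_year <- function(counts) {
  # factor(Year) makes the x-axis discrete, so every year present gets a tick
  ggplot(counts, aes(x = factor(Year), y = n_events)) +
    geom_col(fill = "blue") +
    # vjust > 1 pushes each label down from the bar top, i.e. inside the bar
    geom_text(aes(label = n_events), vjust = 1.5, colour = "white") +
    labs(x = "Year", y = "# Events", title = "Number of UFC events per year") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
}
```

Treating `Year` as a factor is the design choice that guarantees one label per year; with a numeric axis, ggplot2 would pick its own (sparser) breaks.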