Text mining can be messy: tokenization, document-term matrices, lexicons, and plenty of data structures with transformations between them. Fortunately, there is the
tidytext package, which will help you tidy up this mess!
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
dplyr packages. Load Jane Austen's books and examine what the data looks like.
Add a column identifying the line in a book. Split text into words.
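As a rough illustration of the idea (not the official solution), line numbering and tokenization can be sketched on a small made-up data frame. The `toy` tibble and its column names `book`, `text` and `line` are assumptions chosen for this example; `unnest_tokens()` is the tidytext function that splits text into one token per row.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical toy data: two short "books", one row per line of text
toy <- tibble(
  book = c("A", "A", "B"),
  text = c("The quick brown fox", "jumps over the lazy dog", "Hello tidy world")
)

toy_words <- toy %>%
  group_by(book) %>%
  mutate(line = row_number()) %>%  # line number within each book
  ungroup() %>%
  unnest_tokens(word, text)        # one lowercased word per row

toy_words
```

The same pipeline should carry over to a real corpus as long as it has one row per line of text and a column identifying the book.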
Create a wordcloud of the 100 most frequent words. Hint: Use
Find the most frequent words for each book.
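A minimal sketch of per-group word counting, assuming the text has already been split into one word per row. The `words` tibble below is invented for illustration; on real text you would probably also want to drop stop words first.

```r
library(dplyr)
library(tibble)

# Hypothetical tokenized data: one row per word occurrence
words <- tibble(
  book = c("A", "A", "A", "B", "B", "B", "B"),
  word = c("sea", "sea", "ship", "moor", "moor", "moor", "wind")
)

top_words <- words %>%
  count(book, word, sort = TRUE) %>%        # occurrences of each word per book
  group_by(book) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>% # keep the single most frequent word per book
  ungroup()

top_words
```

Raising the `n = 1` argument of `slice_max()` returns the top few words per book instead of only the first.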
Perform TF-IDF transformation of the books. Find the words with the highest TF-IDF score for each book.
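To show what the transformation does, here is a small sketch using tidytext's `bind_tf_idf()` on invented word counts (the `counts` tibble and its values are assumptions for illustration). A word that appears in every document gets an IDF of zero, so only words specific to a document score highly.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical per-book word counts
counts <- tibble(
  book = c("A", "A", "B", "B"),
  word = c("the", "ship", "the", "moor"),
  n    = c(5, 2, 4, 3)
)

scored <- counts %>%
  bind_tf_idf(word, book, n) %>%  # adds tf, idf and tf_idf columns
  arrange(desc(tf_idf))           # most distinctive words first

scored
```

In this toy example "the" occurs in both books, so its TF-IDF is zero despite having the highest raw counts.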
Create a document-term matrix.
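One possible route from a tidy count table to a document-term matrix is tidytext's `cast_dtm()`, sketched here on invented counts. The `counts` tibble is an assumption for illustration, and the example assumes the tm package is installed, since the result is a tm `DocumentTermMatrix`.

```r
library(tibble)
library(tidytext)

# Hypothetical per-book word counts
counts <- tibble(
  book = c("A", "A", "B", "B"),
  word = c("the", "ship", "the", "moor"),
  n    = c(5, 2, 4, 3)
)

# Rows are documents (books), columns are terms (words), cells are counts
dtm <- cast_dtm(counts, document = book, term = word, value = n)

dim(dtm)
```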
Use the Bing sentiment lexicon to calculate the overall sentiment of each book.
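A lexicon-based score boils down to a join followed by a count. The sketch below uses invented tokens (the `toy_words` tibble is an assumption) and defines the overall sentiment as positive matches minus negative matches, which is one common convention rather than the only one.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical tokenized data: one row per word
toy_words <- tibble(
  book = c("A", "A", "A", "A", "B", "B", "B"),
  word = c("good", "happy", "bad", "the", "hate", "terrible", "love")
)

net_sentiment <- toy_words %>%
  inner_join(get_sentiments("bing"), by = "word") %>%  # keeps only words found in the lexicon
  group_by(book) %>%
  summarise(
    net = sum(sentiment == "positive") - sum(sentiment == "negative")
  )

net_sentiment
```

Words missing from the lexicon (such as "the" here) are dropped by the inner join and do not affect the score.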
Use the NRC sentiment lexicon to calculate how the emotions vary over time in each book. Display the results in a plot.
Count the sentences in each book.
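Sentences can span several lines, so one approach is to paste each book's lines into a single string before tokenizing with `token = "sentences"`. The sketch below runs on invented text; the `toy` tibble and the output column name `n_sentences` are assumptions for illustration.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical toy data: the second sentence of book A is split across two lines
toy <- tibble(
  book = c("A", "A", "B"),
  text = c(
    "It rained all day. We stayed",
    "inside. Nobody minded.",
    "The ship sailed at dawn. It never returned."
  )
)

sentence_counts <- toy %>%
  group_by(book) %>%
  summarise(text = paste(text, collapse = " ")) %>%      # one string per book
  unnest_tokens(sentence, text, token = "sentences") %>% # one sentence per row
  count(book, name = "n_sentences")

sentence_counts
```

Tokenizing line by line instead would count the broken sentence in book A twice, which is why the lines are collapsed first.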