In the last few years, word embeddings have become one of the hottest topics in natural language processing. The most famous algorithm in this area is definitely word2vec. In this exercise set we will use the wordVectors package, which allows you to import a pre-trained model or train your own.
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Exercise 1
Install the wordVectors package.
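The package lives on GitHub rather than CRAN, so one way to install it is via devtools:

```r
# wordVectors is not on CRAN; install the development version from GitHub
install.packages("devtools")
devtools::install_github("bmschmidt/wordVectors")
library(wordVectors)
```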
Exercise 2
Explore the demo dataset demo_vectors. Print the whole vector for the word ‘good’.
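As a hint, here is a minimal sketch. The `[[` operator on a VectorSpaceModel extracts the raw embedding of a single word:

```r
library(wordVectors)

demo_vectors              # a small pre-trained model bundled with the package
demo_vectors[["good"]]    # the full embedding vector for 'good'
```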
Exercise 3
Find the words closest to ‘good’.
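A sketch using closest_to(), which ranks the vocabulary by cosine similarity to a given word:

```r
# ten nearest words by cosine similarity (n controls how many are returned)
closest_to(demo_vectors, "good", n = 10)
```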
Exercise 4
Find the words that are in the same relation to ‘she’ as ‘man’ is to ‘he’.
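closest_to() also accepts a formula, which makes word-vector arithmetic straightforward. A sketch of the classic analogy pattern (‘man’ is to ‘he’ as ? is to ‘she’):

```r
# subtract the 'he' component from 'man' and add 'she';
# the top hit should land near 'woman'
closest_to(demo_vectors, ~ "man" - "he" + "she")
```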
Exercise 5
Find the top 50 words closest to ‘bad’ and plot them using PCA.
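One possible approach: pull the neighbours with closest_to(), extract their raw vectors, and project them with base R's prcomp():

```r
# 50 nearest neighbours of 'bad'
bad_words <- closest_to(demo_vectors, "bad", n = 50)$word

# their raw vectors, one row per word (average = FALSE disables averaging)
bad_vectors <- demo_vectors[[bad_words, average = FALSE]]

# project onto the first two principal components and label the points
pca <- prcomp(bad_vectors)
plot(pca$x[, 1], pca$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), cex = 0.7)
```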
Exercise 6
Download a collection of cookbooks for model training. Prepare a text file suitable for the model (a plain-text file with tokens separated by spaces).
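A sketch following the package's introductory vignette; the download URL below is the one used there (an assumption that it is still live — any folder of plain-text files works the same way):

```r
# fetch and unpack the cookbook corpus (skip the download if already present)
if (!file.exists("cookbooks.zip")) {
  download.file("http://archive.lib.msu.edu/dinfo/feedingamerica/cookbook_text.zip",
                "cookbooks.zip")
}
unzip("cookbooks.zip", exdir = "cookbooks")

# prep_word2vec() lowercases and tokenizes everything into one space-separated
# text file; bundle_ngrams = 2 joins frequent bigrams ('mutton chops' -> 'mutton_chops')
prep_word2vec(origin = "cookbooks", destination = "cookbooks.txt",
              lowercase = TRUE, bundle_ngrams = 2)
```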
Exercise 7
Train a word2vec model with 200 dimensions, a 12-word window and 5 iterations.
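A sketch with train_word2vec(); the thread count is an arbitrary choice, and the guard avoids retraining when the output file already exists:

```r
if (!file.exists("cookbook_vectors.bin")) {
  model <- train_word2vec("cookbooks.txt", "cookbook_vectors.bin",
                          vectors = 200, window = 12, iter = 5, threads = 2)
} else {
  model <- read.vectors("cookbook_vectors.bin")  # reload a previously trained model
}
```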
Exercise 8
Find out which beef dish is most similar to mutton chops. 😉
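A hint: if bigrams were bundled during preparation, the phrase shows up as the single token ‘mutton_chops’, so an analogy query that swaps the mutton component for beef is one way in:

```r
# 'mutton_chops' minus 'mutton' plus 'beef' should point at analogous beef dishes
closest_to(model, ~ "mutton_chops" - "mutton" + "beef")
```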
Exercise 9
Cluster the embeddings using kmeans and print the first 20 words from the cluster containing the word ‘cake’.
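A sketch: a VectorSpaceModel is a matrix underneath, so base R's kmeans() applies directly. The number of clusters (150 here) is an arbitrary choice:

```r
set.seed(10)  # kmeans starts from random centers; fix the seed for reproducibility
clustering <- kmeans(model, centers = 150, iter.max = 40)

# clustering$cluster is a named vector mapping each word to a cluster id
cake_cluster <- clustering$cluster[["cake"]]
head(names(clustering$cluster[clustering$cluster == cake_cluster]), 20)
```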
Exercise 10
Find the top 10 words most similar to ‘sweet’ and to ‘sour’. Plot them with similarity to ‘sweet’ on the X axis and similarity to ‘sour’ on the Y axis.
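One way to build the plot, assuming cosineSimilarity() to score each candidate word against both reference vectors:

```r
# ten nearest neighbours of each reference word
sweet_words <- closest_to(model, "sweet", n = 10)$word
sour_words  <- closest_to(model, "sour", n = 10)$word
words <- unique(c(sweet_words, sour_words))

# cosine similarity of every candidate to 'sweet' (column 1) and 'sour' (column 2)
tastes <- model[[c("sweet", "sour"), average = FALSE]]
sims   <- cosineSimilarity(model[[words, average = FALSE]], tastes)

plot(sims[, 1], sims[, 2], type = "n",
     xlab = "similarity to 'sweet'", ylab = "similarity to 'sour'")
text(sims[, 1], sims[, 2], labels = rownames(sims), cex = 0.8)
```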