In the last few years, word embeddings have become one of the hottest topics in natural language processing. The most famous algorithm in this area is definitely word2vec. In this exercise set we will use the wordVectors package, which allows you to import a pre-trained model or train your own.
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Exercise 1
Explore the demo dataset demo_vectors. Print the whole vector for the word ‘good’.
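A minimal sketch of one way to do this, assuming wordVectors is installed (e.g. from bmschmidt's GitHub repository) and that demo_vectors is available after loading the package:

```r
# Assumes the wordVectors package is installed, e.g. via
# devtools::install_github("bmschmidt/wordVectors").
library(wordVectors)

# demo_vectors ships with the package; the [[ operator extracts
# the embedding for a single word.
demo_vectors[["good"]]
```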
Exercise 2
Find the words closest to ‘good’.
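A hedged sketch using the package's closest_to() helper, which ranks the vocabulary by cosine similarity (the count of 10 neighbours is an arbitrary choice here):

```r
library(wordVectors)

# Returns a data frame of words ranked by cosine similarity to "good".
closest_to(demo_vectors, "good", 10)
```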
Exercise 3
Find the word that is in the same relation to ‘she’ as ‘man’ is to ‘he’.
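One way to sketch the analogy, assuming closest_to()'s formula interface for vector arithmetic:

```r
library(wordVectors)

# Vector arithmetic via the formula interface:
# "man" - "he" + "she" should land near "woman".
closest_to(demo_vectors, ~ "man" - "he" + "she")
```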
Exercise 4
Find the top 50 words closest to ‘bad’ and plot them using PCA.
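A sketch combining closest_to() with base R's prcomp(); the column name `word` in the returned data frame and the `average = FALSE` extraction are assumptions based on the package's documented interface:

```r
library(wordVectors)

# The 50 nearest neighbours of "bad".
bad_words <- closest_to(demo_vectors, "bad", 50)

# Extract their embeddings; average = FALSE keeps one row per word.
bad_matrix <- demo_vectors[[bad_words$word, average = FALSE]]

# Project onto the first two principal components and label the points.
pca <- prcomp(bad_matrix)
plot(pca$x[, 1:2], type = "n")
text(pca$x[, 1:2], labels = rownames(bad_matrix), cex = 0.7)
```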
Exercise 5
Download the collection of cookbooks for model training. Prepare a text file suitable for the model (plain text with tokens separated by spaces).
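Once the cookbooks are downloaded and unzipped, the package's prep_word2vec() can build the training file. The directory and file names below are assumptions, not part of the exercise:

```r
library(wordVectors)

# Lowercases and tokenizes every file under "cookbooks/" into a single
# space-separated text file ready for training.
prep_word2vec(origin = "cookbooks", destination = "cookbooks.txt",
              lowercase = TRUE)
```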
Exercise 6
Train a word2vec model with 200 dimensions, a 12-word window, and 5 iterations.
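A sketch with train_word2vec(); the input and output file names carry over from the assumed names in the previous step, and the thread count is arbitrary:

```r
library(wordVectors)

# Parameters mirror the exercise: 200-dimensional vectors,
# a 12-word window, 5 training iterations.
model <- train_word2vec("cookbooks.txt", "cookbook_vectors.bin",
                        vectors = 200, window = 12, iter = 5,
                        threads = 2)
```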
Exercise 7
Find out what beef dish is most similar to mutton chops 😉
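One possible phrasing as an analogy, assuming `model` is the model trained above and that the tokens "mutton", "chops", and "beef" appear in its vocabulary:

```r
# What is to "beef" as "chops" is to "mutton"?
closest_to(model, ~ "chops" - "mutton" + "beef")
```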
Exercise 8
Cluster the embeddings using kmeans and print the first 20 words from the cluster containing the word ‘cake’.
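A sketch with base R's kmeans(), assuming the model object is a numeric matrix underneath with words as row names; the number of centers is an arbitrary choice:

```r
set.seed(10)

# Cluster the embedding matrix directly; 150 centers is arbitrary.
clustering <- kmeans(model, centers = 150, iter.max = 40)

# Look up which cluster "cake" fell into and list 20 of its members.
cake_cluster <- clustering$cluster[["cake"]]
head(names(clustering$cluster[clustering$cluster == cake_cluster]), 20)
```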
Exercise 9
Find the top 10 most similar words for ‘sweet’ and ‘sour’. Plot them with similarity to ‘sweet’ on the X axis and similarity to ‘sour’ on the Y axis.
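A sketch using the package's cosineSimilarity() to score every word against both taste words at once, then keeping each word that makes either top-10 list; the filtering logic here is one possible reading of the exercise:

```r
library(wordVectors)

# Embeddings for both probe words, one row each.
tastes <- model[[c("sweet", "sour"), average = FALSE]]

# Similarity of every vocabulary word to "sweet" and "sour".
sims <- cosineSimilarity(model, tastes)

# Keep words in either word's top 10.
top <- sims[rank(-sims[, 1]) <= 10 | rank(-sims[, 2]) <= 10, ]

# Similarity to "sweet" on X, to "sour" on Y, labelled by word.
plot(top[, 1], top[, 2], type = "n",
     xlab = "similarity to 'sweet'", ylab = "similarity to 'sour'")
text(top[, 1], top[, 2], labels = rownames(top), cex = 0.7)
```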