By: Jenna Jones and Nick Paine
How does a computer learn to interpret or even understand human language? This is what we set out to understand when we began researching Natural Language Processing (NLP). One simple but effective tool is Word2Vec, an algorithm created by Mikolov et al in 2013.
Word2Vec takes a large body of text, which can be in any language, and creates a vector representation of words and the context of those words in relation to each other. …
By: Nick Paine and Jenna Jones
Recently, unsupervised learning has received a lot of attention due to its ability to find patterns in data without any training. Specifically, clustering has been used to solve many data problems, including customer segmentation, fraud detection, recommendation engines and most importantly, football player positions. In this paper, we discuss and compare two different methods for grouping data points together: K-Means and Gaussian Mixture Models (GMM). We then use each of these methods to cluster the same example set of data.
As always with machine learning systems, some special language is needed: