GMC AI: Our first ML model

Every weekend at GoMyCode we hold an AI class. It is part of the weekend program (other programs include web and gaming).

This AI class is our first cohort: 15 people who are really passionate about learning AI and ML.

We divided the program into 3 levels, from the basics up to advanced topics.

After 4 weeks of introduction (Python and its libraries such as NumPy, pandas and Matplotlib, plus some history of AI), yesterday (4 March 2018) we wrote our first machine learning model: a model that clusters blog posts so that the clusters can later be used to recommend what to read next.

This model uses K-Means, an unsupervised machine learning algorithm: it groups the posts by similarity without being given any labels.
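To make the idea concrete, here is a minimal from-scratch sketch of K-Means (Lloyd's algorithm) in plain Python on a few 2D points. It is not scikit-learn's implementation: for simplicity it uses the first k points as starting centroids (scikit-learn defaults to k-means++ initialisation), and the function names are our own.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Simplification: the first k points are the starting
    centroids (scikit-learn uses k-means++ initialisation instead)."""
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: every point joins the cluster of its nearest centroid
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return labels, centroids


points = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]]
labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[1.25, 1.5], [8.5, 8.25]]
```

No labels are ever given: the two groups emerge only from the distances between points, which is what makes the algorithm unsupervised.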

With scikit-learn, writing this kind of program is really easy: all we need to do is prepare the data, then feed it to the algorithm.

1- Read data

We have a folder that contains the posts, one per file. We use the os module to read these files into a list.

import os
# Read every file in ./posts into a list of strings, one string per post
posts = [open(os.path.join("./posts", f)).read() for f in os.listdir("./posts")]
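The one-liner above drops the file names and relies on `os.listdir`, which returns files in arbitrary order. If you later want to say which post landed in which cluster, a variant that keeps the names might look like this. It is a sketch assuming the posts are UTF-8 text files; `read_posts` is a hypothetical helper name, and the temporary folder only stands in for `./posts`.

```python
import tempfile
from pathlib import Path


def read_posts(folder):
    """Return (names, texts) for every file in `folder`, sorted by file name.

    Sorting keeps the order stable between runs, and keeping the names lets us
    map each cluster label back to its post later on.
    """
    paths = sorted(p for p in Path(folder).iterdir() if p.is_file())
    names = [p.name for p in paths]
    texts = [p.read_text(encoding="utf-8") for p in paths]  # read_text closes each file
    return names, texts


# Usage with a throwaway folder standing in for ./posts
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "b.txt").write_text("machine learning is easy", encoding="utf-8")
    Path(tmp, "a.txt").write_text("image into database", encoding="utf-8")
    names, posts = read_posts(tmp)

print(names)  # ['a.txt', 'b.txt']
print(posts)  # ['image into database', 'machine learning is easy']
```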

2- Transform data

K-Means doesn’t understand words, so we need to transform these words into a numerical format.

One way to do this is to build a dictionary of words and then, for each post, create a vector containing the count of each dictionary word in that post.

Example:

If our dictionary is [“machine”, “learning”, “ai”, “understand”, “database”, “easy”, “hard”, “to”]

and we have this post: “machine learning is easy to understand”

the resulting vector is [1, 1, 0, 1, 0, 1, 0, 1]

which means: 1 occurrence of the word “machine”, 1 of “learning”, 0 of “ai”, and so on. The word “is” is not counted because it is not in the dictionary.
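The counting idea from the example can be reproduced in a few lines of plain Python, without scikit-learn. This is a sketch: `count_vector` is our own name, and splitting on runs of letters and digits is a simplifying assumption (CountVectorizer's real tokenizer differs in details).

```python
import re
from collections import Counter


def count_vector(post, dictionary):
    """Count how often each dictionary word occurs in `post`.
    Words outside the dictionary (like "is") are ignored."""
    counts = Counter(re.findall(r"\w+", post.lower()))  # naive tokenizer
    return [counts[word] for word in dictionary]


dictionary = ["machine", "learning", "ai", "understand", "database", "easy", "hard", "to"]
print(count_vector("machine learning is easy to understand", dictionary))
# [1, 1, 0, 1, 0, 1, 0, 1]
```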

Let’s code this. sklearn (as always) makes it easy:

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(min_df=1)  # min_df=1: keep every word that appears in at least one post
transformed_posts = vectorizer.fit_transform(posts)  # build the dictionary and count words in one step
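To check that CountVectorizer does the same counting as the hand-made example, you can pass the example dictionary as a fixed `vocabulary`; the columns then follow the dictionary's order.

```python
from sklearn.feature_extraction.text import CountVectorizer

dictionary = ["machine", "learning", "ai", "understand", "database", "easy", "hard", "to"]
# A fixed vocabulary makes column i correspond to dictionary[i]
vectorizer = CountVectorizer(vocabulary=dictionary)
matrix = vectorizer.fit_transform(["machine learning is easy to understand"])  # sparse, 1 row
print(matrix.toarray())  # [[1 1 0 1 0 1 0 1]]
```

With the real posts we don't pass `vocabulary`: `fit_transform` learns the dictionary from the posts, and `vectorizer.get_feature_names_out()` (scikit-learn 1.0 or later) lists its words in column order. By default CountVectorizer also lowercases the text and ignores one-letter words.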

3- Create Model

We will divide our posts into 2 categories.

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2, random_state=0).fit(transformed_posts)
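After fitting, `kmeans.labels_` holds the cluster of every training post, in the same order as `posts`. Grouping posts by that label is the raw material for a "read next" suggestion: recommend other posts from the same cluster. A sketch on four made-up posts (hypothetical data standing in for `./posts`); `n_init=10` is set explicitly only because its default changed across scikit-learn versions. The cluster numbers 0 and 1 themselves are arbitrary.

```python
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy posts: two about databases, two about machine learning
posts = [
    "database sql database query",
    "sql query database index",
    "neural network learning model",
    "learning model neural training",
]

vectorizer = CountVectorizer()
transformed_posts = vectorizer.fit_transform(posts)
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(transformed_posts)

# Group the posts by the cluster K-Means assigned them to
clusters = defaultdict(list)
for post, label in zip(posts, kmeans.labels_):
    clusters[int(label)].append(post)

for label, members in sorted(clusters.items()):
    print(label, members)
```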

4- Predict

Now, given a new post, we can predict which category it belongs to:

new_post = "image into database."
new_post_vec = vectorizer.transform([new_post])  # reuse the dictionary learned from the posts (no refitting)
print(kmeans.predict(new_post_vec))  # index of the closest cluster
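`predict` does not retrain anything: it assigns the new vector to the nearest learned centroid (scikit-learn stores them in `kmeans.cluster_centers_`). The plain-Python sketch below mimics that step over the 8-word dictionary from the example; the two centroids are made-up numbers for illustration, not values learned from our posts, and the function names are our own.

```python
import re
from collections import Counter

dictionary = ["machine", "learning", "ai", "understand", "database", "easy", "hard", "to"]


def count_vector(post):
    """Count each dictionary word in `post`; words outside the dictionary are ignored."""
    counts = Counter(re.findall(r"\w+", post.lower()))
    return [counts[word] for word in dictionary]


def predict(vector, centroids):
    """Index of the centroid closest to `vector` (squared Euclidean distance)."""
    distances = [sum((x - c) ** 2 for x, c in zip(vector, centroid)) for centroid in centroids]
    return distances.index(min(distances))


# Made-up centroids for illustration, NOT values learned from real posts
centroids = [
    [0.5, 0.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],  # cluster 0: mostly "database" posts
    [1.0, 1.0, 0.5, 0.5, 0.0, 0.5, 0.5, 1.0],  # cluster 1: mostly "machine learning" posts
]
new_post_vec = count_vector("image into database.")  # "image" and "into" are not in the dictionary
print(new_post_vec)                      # [0, 0, 0, 0, 1, 0, 0, 0]
print(predict(new_post_vec, centroids))  # 0
```

This is also why the real code calls `transform` rather than `fit_transform` on the new post: the vector must use the same columns the centroids were computed on.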

That was an easy first model. In the next classes we will discover more algorithms, and we will finish level 1 with an introduction to neural networks and deep learning.

By the end of this session, we hope the students will have a good understanding of the main classes of ML and will start using it, because the hardest thing in ML is getting started: everything after that is easier (though not by much).

At @GoMyCode we are using AI to disrupt the way we learn and teach computer science. We are always looking for passionate developers to grow our team. Send us an email at yahya@gomycode.tn if you are interested in joining our team.


Written by Mokhles El Heni