Tom Gallacher
Jul 8 · 3 min read
Photo: https://www.pexels.com/photo/glass-bottles-on-shelf-1771809/

Engaging in conversation is one of the core principles of building and maintaining a remote culture! It is intuitive, then, that certain jargon, abuse, or harassment, even when not said with malicious intent, stops people from contributing to conversations or, even worse, stops them from engaging altogether. The same goes for counter-productive vocabulary, such as negatively associated words and poor machine translations. Something that can provide insight into how others might interpret a message, especially in a multi-lingual company, is a valuable tool and a great way to keep the conversations about inclusivity relevant.

Google’s Jigsaw has been working on this problem for a while and has built https://www.perspectiveapi.com to give moderation teams and commenters real-time feedback on the quality of their conversations. As part of this project, Jigsaw has been engaging with the Kaggle data science community by publishing labelled datasets, which means us mere mortals can now explore and begin to build our own models!

The model is based on the Jigsaw dataset, which was obtained by asking people to rate internet comments on a scale from “Very toxic” to “Very healthy”. Jigsaw then defined “toxic” as… “a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion.”

We have a #yelling channel; it is great :D And our ML bot is using Slack reactions!

We decided to take this dataset as an exercise to see what we could produce with minimal effort, and to implement it in a fun and exciting way! As you might have seen in a previous post, we have been using Uber’s Ludwig as a zero-“code” alternative for building ML models. TensorFlow and Keras are great, but they aren’t as easy to prototype with or to teach!

The first step is to install our dependencies. We are running on GPU instances to speed up training, but don’t be scared: Google provides free GPU machines for ML research at https://colab.research.google.com, which is AWESOME!

pip install ludwig
pip install tensorflow-gpu==1.13.1
python -m spacy download en

Once installed, the next step is to build a model. This sounds scary, but in essence we just need to declare the inputs in our training dataset and map them to outputs.

comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate

Using the header from our training.csv file we can start building and testing models using Ludwig’s YAML configuration format.

input_features:
    - name: comment_text
      type: text
      level: word
      encoder: parallel_cnn
      preprocessing:
          lowercase: true

output_features:
    - name: toxic
      type: binary
    - name: severe_toxic
      type: binary
    - name: threat
      type: binary
    - name: insult
      type: binary
    - name: obscene
      type: binary
    - name: identity_hate
      type: binary

And in its simplest form, this gives us a usable model that we can begin to integrate with!
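Training is then just a case of pointing Ludwig at the config and the CSV. Here is a minimal sketch, assuming Ludwig’s 0.1-era Python API and that the definition above is saved as model.yaml next to training.csv (both file names are our own choices):

from ludwig.api import LudwigModel
import yaml

# Load the model definition written above.
with open('model.yaml') as f:
    model_definition = yaml.safe_load(f)

model = LudwigModel(model_definition)

# Train against the Kaggle CSV; Ludwig handles text preprocessing,
# batching and checkpointing, and writes its output under ./results.
train_stats = model.train(data_csv='training.csv')
model.close()

The same can be driven from the ludwig command-line tool if you prefer to stay out of Python entirely.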

As our model is in Python, we decided to just go ahead and write the bot in Python too, and as you can see below, it is super simple!
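The bot boils down to three steps: listen for messages, score them with the model, and add one reaction per flagged label. A rough sketch of that loop, assuming the python-slackclient RTM API and Ludwig’s Python API (the emoji names, model path and SLACK_BOT_TOKEN environment variable are illustrative choices, not the original code):

import os
import pandas as pd
import slack  # python-slackclient 2.x
from ludwig.api import LudwigModel

# Load the trained model; Ludwig writes it under ./results by default
# (the exact experiment directory name here is an assumption).
model = LudwigModel.load('results/experiment_run/model')

# One emoji per output label, picked purely for illustration.
EMOJI = {
    'toxic': 'warning',
    'severe_toxic': 'rotating_light',
    'obscene': 'no_entry',
    'threat': 'exclamation',
    'insult': 'anger',
    'identity_hate': 'x',
}

@slack.RTMClient.run_on(event='message')
def score_message(**payload):
    data = payload['data']
    web_client = payload['web_client']
    text = data.get('text')
    if not text:
        return
    # Ludwig expects a DataFrame with the same input column name as training;
    # predictions come back as <feature_name>_predictions columns.
    predictions = model.predict(data_df=pd.DataFrame({'comment_text': [text]}))
    for label, emoji in EMOJI.items():
        if bool(predictions[label + '_predictions'].iloc[0]):
            web_client.reactions_add(
                channel=data['channel'],
                name=emoji,
                timestamp=data['ts'],
            )

slack.RTMClient(token=os.environ['SLACK_BOT_TOKEN']).start()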

And just like that, we have a Slack bot that runs against our model and returns an emoji reaction. The response time? Just over 1 second :D, which I think isn’t too bad for a PoC.
