Adventures in Machine Learning

A 30 Day Writing Challenge

Josh Sephton
3 min read · Apr 1, 2017

I’ve neglected this blog; I haven’t written anything since November.

I won’t make excuses. I write for myself more than I write for you, dear reader. Writing helps me organise my thoughts, and codify my beliefs. I’ve wasted an opportunity. For the past 6 months I’ve been learning how to handle big datasets (~100 million records) and build distributed processing pipelines. I could have been writing down everything I’ve learnt but I simply haven’t.

I’m going to change that today. It’s April 1, 2017 and I’m taking part in the 30 Day Writing Challenge. I won’t be publishing every day, but I’ll be writing every day.

This challenge happens to coincide with me starting a new job. I’m not going to talk about specifics until I’m settled but I will say that I’m building a machine learning team for a startup based in Birmingham, UK.

I have very little experience in machine learning, so both my new employer and I are taking a leap of faith that I’ll be able to figure it out. We got on well at interview and both felt that we could work together effectively. I’m completely self-taught and I’ve always been a problem solver, so it’s not daunting to be starting a job without the necessary skills to be successful. However, I’ll be learning things far more quickly than I have since school and I want to keep a record of it.

In about 2 weeks, it’ll be my first day at my new job. Until then I’m going to be thinking about how I want to work with the data I’ll have available. I’m going to investigate different algorithms and techniques, as well as thinking about the business side of building a new team.

I’m not naïve enough to think I can write a plan now that I’ll stick to in the next few months; that’s completely unrealistic as things will change. But I want to be prepared. I want to be in the right mindset for taking on this challenge. I want to have thought about various possibilities so I will at least know how to approach the problems I’ll face.

The heart of my job will be trying to understand the meaning of lots and lots of text. Over the next few posts I’ll be looking at some of the algorithms that I could use. I’ll then spend some time thinking about how I’ll measure success in my new role. Once I start my new job, I’ll journal my progress and whatever challenges arise.

Aside from acting as a record, I want to create a resource for other engineers who want to tackle machine learning. A lot of the articles I’ve read are written by people with PhDs and lots of letters after their names. They use unfamiliar language and talk about mathematical things like Bayesian classification. It turns out there’s a reasonable explanation for all of it: a Bayesian classifier just gives the probability that an item belongs to a certain group, based on the examples it has already seen. I want to shed light on techniques we can use to mine data using simple language and as little mathematics as possible.
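To make that last idea concrete, here is a minimal sketch in Python of a naive Bayes classifier, which is one common kind of Bayesian classifier. Everything in it is an assumption made up for illustration: the four training sentences, the "spam" and "ham" labels, and the function names `train` and `classify`. It isn't what I'll be using at work, just a small toy showing how "the probability that an item belongs to a group" can be worked out by counting words.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and words from a list of (text, label) pairs."""
    label_counts = Counter()            # how many examples carry each label
    word_counts = defaultdict(Counter)  # per label: how often each word appears
    vocabulary = set()                  # every distinct word seen in training
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return a dict mapping each label to the probability that text belongs to it."""
    label_counts, word_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        # Start from how common the label is overall (the "prior").
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word never gives a zero probability.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        log_scores[label] = score
    # Turn the log scores into probabilities that add up to 1.
    biggest = max(log_scores.values())
    weights = {label: math.exp(s - biggest) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Hypothetical training data, invented for this example.
examples = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch tomorrow at noon", "ham"),
]
model = train(examples)
probs = classify("buy cheap money", model)
print(probs)
```

With this toy data, "buy cheap money" comes out as more likely spam than ham, because those words only ever appeared in the spam examples. The "naive" part is the assumption that each word counts independently of the others, which is rarely true of real text but keeps the arithmetic simple.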

Come with me over the next 30 days to understand how we can find meaning in large datasets, how we can build successful teams from scratch, and how we can put theory into practice.

This is a post in my 30 Day Writing Challenge. I’m a software engineer, trying to understand machine learning. I haven’t got a PhD, so I’ll be explaining things with simple language and lots of examples.

Follow me on Twitter to see my latest posts. If you liked this article please click the heart button below to share — it will help other people see it.

Josh Sephton

Founder of Pritchatts Consulting Ltd., making companies more profitable by making their data work for them.