Classifying the success of Kickstarter projects using PySpark and TensorFlow.

Introduction

Backing inventors on Kickstarter has, for me, in 99% of cases, led to years of waiting for a product that never shows up at my doorstep. So let's remedy this once and for all with an amazing Kickstarter success classifier. We will use the MLOps platform to ingest our data source (you can find it here on Kaggle), develop our preprocessing and model scripts locally using the SDK, and finally submit the whole package to our AWS account to get it versioned and ready for production. After all, this could be a real money maker.
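A classifier first needs a target. As a minimal sketch, assuming the Kaggle export has a `state` column with values such as `successful`, `failed`, `canceled` and `live` (an assumption; check the actual CSV), the binary label could be derived as below. It uses the standard library's `csv` module rather than PySpark so it runs anywhere, and the helper `label_projects` and the sample rows are hypothetical.

```python
import csv
import io

# Assumption: these are the terminal outcomes found in the `state` column.
POSITIVE = "successful"
KEEP = {"successful", "failed", "canceled"}


def label_projects(csv_text):
    """Return (name, label) pairs: 1 for successful projects, 0 otherwise."""
    rows = csv.DictReader(io.StringIO(csv_text))
    labeled = []
    for row in rows:
        state = row["state"].strip().lower()
        if state not in KEEP:
            continue  # skip live/undefined projects: their outcome is unknown
        labeled.append((row["name"], 1 if state == POSITIVE else 0))
    return labeled


# Hypothetical sample rows standing in for the real dataset.
sample = """name,state,goal
Smart Mug,successful,5000
Hover Board,failed,100000
Solar Kettle,live,2000
Pocket Drone,canceled,30000
"""

print(label_projects(sample))  # [('Smart Mug', 1), ('Hover Board', 0), ('Pocket Drone', 0)]
```

Counting canceled projects as failures is a design choice; dropping them instead is equally defensible. In PySpark the same step would typically be a `filter` with `Column.isin` followed by `withColumn` to add the label.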

Preprocessing with PySpark

Some might argue that using Spark for a 50 MB dataset is a bit overkill. But I like consistency and improvement. After all, it works just as well for megabytes as for terabytes. …
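The reason the same Spark code scales from megabytes to terabytes is that aggregations are computed per partition and the partial results are then merged. The plain-Python sketch below is not Spark; it only mimics that map-then-reduce idea with hypothetical `(category, label)` records, computing a per-category success rate whose result does not depend on how many partitions the data is split into.

```python
from collections import Counter


def partial_counts(partition):
    """Map step: count successes and totals per category within one partition."""
    successes, totals = Counter(), Counter()
    for category, label in partition:
        totals[category] += 1
        successes[category] += label
    return successes, totals


def merge(a, b):
    """Reduce step: combine two partial results by adding their counters."""
    return a[0] + b[0], a[1] + b[1]


def success_rate(records, num_partitions):
    """Success rate per category, computed partition by partition."""
    # Distribute records round-robin across the requested number of partitions.
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    successes, totals = Counter(), Counter()
    for part in partitions:
        successes, totals = merge((successes, totals), partial_counts(part))
    # Missing keys in a Counter read as 0, so categories with no successes work.
    return {cat: successes[cat] / totals[cat] for cat in totals}


records = [("games", 1), ("tech", 0), ("games", 0), ("tech", 1), ("games", 1), ("art", 0)]
print(success_rate(records, num_partitions=1))
print(success_rate(records, num_partitions=4))
```

Both calls print the same rates, which is the property that lets one script serve a 50 MB file and a cluster-sized dataset alike.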


Classifying movie ratings using the MLOps platform with PySpark and TensorFlow.

Introduction

Classifying movies is always super cool and useful. I myself have built at least five successful businesses around it, so I wanted to share an end-to-end example of how you can go from being an average Netflix rater to making millions of dollars with your skill set. First, we will go through the preprocessing using PySpark on the MLOps platform, and then we will move on to training awesome models that we can deploy so that millions of users can pay for the ratings.
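Before any model can be trained, the raw ratings have to become class labels. A minimal sketch, assuming ratings arrive as hypothetical `(movie_id, rating)` pairs and that a movie counts as "good" when its mean rating is at least 3.5 (an arbitrary cut-off chosen for illustration), with the helper `movie_labels` also hypothetical:

```python
from collections import defaultdict

GOOD_THRESHOLD = 3.5  # hypothetical cut-off; tune it for your own data


def movie_labels(ratings, min_votes=2):
    """Average each movie's ratings and map the mean to a binary class.

    Movies with fewer than `min_votes` ratings are skipped, since a single
    vote says little about how a movie is received.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for movie_id, rating in ratings:
        sums[movie_id] += rating
        counts[movie_id] += 1
    return {
        movie_id: int(sums[movie_id] / counts[movie_id] >= GOOD_THRESHOLD)
        for movie_id in counts
        if counts[movie_id] >= min_votes
    }


ratings = [(1, 4.0), (1, 5.0), (2, 2.5), (2, 3.0), (3, 5.0)]
print(movie_labels(ratings))  # {1: 1, 2: 0}
```

The `min_votes` filter trades coverage for label reliability; movie 3 is dropped here because it has only one rating.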

Note! Subscription tiers at about $29/month have worked best for me in the past.

Inspecting the data

The dataset can be found here. I’m going to go with the ratings, metadata and keywords for this classifier. The columns that I’m interested in and will work with…
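Working with ratings, metadata and keywords together means joining them on the movie identifier. A minimal sketch with hypothetical in-memory records standing in for the three files; the keys `title` and the helper `join_sources` are assumptions, not the dataset's actual column names. It performs an inner join, so only movies present in all three sources survive.

```python
def join_sources(labels, metadata, keywords):
    """Inner-join per-movie labels with metadata and keyword lists on movie id."""
    joined = []
    for movie_id, label in sorted(labels.items()):
        if movie_id not in metadata or movie_id not in keywords:
            continue  # inner join: drop movies missing from any source
        joined.append({
            "movie_id": movie_id,
            "title": metadata[movie_id]["title"],
            "keywords": keywords[movie_id],
            "label": label,
        })
    return joined


# Hypothetical records; real data would be loaded from the CSV files.
labels = {1: 1, 2: 0, 5: 1}
metadata = {1: {"title": "Movie A"}, 2: {"title": "Movie B"}, 3: {"title": "Movie C"}}
keywords = {1: ["toy", "friendship"], 2: ["board game"]}

rows = join_sources(labels, metadata, keywords)
print([r["title"] for r in rows])  # ['Movie A', 'Movie B']
```

In PySpark the equivalent is `DataFrame.join` with `how="inner"`; an outer join would instead keep movies lacking keywords, which then need a fill value.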


Don’t be hard on yourself: only 13% go into production.

Image source: Unsplash

What’s your ML ROI?

Machine learning is great: it opens doors to new ventures and new challenges, and it helps already established companies reinvent their businesses. But are you really doing that? And if so, what’s the cost? Does your revenue bump justify the huge increase in development costs?

50% of companies spend between 30 and 90 days getting a single model into production. Meanwhile, data scientists are among the priciest employees, together with their sparring partners, the data engineers.
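To make the cost question concrete, here is a back-of-envelope sketch. Every input except the 30 and 90 day figures is a hypothetical placeholder, not data from this article: a team of three, an assumed cost of 800 per person-day, and an assumed revenue bump of 150,000. ROI is computed as (gain - cost) / cost.

```python
def deployment_cost(days, team_size, daily_cost_per_person):
    """Labour cost of getting one model into production."""
    return days * team_size * daily_cost_per_person


def roi(gain, cost):
    """Return on investment as a fraction: (gain - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


# Hypothetical inputs: data scientist, data engineer and ML engineer,
# an assumed cost per person-day, and an assumed revenue bump.
TEAM_SIZE = 3
DAILY_COST = 800
REVENUE_BUMP = 150_000

for days in (30, 90):
    cost = deployment_cost(days, TEAM_SIZE, DAILY_COST)
    print(f"{days} days: cost {cost:,}, ROI {roi(REVENUE_BUMP, cost):.0%}")
```

The numbers are made up; the point is the shape of the calculation. With everything else fixed, tripling time-to-production triples the cost, which in this example flips the ROI from positive to negative.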

Imagine

If your “normal” software engineering team were pushing features at 90-day intervals in 2020, that would be completely unacceptable. The same should go for data science. The tooling available today allows any data-driven company to mock, PoC and deploy new ideas at an almost daily rate. And let’s face it, if your company is in the business of making gold out of data, then your data engineers, data scientists and ML engineers could, would and should be your most valuable assets, and they should really be titled Product engineers, as their work is directly consumed by your users. …

About

Petter Hultin Gustafsson

Petter is the founder and CEO of MLOps, a plug-and-play solution for end-to-end machine learning. He previously co-founded Growbotics and Tracy Trackers.
