Published in ProjectPro

How to Effectively Plan Your First Machine Learning Project?

A New Standard Has Been Set

Photo by Kelly Sikkema on Unsplash

Doing projects is the one sure way to get your foot in the door of machine learning and data science. Everyone who’s serious about getting into the field is aware of this. Consequently, plenty of people are now building projects, putting them in their GitHub portfolios, and sharing them across their networks, but many of those projects aren’t resonating. Why?

A new standard has been set. The fact you can build a model that predicts something and deploy it using some cloud server is impressive, but most people can do that now. To distinguish yourself from the crowd, you’ve got to do something different. You’ve got to move away from solely thinking about the technical aspects of a machine learning problem and begin to think about things that you’d consider in a real-world environment.

This means you’ve got to put in the extra effort. Think about factors that people aren’t considering, such as:

  • Why am I doing this project?
  • Who is the project going to impact?
  • How will this project benefit the business?
  • etc

Most people want to get a project done as fast as possible. Thinking through the questions above may slow you down, but it emulates a real-world scenario and demonstrates to hiring managers that you are ready to join a team immediately.

For the remainder of this article, we are going to work through the entire planning of a business problem.

Table of Contents: 
--> Deciding on a project
--> Building a business case
--> The problem
--> Feasibility
--> Requirements
--> The system design
--> Wrap up

Deciding on a project

The question I hear most often is “what project should I do to get hired?” My answer to people who ask me this is to think about what type of business they’d like to work for. Your project is supposed to capture the attention of ideal companies you’d like to work for — that’s your key performance indicator (KPI).

“If you don’t know what project to work on, it’s because you don’t know who you’re appealing to.”

A key factor in capturing someone's attention is showing them something that’s of interest to them. This means you must have a clearly defined idea of the type of company you’d like to work for. An easy way to narrow it down is to think of the types of problems you’d like to solve.

There are two types of companies applying machine learning:

  • Companies building applications with machine learning as part of their core offering (e.g. DeepMind, Databricks, MobiDev, etc)
  • Companies applying machine learning to enhance existing workflows (e.g. Amazon, ASOS, Netflix, etc)

Identifying the ideal company you’d like to work for gives you a clear sense of the type of problems that would be good for you to work on. A good approach is to select 5 companies you’d love to work for, look at the similar ways in which they’ve employed machine learning to solve business problems, and brainstorm 10 project ideas that may appeal to the companies on your list.

Note: If you’re struggling with project ideas, you can visit ProjectPro Projects for some inspiration; ProjectPro is a platform for learning and reusing enterprise-grade data science and machine learning project templates.

From your list of project ideas, select one. You’re now in a better position to begin structuring your project like a real-world project. This is how you stand out. Not only will you get the attention of the people you want to impress, but you’ll also learn some valuable lessons about machine learning in the real world.

The Business Case

A business case defines what the problem is and why your product is necessary. The idea behind creating one is to motivate others to recognize the demand for the product or feature to help you get buy-in from other stakeholders.

Whether your project is going to be integrated into a real system or sold as a product does not matter. Doing this in your personal project diverts your focus away from tools and gets you thinking about the actual thing you’re going to be paid to do — solve problems. It also shows you’re extremely collaborative because you’re helping whoever views your project to quickly understand what is going on, and why you’ve decided to do the project.

There are three factors involved in building a business case:

  • The Problem
  • Feasibility
  • Requirements

Let’s dive into each section.

The Problem

The first part of the business case is where you detail the problem. In a real-world scenario, it’s necessary for you to set the scene. Someone new may join your team, or someone may never have seen the problem before, and getting them on board quickly will benefit every member of the team in the long run.

While your personal project may not be conducted in a team, taking the time to do this step will demonstrate your collaborative spirit. Also, clearly articulating the problem at a high level sets the motivations from the beginning, before the project starts. This serves as a guide for all the decisions you’re going to make later down the line for the project.

More people are confined to homes, thus our ecommerce website has generated increased traffic, but this is not reflected in our sales. After some research, I've noticed that this behavior is highly correlated with our pricing hence why I am proposing we adjust the prices of products on our platform in accordance with market and customer data. 

Note: The scenarios in this blog are all hypothetical.

It’s important not to skip this step if you want your project to stand out. A business will likely be confronted with several problems at any one time, and just because you’ve spoken about a problem doesn’t mean it’s worth solving immediately.

Thus, your business case should also justify why you’ve decided to do the project you’ve chosen to do as you define the problem. The best way to do this is by showing the relative impact the fulfillment of the project is going to have on the business.

Once again you’re showing you can think in more than one dimension. You’re not solely concerned with model architectures and predictions because you’re thinking based on the interests of the business.

One of our core business values is to put people first - this includes ensuring our products are affordable yet competitive. Therefore, it's of extreme necessity to ensure we meet this core value. Not only would we attract more customers with competitive pricing, but we will also increase our sales growth since more customers bring more sales opportunities. 

To evaluate the final outcome of a machine learning project, there must be a KPI that’s been set beforehand. After you’ve articulated the problem and why it’s an important problem to solve now, you should include the key objective you wish to satisfy with your project. At this point, it can still be quite general, but it will serve as a guiding light for specific KPIs when it comes time to define them.
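To make the idea of a KPI concrete, here is a minimal sketch for the hypothetical pricing scenario above. It assumes the key objective is conversion rate (orders divided by sessions), which is my choice for illustration and not something the scenario fixes. The function name and every figure are invented.

```python
def conversion_rate(orders: int, sessions: int) -> float:
    """Share of site sessions that ended in an order."""
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    return orders / sessions


# Hypothetical figures, invented purely for illustration:
# traffic grew from 10,000 to 15,000 sessions, but orders barely moved.
before = conversion_rate(orders=300, sessions=10_000)
after = conversion_rate(orders=330, sessions=15_000)

# Relative change in the KPI; a negative value means conversion got worse
# even though raw traffic and raw orders both went up.
relative_change = (after - before) / before

print(f"conversion before: {before:.2%}, after: {after:.2%}")
print(f"relative change: {relative_change:.1%}")
```

Writing the KPI down as a formula this early makes it harder to move the goalposts later, and gives the specific KPIs something to be derived from.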

Feasibility

Once you’ve defined the problem, it could be tempting to leap straight into the machine learning parts. Resist the urge. You should first begin to think of different solutions and approaches you could take to achieve your objective. The odds are that you’re not the first person to approach this problem so look at how others have done it in the past.

Your goal is not to prescribe a solution. You’re only thinking about how the problem can be approached from a broad scale rather than just accepting the first solution that you see. The reason for this is that machine learning is not always necessary, and showing you’ve thought about various avenues emphasizes your unwillingness to treat everything like a nail because you’ve got a hammer. You want to use the ideal tools for the problem.

If you can draw, then draw. If you can use design tools to illustrate various solutions, use design tools. It helps to have a tangible idea of the project because, in a real-world scenario, it’s still quite possible for stakeholders to have varied understandings of what’s going on.

A tangible draw-up makes it easier to identify different constraints, the feasibility of the project, and how the project may be integrated into the greater system in a real-world scenario.

Another key factor to consider: nobody truly knows the problem like the people your product or feature is going to serve. Take this as a perfect opportunity to put yourself in the customer’s shoes so that you can empathize with them. The further removed you are from the end-user, the less likely you are to understand their perspective.

Usually, this would involve speaking to end-users. If you have the means to carry that out then you should certainly do it, but if not, then make the most of Google to better understand what exactly your end-users want. It’s not the best solution, but the fact that you’re thinking about the interests of the business (e.g. end-users, impact on the business, etc) is what’s appealing.

Requirements

The last part of the business case is to describe the core requirements of your project. This will help you clearly define the specific deliverables you need to produce in order to meet each requirement.

Remember, a project is not finished once it’s gone into production. You can cut away aspects of the feature or product that are not necessary for the immediate future and introduce them in a later release. Your main focus should be on the tasks that are most valuable now.

The System Design

In machine learning, system design is an iterative process. This means that the output from one step may be used to update previous steps. For example, you may realize that the data you need to solve the business case you initially defined is inaccessible, in which case you may need to redefine the business case.

It’s still vital to ensure there’s a clear structure at the beginning of your project since this will help you take a systematic approach to building the product.

“Machine learning system design is the process of defining the software architecture, infrastructure, algorithms, and data for a machine learning system to satisfy specified requirements”
Stanford CS329S

One of the first things to consider is how you’re going to evaluate how a model performs.

Evaluation

A vital aspect people forget about in their projects is how it links back to the business metrics — mainly because they never thought about the business in the first place. But you’re going to need a way to evaluate how you’re doing in your project.

One of the most challenging aspects of defining your evaluation metrics is relating them back to the business goals. It helps to split evaluation metrics into two groups: offline and online. Offline evaluation involves benchmarking methods using a labeled dataset. Rather than merely selecting an offline metric, you have to think about which metric can be optimized such that it relates to the core business metrics that you’re concerned with.
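As a rough sketch of offline evaluation, the snippet below benchmarks a model against a naive baseline on a small labeled hold-out set. It assumes the pricing scenario is framed as predicting a sale price and that mean absolute error (in dollars) is an acceptable offline metric. Both are assumptions for illustration, and all the numbers are made up.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical hold-out labels: the price (in dollars) each product sold at.
actual_prices = [20.0, 35.0, 50.0, 15.0]

# Hypothetical predictions from the model under evaluation.
model_prices = [22.0, 33.0, 49.0, 18.0]

# Naive baseline: always predict the average price of the hold-out set.
baseline_prices = [mean(actual_prices)] * len(actual_prices)

model_mae = mean_absolute_error(actual_prices, model_prices)
baseline_mae = mean_absolute_error(actual_prices, baseline_prices)

print(f"model MAE: {model_mae:.2f}, baseline MAE: {baseline_mae:.2f}")
```

An error expressed in dollars is easier to tie back to a business metric than a unitless score, which is one reason to prefer it in a scenario like this. Comparing against a baseline also shows whether the model is worth its added complexity.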

Online evaluation is conducted to determine the model's performance with data that are not available during training. In other words, we use the online evaluation metrics to determine how well a model performs with users we don’t know and measure how it's impacting the business and the end-users alike.
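One common way to run an online evaluation is an A/B test, sketched below under a few assumptions: users are randomly split between the current prices (control) and the model-driven prices (treatment), conversion is the business metric, and a two-proportion z-test is a reasonable significance check. The function name and the counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("cannot test when the pooled rate is 0 or 1")
    z = (p_b - p_a) / std_err
    # Standard normal CDF written with the error function.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical counts: control keeps current prices, treatment uses the model.
z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A positive z with a small p-value would suggest the treatment group converts better than chance alone explains. In a real experiment you would also decide the sample size and the stopping rule before looking at the results.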

Data

Machine learning is driven by data, which means data must be treated as a first-class citizen (along with the code and models). Be sure to describe the static and the dynamic sources being used.

Don’t just think about where you’re going to get the data from. You must also consider where you’re going to store the data, the bias in the data, the privacy concerns around the data, and how you’re going to label the data. This section is a great place to begin articulating those factors.
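One lightweight way to articulate these factors is to record each data source in a small structure, as in the sketch below. The `DataSource` class, its fields, and the two example sources are all hypothetical and only meant to show the shape of the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSource:
    name: str
    kind: str                      # "static" or "dynamic"
    storage: str                   # where the data will live
    contains_personal_data: bool   # flags a privacy concern
    labeling: str                  # how labels are obtained, if any
    known_biases: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.kind not in ("static", "dynamic"):
            raise ValueError("kind must be 'static' or 'dynamic'")


# Hypothetical sources for the pricing scenario.
sources = [
    DataSource(
        name="historical_orders",
        kind="static",
        storage="warehouse table",
        contains_personal_data=True,
        labeling="labels come from recorded sale prices",
        known_biases=["only covers customers who completed a purchase"],
    ),
    DataSource(
        name="competitor_prices",
        kind="dynamic",
        storage="daily snapshot files",
        contains_personal_data=False,
        labeling="no labels needed",
    ),
]

# Sources that need a privacy review before any modeling starts.
needs_privacy_review = [s.name for s in sources if s.contains_personal_data]
print(needs_privacy_review)
```

Even in a personal project, a listing like this shows a reviewer at a glance that storage, bias, privacy, and labeling were considered for every source.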

Wrap up

In the real world, we have deadlines. Organizing the project into a manageable timeline based on the deliverables required is the final step. Since the machine learning workflow is highly iterative, it’s possible for the project to become unnecessarily prolonged, which has its own pitfalls in the real world (e.g. concept drift).

Thanks for reading.


Kurtis Pykes