7 things to consider while outsourcing Machine Learning projects for startups and enterprises

Darshan Sonde
Someshwara Innovation
4 min read · Sep 3, 2021

Understanding business needs and optimizing resources around ML requirements is highly skill-based work. It's like painting the Sistine Chapel or a beautiful landscape: you need a lot of skilled resources for a short duration of time to execute effectively.

Here are some things to consider before embarking on an ML journey to ensure your success.

Don’t think there is no DATA

Everyone thinks you need lots of data for machine learning, but that isn't the case for all business problems. Identify the nature of the problem first. A basic understanding of the problem will often reveal that it is a simple regression or algorithmic problem where classical scientific methods are more suitable.

We once did NLP work for an industrial firm where we had to identify the category of master-data entries and then flag probable duplicates. Simple NLP processing coupled with good engineering was enough to identify potential duplicate category entries.
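A minimal sketch of the kind of "simple NLP plus engineering" approach described above: flag probable duplicate categories with character n-gram TF-IDF and cosine similarity. The category strings and the threshold are illustrative, not taken from the actual project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical master-data category entries
categories = ["Hex Bolt M8", "Bolt, Hex, M8", "Washer M8", "M8 hex bolt"]

vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), lowercase=True)
vectors = vectorizer.fit_transform(categories)
similarity = cosine_similarity(vectors)

# Report pairs above a similarity threshold as potential duplicates.
THRESHOLD = 0.6  # illustrative; tune on real data
for i in range(len(categories)):
    for j in range(i + 1, len(categories)):
        if similarity[i, j] >= THRESHOLD:
            print(f"possible duplicate: {categories[i]!r} ~ {categories[j]!r} ({similarity[i, j]:.2f})")
```

A lightweight pass like this often narrows the list enough that a human reviewer only has to confirm a handful of candidate duplicates rather than scan the whole catalogue.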

Data Collection Strategy

Plan a strong strategy from the start for how you will collect the data, label it, and clean it.

Labeling and annotating the data already in the system is a significant task in itself. Microsoft spent millions labeling the COCO dataset and building the tools for processing the labels.

Having engineering build processes and tools around collecting and cleaning the dataset from the beginning helps the ML teams achieve faster turnaround times and can save large labeling costs.
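A minimal sketch of the kind of tooling meant here: a repeatable cleaning step that every new batch of raw data passes through before it reaches labelers or the ML team. The column names, file names, and rules are hypothetical.

```python
import pandas as pd

def clean_batch(raw: pd.DataFrame) -> pd.DataFrame:
    """Normalize, drop empty rows, and deduplicate a raw batch before labeling."""
    df = raw.copy()
    df["text"] = df["text"].astype(str).str.strip().str.lower()
    df = df[df["text"].str.len() > 0]          # drop rows with no usable content
    df = df.drop_duplicates(subset=["text"])   # avoid paying to label the same row twice
    df["label"] = None                         # placeholder column for annotators to fill
    return df

# Hypothetical file names for an incoming batch and the labeling queue
raw = pd.read_csv("raw_batch.csv")
clean_batch(raw).to_csv("to_label_batch.csv", index=False)
```

Running every batch through one script like this keeps the labeling queue consistent and makes the cleaning rules reviewable instead of living in someone's head.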

Metrics as Deliverable Milestones

Establish metrics as deliverables. In software, at the end of a sprint we have a shippable product increment. Once a feature is completed, it only regresses by small amounts.

Applying this to ML projects, we establish an end-to-end pipeline at the beginning of the project and make metrics the deliverables. Each increment delivers an improvement in the accuracy of the pipeline, for example:

Alpha: 70%
Beta: 80%
Prod: 90%

Establish a basic guideline on which metrics are treated as deliverables.
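A minimal sketch of treating metrics as deliverable milestones: the end-to-end pipeline is re-evaluated after every increment, and a release stage is only considered reached when its accuracy gate is met. The thresholds mirror the example milestones above; the function and names are illustrative.

```python
# Milestone accuracy gates agreed with the stakeholders
MILESTONES = {"alpha": 0.70, "beta": 0.80, "prod": 0.90}

def reached_stage(accuracy):
    """Return the highest milestone the current pipeline accuracy satisfies, or None."""
    reached = None
    for stage, gate in MILESTONES.items():  # dict preserves insertion order
        if accuracy >= gate:
            reached = stage
    return reached

print(reached_stage(0.83))  # -> "beta"
```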

ML is not Agile

The ML field is still very nascent as an industry, while our mental models are tuned to waterfall and agile. From estimation to milestones, management always tries to fit the ML team's lifecycle into the agile lifecycle, even though the deliverable is a measurable change in the product. Do not plan dependencies directly with agile teams: ML teams tend to get demotivated because their progress is not quantifiable like a sprint burndown.

Evaluating ML projects in the same context as software leads to large planning errors. Plan ML projects with strong SMEs and ML executives as iterative progressions with different quality gates and acceptance criteria.

Remember that the ML project is a marathon and not a sprint.

Warranty

A large investment goes into building ML projects. Results from training and evaluating on training data are not sufficient to prove that the ML project will succeed on production data.

We've had lots of calls from disappointed clients who got disastrous results from other companies post-deployment. The key is iteration after deployment: there is a window post-deployment where the model still needs to be actively iterated on. Budgets planned entirely like an agile project leave no room for these post-launch services.

As a client, make sure to take warranties on the performance of ML models post-deployment for a good period of time, to ensure the services company provides a model that works on production data and not just on test or synthetic data.

As a services company, plan for a good period of time post-deployment to actively develop the models. Educate your clients on the long-term gains and on how production data affects model performance.
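A minimal sketch of the post-deployment iteration described above: periodically score production predictions against ground truth as it arrives, and raise a flag when the model drops below the agreed warranty level. The threshold and data are illustrative.

```python
from sklearn.metrics import accuracy_score

WARRANTY_ACCURACY = 0.90  # e.g. the "prod" milestone agreed with the client

def check_production_window(y_true, y_pred):
    """Return True if the latest production window still meets the warranty level."""
    accuracy = accuracy_score(y_true, y_pred)
    print(f"production accuracy over window: {accuracy:.3f}")
    return accuracy >= WARRANTY_ACCURACY

# Example: ground-truth labels collected for last week's predictions (illustrative)
if not check_production_window([1, 0, 1, 1, 0], [1, 0, 1, 0, 0]):
    print("below warranty level: schedule a retraining / data review iteration")
```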

Testing

ML testing is often ignored or under-planned, but it is a crucial part of the ML lifecycle. Ensure the testing team understands the nuances of ML well enough to test for domain variations.

Bias and diversity have to be considered seriously while training. Human-labeled data inherently contains bias; sometimes simple adversarial testing by teams with a strong understanding of ML shortcomings can help identify potential weaknesses and bias.

Google Photos once tagged a Black person as a gorilla. Understanding the inner workings of ML models and testing them is essential to uncovering such flaws. Equality, diversity, and bias must be taken seriously during the entire lifecycle of the project.
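A minimal sketch of one bias check worth planning for: compare the model's accuracy across labeled subgroups instead of looking only at the overall number. The group names and records are illustrative.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}

records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0),
]
print(accuracy_by_group(records))  # a large gap between groups is a red flag
```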

Ensure proper planning is done for testing models.

Software Integration

ML is, in the end, integrated into a core software product. Make sure the ML integration with the software is considered carefully; integration is one of the big bottlenecks that people don't plan for.

ML systems can consume a lot of resources, and the surrounding modules need to be built to handle their behaviour. Depending on the problem, the ML system might be batched or streamed, and a run might take anywhere from seconds to hours. MLOps is essential for integrating complex systems: there should be no manual tracking of models, and everything should be configurable and tracked in a structured manner.
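A minimal sketch of keeping the integration configurable rather than hard-coded: the core product calls a thin wrapper, and details such as the model version and batch size live in tracked configuration instead of code. The config keys and the dummy model are hypothetical.

```python
# In practice this would be read from a tracked config file or an MLOps tool
config = {
    "model_version": "v12",
    "batch_size": 64,
}

class DummyModel:
    """Stand-in for the real model loaded according to config['model_version']."""
    def predict(self, batch):
        return [len(item) % 2 for item in batch]

def score_batch(items, model):
    """Score inputs in fixed-size batches so resource use stays predictable."""
    batch_size = config["batch_size"]
    results = []
    for start in range(0, len(items), batch_size):
        results.extend(model.predict(items[start:start + batch_size]))
    return results

print(score_batch(["foo", "ba", "bazz"], DummyModel()))
```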

Planning with a strong understanding of deployment at the beginning of the project ensures that the existing systems can accommodate the ML integration.

By,
Someshwara Software, Machine Learning Division

Subscribe to the publication to get notified when our article on MLOps tools is published. It is the result of months of R&D comparing MLOps tooling and pipeline setups; you won't be disappointed.
