AI-Labeling Crowdsourcing Platforms

Marcel Boer
Jul 26 · 9 min read
AI-labeling crowdsourcing platforms — a new digital business model

Artificial intelligence (AI) is widely used in business today, for example in data analytics, natural language processing, and process automation. Integrating AI components into digital business models creates value by improving back-office efficiency and enhancing the customer experience. AI builds on decades of research into solving difficult computer science problems and is now rapidly transforming business model innovation. Companies that do not consider AI will be vulnerable to competitors that are equipped with it. While companies like Google, Amazon, and Tesla have already innovated their business models with AI, small and medium-sized companies have limited budgets for building such capabilities. One high-effort task in creating AI services is pre-processing data and training machine learning models. To keep pace with the market, building internal pre-processing capabilities is often not enough. Google, for example, uses a very pragmatic solution: data labeling and validation for its machine learning models are outsourced to its users. Have you ever wondered what Google’s CAPTCHA is for? Sure, it is used to prevent bots from intruding into applications, but beyond that, millions of users act every day as Google’s data pre-processing team, validating machine learning algorithms for free. If you are not one of the Googles out there, you might be interested in how you can meet your rising AI needs.

Data Labeling for Machine Learning

Machine learning uses algorithms that learn to solve a specific task by relying on patterns in sample data, whether that data was collected for training or comes from practice. Of the several approaches to machine learning, supervised learning relies heavily on labeled data to create models. The following examples highlight use cases that require labeling huge amounts of data:

  • Autonomous driving, which requires identifying pedestrians, vehicles, and traffic lights
  • Service desk requests, which require urgency classification before humans get involved
  • Quality inspection of manufactured products to determine which items are waste
  • Personal assistant systems, which must understand the context of conversations
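To make the role of labels concrete, here is a deliberately tiny sketch of supervised learning for the service desk example: humans attach a label to each ticket, and a model learns from those text/label pairs. The tickets, the labels `urgent` and `normal`, and the word-count scoring rule are hypothetical illustrations rather than a production approach; a real system would use a proper classifier trained on far more labeled data.

```python
from collections import Counter, defaultdict

# Hypothetical labeled training data: each ticket text paired with a human-assigned label.
LABELED_TICKETS = [
    ("server down for all users", "urgent"),
    ("production database outage", "urgent"),
    ("cannot log in, system down", "urgent"),
    ("please update my email signature", "normal"),
    ("request new keyboard", "normal"),
    ("question about vacation policy", "normal"),
]


def train(examples):
    """Count how often each word appears in tickets carrying each label."""
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    return word_counts


def predict(model, text):
    """Pick the label whose training vocabulary overlaps most with the new ticket."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words) for label, counts in model.items()}
    return max(scores, key=scores.get)


model = train(LABELED_TICKETS)
print(predict(model, "email server down"))  # urgent
```

Without the human-provided labels there would be nothing for `train` to count, which is why labeling volume becomes the bottleneck.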

Data scientists spend about 80% of their effort on pre-processing and labeling data for training, and only 20% on building machine learning models. This is why crowdsourcing platforms that take care of repetitive data-labeling tasks have emerged. Labeling data in-house requires hiring employees but has the advantage of a transparent labeling process, since you know the people who perform the labeling. Crowdsourcing platforms, by contrast, allow companies to distribute thousands of tasks and maximize their return on investment, because labeling becomes an operational expenditure that scales with demand.
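The demand-based cost argument can be made concrete with a simple break-even calculation. This sketch assumes, for simplicity, that in-house labeling is a fixed monthly cost and that the crowd charges a flat fee per label; the figures in the usage example are hypothetical and ignore factors such as in-house capacity limits or quality-assurance overhead.

```python
def break_even_volume(fixed_monthly_cost: float, fee_per_label: float) -> float:
    """Monthly number of labels at which crowd fees equal the fixed in-house cost."""
    if fee_per_label <= 0:
        raise ValueError("fee_per_label must be positive")
    return fixed_monthly_cost / fee_per_label


def cheaper_option(labels_per_month: int, fixed_monthly_cost: float, fee_per_label: float) -> str:
    """Compare a fixed in-house cost with a demand-based crowd cost for one month."""
    crowd_cost = labels_per_month * fee_per_label
    return "crowd" if crowd_cost < fixed_monthly_cost else "in-house"


# Hypothetical figures: $20,000/month for an in-house team vs. $0.16 per crowd label.
print(break_even_volume(20_000, 0.16))       # roughly 125,000 labels per month
print(cheaper_option(50_000, 20_000, 0.16))  # crowd
print(cheaper_option(200_000, 20_000, 0.16)) # in-house
```

Below the break-even volume, paying per label is cheaper; above it, a fixed team wins on cost alone. Fluctuating or uncertain volumes therefore favor the demand-based model.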

Crowdsourcing Pattern

The Crowdsourcing pattern solves human tasks by drawing on an internet crowd, which on the one hand is a scalable workforce and on the other hand is more flexible regarding required qualifications. In exchange for their services, contributors receive a small reward per task or have the chance to win a one-time recognition. According to Gassmann et al. (), the Crowdsourcing pattern is often used to foster innovative technology and business ideas. Procter & Gamble’s product development is one example: Procter & Gamble collaborates with external crowds to explore innovative solutions for product packaging, design, and marketing. This example shows that external crowds can be deeply integrated into the internal product development process. In AI-service development, the crowd can support data labeling so that internal data scientists can focus on developing value-adding machine learning models rather than on repetitive data pre-processing.

Crowdsourcing Provider

Addressing the need to outsource simple tasks, crowdsourcing companies distribute work to a virtual, distributed workforce. The use cases and features they support range from data processing, creative design tasks, and translation requests to any use case for which you can train the crowd yourself. Well-known crowdsourcing providers include Amazon Mechanical Turk, MicroWorkers, ClickWorker, MicroTask, and Scale. The following critical factors should be considered when selecting a crowdsourcing provider:

  • Maturity: A provider’s maturity indicates whether the solution offers the availability and robustness needed for reliable operation.
  • Use Case: Depending on their main services, platforms have specialized in different use cases and cater to their specific requirements.
  • Technology: The crowdsourcing platform should offer a machine-readable interface (API) for automated interaction. Moreover, its technical foundation should scale with an increasing number of requests.
  • Quality: Through recurring quality validation of workers and a predefined set of requirements, a platform ensures the quality of the results the crowd provides.
  • Security: Confidentiality and secure storage of data are of high interest when developing AI services and features, as data is at their core.
  • Cost: Outsourcing tasks to external crowdsourcing platforms must be profitable, or at least match the price you are willing to pay for the functionality these platforms provide.

The list of criteria above should help identify the right crowdsourcing platform for your tasks. It is hard to say in general which of the mentioned platforms is best, as this has to be decided case by case.
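One way to make the case-by-case decision more systematic is a weighted scoring matrix over the six criteria. The weights, the 1-to-5 ratings, and the anonymized names `Provider A` and `Provider B` below are hypothetical placeholders, not assessments of any real platform; each team would fill them in from its own evaluation.

```python
# Hypothetical weights reflecting one project's priorities; they should sum to 1.0.
WEIGHTS = {
    "maturity": 0.15, "use_case": 0.20, "technology": 0.20,
    "quality": 0.20, "security": 0.15, "cost": 0.10,
}

# Hypothetical 1-5 ratings from an internal evaluation (not real provider assessments).
RATINGS = {
    "Provider A": {"maturity": 5, "use_case": 3, "technology": 4,
                   "quality": 3, "security": 4, "cost": 4},
    "Provider B": {"maturity": 3, "use_case": 5, "technology": 4,
                   "quality": 5, "security": 4, "cost": 3},
}


def weighted_score(ratings: dict, weights: dict) -> float:
    """Sum of rating times weight over all criteria."""
    return sum(weights[criterion] * ratings[criterion] for criterion in weights)


def rank(providers: dict, weights: dict) -> list:
    """Provider names ordered from best to worst weighted score."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights should sum to 1"
    return sorted(providers, key=lambda p: weighted_score(providers[p], weights), reverse=True)


print(rank(RATINGS, WEIGHTS))  # ['Provider B', 'Provider A']
```

Changing the weights (for example, raising `security` for sensitive data) can reorder the ranking, which reflects why no platform is best in general.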

AI-Labeling Crowdsourcing Platforms Pattern

The AI-Labeling Crowdsourcing Platform pattern addresses the emerging business challenge of meeting the needs and efforts involved in developing AI services. Especially for companies without a workforce dedicated to data labeling and validation, such platforms are key to staying competitive. From the perspective of a company developing AI services, the main characteristics of the pattern are the following:

  1. Customer Relationship: AI services improve customer relationships by offering highly personalized and entirely new services.
  2. Key Activities: AI services involve high effort for labeling data to train AI models and for validating them on a recurring basis.
  3. Key Partner: When data labeling and model validation are outsourced, the crowdsourcing platform provider becomes an important partner for ensuring AI-service operation.
  4. Key Resources: With the main effort outsourced, the key resource is the automated integration of crowdsourcing services into the AI service.
  5. Cost Structure: Crowdsourcing services are paid on demand as micro-fees.

The characteristics above are visualized below on a business model canvas, following Osterwalder and Pigneur ().

[Figure: AI-Labeling Crowdsourcing Platform pattern on the business model canvas]

Scale and the Labeling of AI-Data

Amazon Mechanical Turk is best known as THE crowdsourcing platform provider, having been the first to enter the market for automating human intelligence tasks. It is part of Amazon Web Services (AWS) and is commonly used for text classification, transcription, surveys, and data labeling. Nevertheless, this article highlights the Scale platform, a simple and effective alternative to Amazon Mechanical Turk that focuses strongly on computer vision by providing managed labeling services.

The default use cases for Scale’s platform range from retail, autonomous driving, robotics, and drones to augmented reality. Its API allows companies to submit images, 3D point clouds, videos, texts, and whole documents for labeling, which provides great flexibility in supported artifacts. After an artifact is sent via an API call together with the targeted data service (e.g. extraction, classification, segmentation, or transcription), the request is reviewed for plausibility, processed by the crowd, validated by statistical checks, and finally returned. To give an idea of the pricing for outsourcing human tasks to Scale, the following equation illustrates the cost of classifying a single image ():

$0.08 + $0.08 * number of requested classifications
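The example formula can be wrapped in a small helper to estimate batch costs. This is a minimal sketch using only the rates quoted above; actual Scale pricing may differ or may have changed, and the function names are hypothetical. `Decimal` is used to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

BASE_FEE = Decimal("0.08")                # per image, from the example formula
FEE_PER_CLASSIFICATION = Decimal("0.08")  # per requested classification


def price_per_image(num_classifications: int) -> Decimal:
    """USD cost of one image: base fee plus a fee per requested classification."""
    if num_classifications < 0:
        raise ValueError("num_classifications must be non-negative")
    return BASE_FEE + FEE_PER_CLASSIFICATION * num_classifications


def batch_price(num_images: int, num_classifications: int) -> Decimal:
    """USD cost of classifying a batch of images with the same number of classifications."""
    return price_per_image(num_classifications) * num_images


print(price_per_image(1))     # 0.16
print(batch_price(1_000, 3))  # 320.00
```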

This pricing model also highlights a key difference from Amazon Mechanical Turk. While the pricing for Scale’s managed services is fixed, Amazon Mechanical Turk’s pricing is request-based, which lets requesters bid for higher-priority processing. This may be an advantage or a disadvantage; in my view, at least, Scale offers the more transparent pricing approach.
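The request lifecycle described above (plausibility review, crowd processing, statistical validation, return) can be sketched as a small pipeline. This is not Scale’s implementation or API: the `Task` fields, the supported-type set, and the use of a majority vote with a minimum agreement ratio as the "statistical check" are assumptions for illustration, and the crowd step is simulated by passing in worker labels directly.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Assumed set of artifact types, loosely mirroring the artifacts listed in the text.
SUPPORTED_TYPES = {"image", "video", "text", "document", "point_cloud"}


@dataclass
class Task:
    artifact_type: str
    payload: str              # hypothetical: a URL or file reference
    label_options: List[str]


def is_plausible(task: Task) -> bool:
    """Stage 1: reject malformed requests before they reach the crowd."""
    return (
        task.artifact_type in SUPPORTED_TYPES
        and bool(task.payload)
        and len(task.label_options) >= 2
    )


def consensus(worker_labels: List[str], min_agreement: float = 0.6) -> Optional[str]:
    """Stage 3: accept the majority label only if enough workers agree on it."""
    if not worker_labels:
        return None
    label, votes = Counter(worker_labels).most_common(1)[0]
    return label if votes / len(worker_labels) >= min_agreement else None


def process(task: Task, worker_labels: List[str]) -> dict:
    if not is_plausible(task):
        return {"status": "rejected"}
    # Stage 2 (crowd labeling) is simulated: worker_labels stands in for crowd answers.
    valid = [label for label in worker_labels if label in task.label_options]
    label = consensus(valid)
    if label is None:
        return {"status": "needs_review"}  # too little agreement; escalate
    return {"status": "completed", "label": label}  # Stage 4: return the result


task = Task("image", "img_001.jpg", ["pedestrian", "vehicle", "traffic_light"])
print(process(task, ["pedestrian", "pedestrian", "vehicle"]))
print(process(task, ["pedestrian", "vehicle"]))
```

The first call reaches two-thirds agreement and completes; the second splits 50/50 and is flagged for review rather than returned with an unreliable label.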

Example: The Toyota Research Institute’s AI-Labeling for Autonomous Driving

Scale’s case study on the Toyota Research Institute shows how managed labeling services can be integrated into a business model for autonomous driving (). The mission of the Toyota Research Institute is to research autonomous driving to its full extent, where the vehicle takes over all responsibility for driving. Developing AI services for autonomous driving involves large volumes of data. On the one hand, the machine learning team could not label this amount of data at the required quality; on the other hand, any trade-off must not degrade labeling quality. With Scale, the Toyota Research Institute found a labeling provider that took care of the data annotation pipeline in a fully managed way, without the need to significantly enlarge the data engineering team. The Toyota Research Institute benefited from great flexibility, including the labeling of 2D and 3D data. To keep up with the Toyota Research Institute’s fast-growing demand, Scale even added custom simulation features and increased labeling throughput tenfold.

Best Practices

The following best practices help you get the most out of outsourcing data labeling tasks to crowd workers:

  • Worker qualification: Define the qualifications of the workers the crowdsourcing platform recruits. For example, if the AI service targets a local audience, the workers should come from that audience.
  • Pre-processing: Define similarity thresholds for your data to reduce the amount that needs to be processed by the crowd, as this saves time and money.
  • Shadow crowd: To mitigate risk, shadow your crowdsourcing platform with an alternative provider so that you do not rely on a single platform.
  • Own workers: Let your employees be part of the crowd, as knowing how the crowd process works and having internal quality assurance for labeling are beneficial.
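The pre-processing practice can be illustrated with a greedy near-duplicate filter: an item is sent to the crowd only if it is not too similar to any item already selected. The sketch assumes each item (e.g. a video frame) has already been turned into a numeric feature vector by some embedding method, which is outside its scope; the vectors, item names, and the 0.95 cosine-similarity threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate(items: List[Tuple[str, List[float]]], threshold: float = 0.95) -> List[str]:
    """Greedily keep items whose vectors are below the similarity threshold
    to every item already kept; only kept items go to the crowd."""
    kept: List[Tuple[str, List[float]]] = []
    for item_id, vec in items:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((item_id, vec))
    return [item_id for item_id, _ in kept]


# Hypothetical feature vectors: frame_002 is almost identical to frame_001.
frames = [
    ("frame_001", [1.0, 0.0, 0.2]),
    ("frame_002", [0.99, 0.01, 0.21]),
    ("frame_003", [0.0, 1.0, 0.0]),
]
print(deduplicate(frames))  # ['frame_001', 'frame_003']
```

The pairwise comparison is quadratic in the number of items; for large datasets an approximate nearest-neighbor index would be the more practical choice.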

Related Patterns

The AI-Labeling Crowdsourcing Platform pattern is best combined with the Leverage Customer Data pattern (). That pattern creates new value by collecting customer data or generating business insights that can be sold to customers or even to third parties. AI services that improve the customer relationship can further be used to create new functionality or services that users might be interested in. Spotify’s recommendation algorithm is one example: a recommendation not only suggests new music tracks to the end user but also yields customer behavior insights that third parties are interested in.

Conclusion and Final Thoughts

Artificial intelligence provides strategic advantages for back-office functions and customer relationships. While companies like Google, Amazon, and Tesla already rely on AI, other companies lack such technologies. To catch up with AI services, these companies can leverage crowdsourcing platforms that efficiently handle high-effort tasks such as data labeling, allowing them to focus on developing value-adding machine learning algorithms.

Scale offers fully managed data labeling services for building AI applications. Its API makes it easy to integrate the managed services into other applications and helps accelerate such developments. Besides standard labeling of data such as images or videos, Scale offers simulation of 3D point clouds, which makes it easier for original equipment manufacturers, for example, to advance autonomous driving.

Further Readings

Consider the following readings for more information on digital business models and how to adopt AI for your business.

Business Models of a Digital Era ()
Do not miss this article about how digital transformation and digital natives are changing business. With the adoption of emerging technologies and changing customer behavior, companies are adopting a variety of new business model patterns that reflect the characteristics of a digital era.

Artificial Intelligence for Business: A Roadmap for Getting Started with AI (Link)
Artificial Intelligence for Business helps readers understand how organizations can adopt AI technology by pointing out business gaps and opportunities that can be addressed easily. Furthermore, the book provides insights into finding critical data sets, building prototypes to mitigate risk, and best practices for production-ready AI systems, which might include organizational adaptation.

The Startup

Medium's largest active publication, followed by +708K people. Follow to join our community.

Marcel Boer

Written by

Enthusiast & Innovator | Keen to conceptualize, implement and test innovative business models
