12 questions to ask yourself before starting an AI project

Any journey starts with checking if you have everything you might need during the trip — even if it’s a journey of building an artificial intelligence-powered solution.

Below, we outline twelve key questions worth asking before you start an AI project.

Let’s take a closer look at each consideration.

Preliminary evaluation

First and foremost, you need to validate if artificial intelligence is the right choice for your project or for a particular task.

1. Is AI applicable for this case?

You need to clearly determine your objectives for the task at hand and see if they can be achieved with and without the help of AI. There are two possible scenarios where the use of AI won’t be reasonable:

  • The desired results can be achieved without the use of AI — In this case, using AI for the sake of AI wouldn’t be a smart choice. Instead, you should look for other cases where the adoption of artificial intelligence can bring significant benefits to your business.
  • Existing AI technologies can’t solve the task at hand — Investing time and money into researching revolutionary AI solutions is risky and doesn’t guarantee positive results. So for the sake of business sustainability, it’s best to only apply known AI capabilities for solving specific tasks and leave scientific experiments to researchers.

Meanwhile, the golden case for starting an AI project is when AI is either the only technology that can solve the task at hand or the one that solves it faster or at a lower cost than any non-AI alternative.

2. What are the implementation options?

Next, you need to evaluate your available implementation options. The most common scenarios are:

  • Deployment of ready-to-use AI algorithms or models — You can use this approach when a ready solution can solve your task without any customization of the algorithm or model. If the task is too complex to be solved with one ready solution, you might also try splitting it into several simpler subtasks and looking for a fitting algorithm for each of them.
  • Customization of a ready solution — More often than not, a ready dataset or pretrained model won’t fit your project goals out of the box. In this case, you will need to make some adjustments before using ready AI solutions or components, such as improving data labeling in a dataset, expanding the dataset with new data, or training a ready model on custom data.
  • Development of a custom AI solution — This scenario is rather challenging and expensive and usually arises when you work on complex, unprecedented use cases. However, even the most unique and complex AI projects usually can be split into several smaller tasks. And among those smaller tasks, only a few would actually require building a custom AI from scratch.

You can either choose the scenario that brings the best results at the lowest cost or combine these approaches to implement different parts of your AI project.
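To make the customization scenario more concrete, here is a minimal sketch of adapting a pretrained model to custom data, assuming a TensorFlow/Keras image classification setup; the class count and the commented-out dataset variables are hypothetical placeholders, not a prescribed setup.

```python
# A minimal sketch of customizing a pretrained model (transfer learning).
# Assumes TensorFlow/Keras; NUM_CLASSES and the dataset variables are placeholders.
import tensorflow as tf

NUM_CLASSES = 5  # hypothetical number of target classes

# Load a model pretrained on ImageNet, without its original classifier head
base = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze pretrained weights, train only the new head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(train_ds, validation_data=val_ds, epochs=5)  # train on your own data
```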

3. What are the legal limitations of using AI?

Last but not least, make sure there are no legal or compliance restrictions that prevent you from using AI in your project.

Here, we can outline several key aspects to pay attention to:

  • Data privacy — As AI solutions often operate with real-life data, your job is to ensure that data use is consensual and that all personal information is properly anonymized.
  • Data security — For data that can’t be anonymized, you should establish additional security measures. Your development team needs to make sure that no one can access, process, or alter your AI’s sensitive data without proper authorization.
  • AI explainability — Many countries have specific requirements for the quality and explainability of AI solutions, and for some industries, like finance or healthcare, these requirements are stricter than for others. Their purpose is to ensure that black-boxed AI solutions, which lack transparency in their decision-making, don’t produce biased outcomes. Requirements to consider include the Algorithmic Accountability Act of 2019, the Ethics Guidelines for Trustworthy AI, and the General Data Protection Regulation.
  • Terms of use — When you deploy a ready AI component or solution, pay special attention to the terms and conditions regulating its use. Some open-source AI components and models might have restrictions regarding the field of use or territory in which they are used, ethical limitations, etc.

Make sure there are no legal limitations to using AI technologies in your project, and design your solution with data security and privacy in mind.
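To illustrate the data privacy point in practice, here is a minimal sketch of pseudonymizing direct identifiers before data enters an AI pipeline, assuming tabular data in pandas; the column names, sample values, and salt are hypothetical. Note that salted hashing is pseudonymization rather than full anonymization, so it complements a legal review, not replaces it.

```python
# A minimal sketch of pseudonymizing personal data before model training.
# Assumes a pandas DataFrame; the column names and values are hypothetical.
import hashlib
import pandas as pd

def pseudonymize(value: str, salt: str) -> str:
    """Replace a direct identifier with a salted one-way hash."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

df = pd.DataFrame({
    "email": ["alice@example.com", "bob@example.com"],
    "full_name": ["Alice Smith", "Bob Jones"],
    "purchase_amount": [120.0, 75.5],
})

SALT = "project-specific-secret"  # store securely, never in source control
for column in ["email", "full_name"]:
    df[column] = df[column].map(lambda v: pseudonymize(v, SALT))

print(df.head())
```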

Financial risks

Once you determine there are no legal restrictions on using AI, make sure to assess possible financial risks of your project.

4. Is the potential profit from AI higher than its implementation costs?

There are many factors that can affect the cost of AI functionality:

  • Software requirements — The final purpose, complexity, and performance requirements of the software under development have a direct influence on the choice of data, technology, and skills for the project and, therefore, the project’s budget.
  • Type of data used — Structured data is usually less expensive to work with than unstructured data.
  • AI algorithm performance — Building an AI algorithm with a high level of accuracy and performance usually requires running several rounds of training and tuning, which leads to increased expenses.

But the real challenge is to make sure that the initial cost of implementing AI doesn’t exceed its potential return on investment (ROI). This is especially true for projects requiring the development of an AI solution from scratch. Conduct thorough research into the solution’s feasibility to determine whether it can be built and at what cost.
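As a rough illustration of this cost-versus-return check, here is a minimal sketch; every figure in it is a hypothetical placeholder, not a benchmark.

```python
# A rough ROI sanity check for an AI feature. All numbers are hypothetical.
implementation_cost = 150_000      # development, data work, infrastructure
yearly_maintenance = 30_000        # retraining, monitoring, support
yearly_benefit = 90_000            # estimated savings or added revenue
horizon_years = 3

total_cost = implementation_cost + yearly_maintenance * horizon_years
total_benefit = yearly_benefit * horizon_years
roi = (total_benefit - total_cost) / total_cost

print(f"ROI over {horizon_years} years: {roi:.1%}")
# A negative ROI at a realistic horizon suggests rethinking the scope
# or looking for a ready-made solution instead of custom development.
```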

5. Are there any ready solutions you can use?

As we discussed above, there’s usually a ready solution you can use in your project — either out of the box or with some amount of customization. At Apriorit, we have extensive experience enhancing and speeding up our AI projects with the help of ready-to-use datasets and AI models. In one of our projects, we used the Inception V3 convolutional neural network (CNN) model pretrained on the ImageNet dataset to detect particular actions in video sequences. You can read more about this project in our Using Modified Inception V3 CNN for Video Processing and Video Classification and Applying Long Short-Term Memory for Video Classification blog posts. And to learn more about working with images, check out our article that explains possibilities for image recognition using AI.
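As a hedged illustration (not the exact implementation from those projects), here is a minimal sketch of loading Inception V3 with ImageNet weights in Keras and using it as a per-frame feature extractor; the input frame is a random placeholder rather than a real decoded video frame.

```python
# A minimal sketch of using a pretrained Inception V3 as a frame feature extractor.
# Not the exact project implementation; the input frame is a random placeholder.
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# Pretrained on ImageNet; include_top=False with average pooling yields a 2048-d vector per frame
model = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

# Placeholder frame; in a real pipeline this would be a decoded video frame resized to 299x299
frame = np.random.randint(0, 255, size=(1, 299, 299, 3)).astype("float32")
features = model.predict(preprocess_input(frame))
print(features.shape)  # (1, 2048)

# Per-frame features like these can then feed a sequence model (e.g., an LSTM)
# for video classification.
```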

In some cases, the use of ready solutions can help you cut costs and speed up product development. However, customizing a ready solution also might be expensive, so you need to keep this risk in mind as well.

6. What non-AI elements are also necessary?

Some experts stress that non-AI elements can turn out to be even more expensive than the AI itself, at least if you estimate the cost of AI as the cost of the specialists needed to build it. For instance, up to 70% of a project’s budget might go not to the AI functionality itself but to arranging proper data storage and management.

The most important non-AI elements you need to account for when planning your budget include:

  • AI project infrastructure — Data storage and management, networking, orchestration and pipelining systems
  • Data protection measures — Data security-oriented architecture, data access management tools
  • API development — To increase the deployment capabilities of your solution, it’s important to build a properly secured and well-performing API for it, as shown in the sketch after this list.
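Here is a minimal sketch of such a prediction API, assuming FastAPI; the model is a dummy placeholder standing in for a real serialized model, and the input schema is hypothetical.

```python
# A minimal sketch of exposing a trained model through an API (FastAPI assumed).
# The model here is a placeholder; in practice you would load a serialized model instead.
from fastapi import FastAPI
from pydantic import BaseModel

class DummyModel:
    def predict(self, rows):
        # Placeholder logic: sum each feature vector
        return [sum(row) for row in rows]

model = DummyModel()  # e.g., replace with a model loaded from disk
app = FastAPI()

class Features(BaseModel):
    values: list[float]  # input feature vector

@app.post("/predict")
def predict(features: Features):
    prediction = model.predict([features.values])[0]
    return {"prediction": float(prediction)}

# In production, also add authentication, input size limits, rate limiting, and logging,
# in line with the data protection measures above.
```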

Required resources

Skills, technologies, equipment, and data are the pillars of any AI solution. A lack of any of these four things can compromise the entire project. Therefore, you need to take care of them in advance, starting with answering the following questions:

7. What skills and talents do you need for this project?

The range of skills and competencies needed to deliver a successful AI solution depends heavily on the project’s objectives. However, the core set of specialists usually includes:

  • Business analyst to accurately determine business needs and customer demands relevant to your project
  • AI engineers to build algorithms and AI models
  • Data engineers to build reliable and secure data pipelines
  • Software developers to build non-AI solution components
  • DevOps engineers to build a stable infrastructure and ensure smooth integration of your AI models
  • Quality assurance specialists to enable continuous testing and improvement of your AI solution

The range of required specialists and expertise might change as your project evolves. You might also need non-AI consultants to ensure that your product fully meets end users’ expectations.

8. What technologies and equipment are necessary?

Technical viability is a crucial part of any AI project. Your AI development team should examine what technologies, equipment, and data are needed to make sure the project is feasible.

Core things to check before starting development include:

  • Readiness of AI algorithms
  • Availability of pretrained AI models
  • Availability of datasets with structured and unstructured data
  • Access to necessary AI hardware

If not taken care of in advance, the lack of needed equipment and technology might compromise the development and launch of your product.

For instance, you can supplement the computing power of your own hardware with cloud-based tools. You can learn more about ways we’ve used Google Colaboratory for this task in our blog.

9. What are your data requirements?

To better understand your data needs, you can use the data science hierarchy of needs, first proposed by Monica Rogati. As with Maslow’s classic hierarchy of needs, you need to start at the very bottom and slowly move upwards, meeting each of the data needs of your AI project.

If you try starting from the top of this pyramid, you might find yourself in a situation where your AI turns out to be underperforming, biased, and inefficient due to a lack of quality data.

How do you assess data quality?

There are many ways you can determine whether the data you’re about to use for an AI project is of good quality. We suggest paying most attention to the following four criteria:

1) Sufficient quantity

Evaluate what sources of data you have, how much data you can get from them, and of what quality. Your goal is to get enough data to train, validate, and test your model. It’s good practice to split your dataset into these three parts before you start building your AI project so that, for instance, training and testing data don’t get mixed.
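As an illustration of this split, here is a minimal sketch assuming scikit-learn and an in-memory dataset; the generated data and the split ratios are common placeholders, not a universal rule.

```python
# A minimal sketch of a train/validation/test split (scikit-learn assumed).
# The generated dataset is a placeholder for your own features and labels.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# First carve out a held-out test set (e.g., 15%)...
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42, stratify=y)

# ...then split the remainder into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.15, random_state=42, stratify=y_train)

# Keeping the test set untouched until final evaluation prevents
# training and testing data from getting mixed.
```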

If you don’t have enough data to feed your AI model, consider expanding the data you already have with augmentation techniques. These techniques enlarge AI training datasets by making minor changes to existing data. We relied on this approach when we needed to improve the accuracy of an AI model that classifies types of skin cancer.
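Here is a hedged sketch of simple image augmentation, assuming TensorFlow/Keras; the transformation parameters and the random placeholder images are illustrative, not the settings used in the skin cancer project.

```python
# A minimal sketch of image data augmentation (TensorFlow/Keras assumed).
# Parameters are illustrative, not tuned values from a real project.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,        # random rotations up to 15 degrees
    width_shift_range=0.1,    # small horizontal shifts
    height_shift_range=0.1,   # small vertical shifts
    zoom_range=0.1,           # slight zoom in/out
    horizontal_flip=True,     # mirror images
)

images = np.random.rand(8, 224, 224, 3)  # placeholder batch of images
labels = np.zeros(8)

# Each call yields a new batch of slightly altered copies of the originals
augmented_batch, _ = next(augmenter.flow(images, labels, batch_size=8))
print(augmented_batch.shape)  # (8, 224, 224, 3)
```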

2) Trustworthy sources

The origins of your data have a direct impact on its quality and, therefore, on the performance of your AI solution. It’s important to aim for the most relevant sources, like patient records for healthcare solutions or satellite imagery for agricultural projects.

You can leverage data coming from either primary or secondary sources.

Primary sources are those originating from your own organization, such as customer relationship management (CRM) or enterprise resource planning (ERP) platform records.

Secondary sources include your business partners and third parties, commercial databases, and publicly available datasets. Data from these sources serves best for enhancing the quality of data from your primary sources.

3) Proper labeling

Artificial intelligence systems study data through labels and tags. A model trained on properly labeled data can learn to detect the same patterns in unstructured data with no tags. And the more precise the labeling of training data, the higher the accuracy of your algorithms and models.

Data labeling is a must for AI projects leveraging data from primary sources. Your AI team might also need to improve data labeling of ready datasets to make sure that tags fit the objectives of your algorithms. Data can be labeled either manually or with the help of dedicated software.

When building an AI system that can automatically detect, segment, and measure follicles in ultrasound images, we engaged therapists on the client’s side to label the data, ensuring high labeling accuracy and a high-quality dataset. In cases like this, data should be processed and checked manually to ensure maximum accuracy of assigned labels.

In our Action Detection Using Deep Neural Networks: Problems and Solutions article, you can learn more about our experience working with annotation tools to label data for our custom AI training dataset. Also, check out our article about using deep learning in the automotive industry.

4) Free of bias

Machine learning solutions can generate inaccurate and unfair outcomes if they are biased. And data is one of the three main sources of bias in AI, along with algorithms and AI goals.

It’s difficult to keep your data 100% free of bias. But you need to put effort into keeping the level of bias in your AI model as low as possible.

Data bias can be either statistical or cognitive. The first comes from the use of irrelevant and unrepresentative data that has little to do with the data your solution will work with after release. Cognitive bias usually makes its way into your software at the data selection or labeling stages, reflecting the misconceptions and prejudices common in a particular society. As a result, you get an unbalanced dataset where certain features are overrepresented or underrepresented.
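One simple, hedged check for this kind of imbalance is to look at how classes or sensitive attributes are distributed in the dataset; the sketch below assumes pandas, and the column names and proportions are hypothetical.

```python
# A minimal sketch of checking a dataset for imbalance (pandas assumed).
# The column names and data are hypothetical placeholders.
import pandas as pd

df = pd.DataFrame({
    "label": ["benign"] * 90 + ["malignant"] * 10,
    "age_group": ["18-40"] * 70 + ["40-65"] * 25 + ["65+"] * 5,
})

# Share of each class in the target label
print(df["label"].value_counts(normalize=True))

# Share of each group for a potentially sensitive attribute
print(df["age_group"].value_counts(normalize=True))

# Strongly skewed proportions are a signal to collect more data,
# rebalance the dataset, or at least weight classes during training.
```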

Further maintenance

Successfully releasing an AI solution is only half the job, as you need to continuously maintain your AI model’s performance to keep your customers satisfied in the long run. And since support and maintenance also require significant resources, you need to plan for this activity long before the actual development of your AI solution begins.

Here are a few aspects we recommend paying attention to:

10. What should you do after the release?

You need to make the monitoring and maintenance of your AI solution a continuous process. Only then can you ensure proper performance, strong data protection, and a high level of customer satisfaction.

Key post-release activities include:

  • Performance monitoring to quickly detect and fix minor issues
  • Continuous quality control to prevent the performance of your AI model from degrading
  • Regular retraining on new datasets to prevent model drift

Aside from these activities, it’s important to pay special attention to the level of bias and the accuracy of your AI model.
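As a hedged illustration of such monitoring, here is a minimal sketch that compares a model’s accuracy on a fresh batch of labeled data against a release-time baseline and flags when retraining may be needed; the baseline, threshold, and sample labels are hypothetical placeholders.

```python
# A minimal sketch of post-release accuracy monitoring (scikit-learn assumed).
# The baseline, threshold, and data are hypothetical placeholders.
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.92   # accuracy measured at release time
MAX_ALLOWED_DROP = 0.05    # tolerated degradation before retraining

def needs_retraining(y_true, y_pred) -> bool:
    """Return True if accuracy on recent data dropped too far below the baseline."""
    current = accuracy_score(y_true, y_pred)
    print(f"Current accuracy: {current:.3f} (baseline {BASELINE_ACCURACY:.3f})")
    return current < BASELINE_ACCURACY - MAX_ALLOWED_DROP

# Example with placeholder labels and predictions from a recent period
recent_true = [1, 0, 1, 1, 0, 1, 0, 0]
recent_pred = [1, 0, 0, 1, 0, 0, 0, 1]
if needs_retraining(recent_true, recent_pred):
    print("Model drift suspected: schedule retraining on fresh data.")
```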

11. How can you protect your solutions from introducing new biases?

As AI models often get retrained with new data after the initial release, it’s important to keep watching for biases in them. From the technology point of view, you can address the problem of bias on two levels:

  • Algorithms — Follow industry and government recommendations to increase the transparency and fairness of your algorithms. Cross-check them against different datasets to detect possible distortions.
  • Data — Increase data quality and diversity in your datasets and test your models using data that’s different from the data they were trained on.

It’s easier to prevent and fix biases in solutions based on explainable AI algorithms. But the accuracy of black-boxed models can also be checked and improved.

12. How can you ensure the continuous accuracy of your AI solution?

As with any traditional software, it’s crucial to ensure that your AI solution can perform smoothly under different loads and that all valuable data is properly secured. But what’s even more important is to continuously check and improve the accuracy of your model.

We’ve already discussed that some AI solutions are black-boxed, meaning that it’s difficult or impossible to explain how a model came to a particular conclusion. However, there are several metrics you can use to ensure that even black-boxed models work as intended:

  • Classification accuracy, which compares the number of correct predictions to the overall number of predictions made by your model
  • Mean absolute error, which measures the average difference between predictions and the true values of the corresponding observations
  • Logarithmic loss (log loss), which evaluates a classifier by penalizing confident but incorrect probability estimates
  • And more

Depending on the specifics of your AI project, you can use different metrics and approaches to evaluate and improve the accuracy of your algorithms and models.
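As a minimal, hedged illustration of these metrics, here is a sketch computing them with scikit-learn; the labels, predictions, and probabilities are placeholder values.

```python
# A minimal sketch of computing common evaluation metrics (scikit-learn assumed).
# The labels, predictions, and probabilities are placeholder values.
from sklearn.metrics import accuracy_score, mean_absolute_error, log_loss

# Classification accuracy: share of correct predictions
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print("Accuracy:", accuracy_score(y_true, y_pred))            # 0.8

# Mean absolute error: average gap between predicted and true values (regression)
true_values = [10.0, 12.5, 9.0]
predictions = [11.0, 12.0, 8.5]
print("MAE:", mean_absolute_error(true_values, predictions))  # ~0.67

# Log loss: penalizes confident but wrong class probabilities
probabilities = [0.9, 0.2, 0.4, 0.8, 0.1]  # predicted probability of class 1
print("Log loss:", log_loss(y_true, probabilities))
```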

Read more about starting a new AI project in the full article on our blog.

Apriorit — Specialized Software Development Company

21+ yrs of expert software engineering services to tech companies worldwide, covering the entire software R&D cycle. Details: www.apriorit.com/about-us/company