To Be AI-First You Need to Be Data-First

Ron Schmelzer
Oct 16, 2018 · 3 min read

This post was featured in our Cognilytica Newsletter, with additional details.

One of the core things we focus on in our Cognilytica AI & Machine Learning training and certification is that machine learning projects are not application development projects. Much of the value of a machine learning project rests in the models, training data, and configuration information that guides how the model is applied to the specific machine learning problem. The application code is mostly a means to implement the machine learning algorithms and “operationalize” the machine learning model in a production environment. That’s not to say that application code is unnecessary (after all, the computer needs some way to execute the machine learning actions), but focusing a machine learning project on the application code misses the big picture. If you want to be AI-first for your project, you need to have a data-first perspective.
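To make the point concrete, here is a minimal Python sketch. It is an illustration only and not taken from any Cognilytica material. It assumes a hypothetical linear model whose weights, bias, and threshold live in a JSON artifact; the feature names (`age`, `visits`), the numbers, and the `predict` function are all invented for this example.

```python
import json

# Hypothetical model artifact. In a real project this JSON would be produced
# by training and versioned alongside the training data and configuration.
MODEL_ARTIFACT = json.loads("""
{
  "weights": {"age": 0.5, "visits": 1.5},
  "bias": -2.0,
  "threshold": 0.0
}
""")


def predict(features, artifact):
    """Apply a linear model that is described entirely by the artifact's data."""
    score = artifact["bias"] + sum(
        artifact["weights"][name] * value for name, value in features.items()
    )
    # The decision rule also comes from data, not from the application code.
    return score > artifact["threshold"]
```

The application code here is a few lines that would rarely change. Retraining the model or tuning the threshold changes only the artifact, which is where the value of the project sits.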

Use data-centric methodologies

As we discussed in our previous article on AI methodologies, if you’re going to have a data-first perspective, you need to use a data-first methodology. There’s certainly nothing wrong with Agile methodologies as a way of iterating towards success, but Agile on its own leaves much to be desired for AI projects because it is focused on functionality and the delivery of application logic. In our previous article we outlined a data-centric Agile methodology that merges the CRISP-DM methodology with Agile to bring the best of both worlds together. While this is still a new area for most enterprises implementing AI projects, we see this sort of merged methodology as more successful than trying to shoehorn all the aspects of an AI project into existing application-focused Agile methodologies.

Digging a bit deeper, it makes sense to look at what the specific artifacts of the AI project need to be to have the most success. After all, what we’re delivering with an AI project is not functionality, but data. So, what are those different data artifacts?

  • Business Understanding Artifacts
      • Business Background
      • Business Objectives
      • Business Success Criteria / KPIs
      • Cost / Benefit Analysis
      • Resource Inventory
      • Initial Project Plan
      • Resource Allocation
      • Tool Selection Criteria
  • Data Understanding Artifacts
      • Data source identification
      • Data collection report
      • Data description
      • Data quality analysis
      • Data cleansing requirements
  • Data Preparation Artifacts
      • Data set description
      • Data selection rationale
      • Data cleansing reports
      • Derived attributes and generated records
      • Merged Data
      • Reformatted Data
  • Data Modeling Artifacts
      • Algorithm selection approach
      • Modeling technique
      • Modeling assumptions and hyperparameter configurations
      • Training set selection and training method
      • Test set selection and test method selection
      • Generated models
      • Model assessment & validation
      • Hyperparameter revisions
  • Model Evaluation Artifacts
      • Evaluation of model performance
      • Alignment of model results with business requirements and KPIs
      • Review of process
      • Operationalization requirements
      • Next iterations of model and artifacts
  • Deployment Artifacts
      • Deployment code development
      • Deployment plan
      • Monitoring and Maintenance plan
      • Alignment of deployment with business objectives
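One way a team might keep these artifacts visible during each iteration is to treat the list itself as data. The sketch below is a hedged illustration in Python: the artifact names are copied from the list above, but the `PHASE_ARTIFACTS` mapping and the `ArtifactTracker` class are hypothetical names introduced here, and only two phases are spelled out (the others would follow the same pattern).

```python
from dataclasses import dataclass, field

# Artifact names come from the list above; only two phases are shown.
# The dict-plus-tracker structure is an illustrative assumption.
PHASE_ARTIFACTS = {
    "Data Understanding": [
        "Data source identification",
        "Data collection report",
        "Data description",
        "Data quality analysis",
        "Data cleansing requirements",
    ],
    "Deployment": [
        "Deployment code development",
        "Deployment plan",
        "Monitoring and Maintenance plan",
        "Alignment of deployment with business objectives",
    ],
}


@dataclass
class ArtifactTracker:
    """Records which artifacts have been delivered in the current iteration."""

    delivered: set = field(default_factory=set)

    def mark_delivered(self, phase, artifact):
        # Reject names that are not on the agreed list for that phase.
        if artifact not in PHASE_ARTIFACTS.get(phase, []):
            raise ValueError(f"Unknown artifact {artifact!r} for phase {phase!r}")
        self.delivered.add((phase, artifact))

    def missing(self, phase):
        """Return the phase's artifacts not yet delivered, in list order."""
        return [
            name
            for name in PHASE_ARTIFACTS[phase]
            if (phase, name) not in self.delivered
        ]
```

Calling `missing("Deployment")` at the end of an iteration then gives a quick view of which data artifacts are still outstanding, independent of how much application code has shipped.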

Then we need to look at the AI-specific activities required to create those artifacts. Some of those activities fall to a data science role, while others (maybe even most) are data engineering activities. Still others are functions of business analyst and data analyst roles. At a high level, those activities and roles include:

  • Business strategy development: business analyst, solution architect, line of business (LoB), data scientist
  • Dataset preparation & pre-processing: data analyst, data engineer, data scientists, domain specialists, external contributors, third parties
  • Dataset splitting: primarily data scientists, with some data engineer involvement
  • Algorithm selection, model & ensemble development: data scientists
  • Model training: data scientists
  • Model evaluation & testing: data scientists
  • Model deployment with governance framework: data engineer, systems engineers, data team, cloud team
  • Business / KPI evaluation: business analyst, solution architect, line of business (LoB), data scientist
  • Model iteration: data analyst, data engineer, data scientists
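As one concrete instance, the dataset splitting activity in the list above can be sketched in a few lines of Python. This is a minimal random holdout split under stated assumptions, not a prescribed method: the function name `split_dataset`, the 20% test fraction, and the fixed seed are choices made for this illustration, and a real project might need a stratified or time-based split instead.

```python
import random


def split_dataset(records, test_fraction=0.2, seed=42):
    """Shuffle records reproducibly and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed means the same training and test sets can be regenerated later, which helps when the split itself is treated as a project artifact that is reviewed and revisited in the next iteration.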

As you can see, while agile methodologies are applicable here, they need significant modification to be used in an AI context.

Use data-centric technologies



Real-world insight, expertise, and opinions on Artificial Intelligence (AI) and related areas

Written by Ron Schmelzer, Senior Analyst, Cognilytica. Founder of TechBreakfast, Bizelo, ZapThink, Zoptopz, Channelwave, and more. Sometimes successful entrepreneur and knowledgeable guy.

