Applying Artificial Intelligence with Elegance: The Machine Learning Team at Aifred Health

Aifred Health
Published May 5, 2018 · 8 min read
Photo by Thomas Lefebvre on Unsplash

When the buzzwords “artificial intelligence” (AI) and “machine learning” (ML) arise in the popular media, they conjure up images of shiny white machines assembling IKEA furniture, robots with faces attempting (and arguably failing) to mimic human emotion, and computer screens displaying rapidly changing code in dark basements far from the touch of any human. In practice, however, developing effective ML algorithms requires a great deal of blood, sweat, and tears from human engineers. Contrary to the stereotyped image of computer scientists coding away in isolation, the ML team at Aifred Health spends much of its time in discussion with one another, refining approaches to analyzing datasets and working to improve the interpretability of its software. While this kind of collaboration and communication is necessary for any ML application, mental health tech in particular demands translating massive datasets collected by hard-working researchers in psychiatry into a format an AI model can digest, and that translation requires a synergistic effort from the whole team.

At Aifred Health, we are using AI techniques to develop a clinical decision aid that helps physicians improve treatment efficacy for individuals suffering from depression. The foundation of this clinical tool is the data painstakingly collected by researchers investigating patient response to depression treatments in clinical trials: a colossal amount of it, amassed from institutions all around the world over the past few decades. The two most critical types of data for Aifred’s ML team are the baseline characteristics of patients in depression treatment trials and the outcome measures at the end of each study. By analyzing the characteristics of different individuals pre-treatment, ML algorithms can attempt to predict treatment response, i.e. the outcome measure of the trial, such as HAM-D (Hamilton Depression Rating Scale) or QIDS (Quick Inventory of Depressive Symptomatology) scores. Ultimately, the goal is to develop a robust and flexible model that can use the datasets it has trained on to assess the baseline characteristics of a new, never-before-seen patient and predict which treatments will be most effective based on their biological profile. But for many who are unfamiliar with the world of AI and its inner workings, the obvious question is: how does this happen?

With the ML team at Aifred Health, a shining example of predicting outcome based on personal characteristics is the success the team has had with predicting lifetime suicidal ideation using data from the Canadian Community Health Survey. Sneha Desai, a member of the ML team, has been working with this dataset of demographic, lifestyle, and employment information to predict if a person has had a suicidal thought in their lifetime. Impressively, if we do say so ourselves, Sneha has managed to predict lifetime suicidal ideation with greater than 69.8% sensitivity and 76.0% specificity, and a positive predictive value of 6.12% and a negative predictive value of 99.12% with her work. Part of this process involves identifying the most important features in the dataset which can be used to predict the outcome measure of interest — with the Canadian Community Health Survey data, Sneha has whittled the number of features used to predict lifetime suicidal ideation from 581 to 143. This feature reduction is achieved by selecting for specific features based on how well they correlate with the predictive power of the network. For those reading without an AI background, this would be analogous to having a group of friends all with the potential to give you advice on whether or not to invest in certain stocks. If most of your friends give you advice which never pays off, you’re probably going to stop listening to them. However, if you have one friend that seems to consistently predict what stocks are going to increase in value, you would start focusing your attention on that friend’s advice. The same idea holds with ML predictions: the goal is to remove redundant features to create a more parsimonious model with accurate predictive capabilities.
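The figures quoted above are four standard metrics computed from a classifier’s confusion matrix. A minimal sketch of how they relate (the counts below are made up for illustration, not drawn from the survey data):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the four metrics quoted above from raw confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true-positive rate: ideators correctly flagged
    specificity = tn / (tn + fp)  # true-negative rate: non-ideators correctly cleared
    ppv = tp / (tp + fp)          # positive predictive value
    npv = tn / (tn + fn)          # negative predictive value
    return sensitivity, specificity, ppv, npv

# Hypothetical counts for a rare outcome: when the positive class is rare,
# a low PPV alongside a high NPV (as in the results above) is expected,
# because even a few false positives swamp the small number of true positives.
sens, spec, ppv, npv = classification_metrics(tp=70, fp=1000, tn=3000, fn=30)
```

Note how sensitivity and specificity describe the model, while PPV and NPV also depend on how rare the outcome is in the population.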

In order to achieve this type of parsimonious model, ML team members like Sneha can adjust the way datasets are modeled, or they can change the architecture of the ML model itself. At this point in the process, members of the ML team communicate their findings to the rest of the team and discuss the performance of a given network. The ML team uses a divide-and-conquer approach, where individuals on the team work closely with a specific dataset to understand its unique characteristics and then use this insight to find the best type of network to predict outcome measures with the highest accuracy.

This prototyping phase involves the use of the Vulcan framework, which was developed in-house to improve efficiency for testing different types of networks on datasets. Vulcan was created to allow the ML team to focus their efforts on discussing network structure on a more abstract level by giving team members the power to rapidly prototype and test different networks, and to assist in visualizing and characterizing data. Essentially, the framework gives deep insight into how well a network is performing. Importantly, this framework uses metrics which represent network performance to compare performance across different models, allowing an ML team member to select the best network for the job. Another advantage of using Vulcan is the built-in interpretability modules, which help ML team members understand why a model performs a certain way; for example, by identifying which features are most important for accurate predictions. Interpretability is a high priority when choosing the best network for a dataset, as computer scientists and AI developers must identify which components are most valuable in predicting outcome measures in order to further refine a network. “Vulcan’s interpretability modules give us a simple explanation as to why a network behaves the way it does. These modules allow us to pry open the network black box and see its inner workings,” says Robert Fratila, Chief Technology Officer at Aifred Health and Vulcan’s creator. Thus, armed with an interpretable read-out of network performance, the ML team can describe which specific features within a dataset helped to tease out relevant differential predictions. Once these features have been identified, this information can be relayed further along to the research and clinical teams at Aifred Health.
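Vulcan’s internals aren’t public, but one common technique that interpretability modules like these build on is permutation importance: shuffle a single feature’s values so its link to the outcome is broken, and measure how much performance drops. A self-contained sketch (the toy model and data are illustrative, not Vulcan’s actual API):

```python
import random

def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    preds = [model(row) for row in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def permutation_importance(model, X, y, n_features, seed=0):
    """Score each feature as the accuracy lost when its column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(n_features):
        shuffled = [row[:] for row in X]
        col = [row[j] for row in shuffled]
        rng.shuffle(col)
        for row, v in zip(shuffled, col):
            row[j] = v
        importances.append(base - accuracy(model, shuffled, y))
    return importances

# Toy model that only looks at feature 0: shuffling feature 0 can hurt
# accuracy, while shuffling the ignored feature 1 changes nothing.
model = lambda row: 1 if row[0] > 0.5 else 0
X = [[0.1, 0.9], [0.9, 0.1], [0.2, 0.8], [0.8, 0.2]]
y = [0, 1, 0, 1]
scores = permutation_importance(model, X, y, n_features=2)
```

Features whose shuffling barely moves the score are candidates for removal, which is the same pruning logic as the stock-advice analogy above.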

The research, clinical, and ML team work closely together to decide how to train networks and choose features from a clinical perspective. Our research and clinical teams assist with further reducing the number of features to analyze, clarifying the relevance of metrics within datasets, structuring data analysis, and interpreting findings from the ML team. This collaborative effort allows the clinical and research teams to advise the ML team with up-to-date information from the mental health research community, and ensures that findings produced by the ML team are interpretable from a clinical and research perspective. “We work back and forth with the research and clinical teams to determine if what we find with our analysis of these datasets is validating what people have already found in the research community, or if it’s something new,” says Fratila. “Keeping interpretability in mind is important for communicating our findings on the ML team not only to physicians who will be using this tool but also for creating a dialogue for research purposes.”

Ultimately, the goal is to have a model which can predict treatment outcomes based on an individual patient’s characteristics using patterns learned from the massive datasets on which it’s been trained. Essentially, this requires combining many different datasets in different formats and with different measures into a common pool of information which AI technology can use to predict something it’s never seen before. The way the ML team approaches this challenge is simple: tackle one dataset at a time. “You can build an intuition for a specific dataset,” says Fratila. “Then, once you do some initial tests on individual datasets and fully understand the parts of each dataset, you can see how they fit with each other and use that knowledge to combine them together. Just like you can have different computational modules in the brain, each specialized to look at a certain type of data, you can build a model with this structure — with different components looking at different modalities, and then feeding the information along a circuit to a higher level.”
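Fratila’s brain analogy can be sketched as separate branches, one per data modality, whose summaries feed a higher-level combiner. Everything below (field names, scales, weights) is a hypothetical illustration of the structure, not Aifred’s actual model:

```python
# Each "module" summarizes one modality into a small feature vector;
# a higher-level combiner then weighs the summaries together.

def demographic_module(record):
    """Summarize demographic fields into values roughly in [0, 1]."""
    return [record["age"] / 100.0, 1.0 if record["employed"] else 0.0]

def symptom_module(record):
    """Scale a baseline HAM-D score (range 0-52) into [0, 1]."""
    return [record["hamd_baseline"] / 52.0]

def combine(record, weights=(0.4, 0.6)):
    """Feed modality summaries 'along a circuit to a higher level':
    a weighted sum standing in for the top of the network."""
    demo = sum(demographic_module(record)) / 2.0
    symp = sum(symptom_module(record))
    return weights[0] * demo + weights[1] * symp

patient = {"age": 40, "employed": True, "hamd_baseline": 26}
risk_score = combine(patient)
```

In a real network each branch would be learned rather than hand-coded, but the shape is the same: modality-specific components whose outputs a higher level integrates.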

However, with great aspirations come great obstacles. Often, the data collected in depression treatment research is noisy and may come in formats which are difficult to manipulate into an easy-to-use form. This incompatibility is understandable when one considers that this type of data was not collected for the purposes of being fed into an ML algorithm to help guide treatment selection, as these technologies have only been recently developed. However, with rocketing AI advancements in recent years, the need for a standardized presentation format for this type of data has grown. While presenting data in variable formats is fine for describing results within psychiatric communities, AI networks require data to be in a more organized and strict arrangement. Thus, a lot of time is spent shaping the data into a friendly format for ML team members. “There’s a running joke about what ML engineers spend their time doing,” says Fratila. “10% logistics of network architecture and training, 10% discussion for hyperparameter optimization, and 80% data pre-processing.” So, when collecting data which can be used for predictive purposes in this field, efforts to ensure compatible data structures and good documentation will minimize time spent pre-processing data and reduce the energy aimed at this initial task.
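Much of that 80% of pre-processing time goes to mapping each trial’s idiosyncratic field names and codings onto one shared schema before any modeling. A minimal sketch of the pattern, with entirely hypothetical field names and a rescaling factor chosen only for illustration:

```python
# Two trials report the same information under different names and codings.
# A per-source normalizer turns both into one common schema.

COMMON_FIELDS = ("age", "sex", "baseline_severity")

def normalize_trial_a(row):
    return {"age": int(row["AGE_YRS"]),
            "sex": "F" if row["GENDER"] == 2 else "M",
            "baseline_severity": float(row["HAMD_0"])}

def normalize_trial_b(row):
    # Crude QIDS-to-HAM-D rescale; the factor here is illustrative only.
    return {"age": int(row["age"]),
            "sex": row["sex"].upper()[0],
            "baseline_severity": float(row["qids_base"]) * 2.0}

def pool(rows_a, rows_b):
    """Normalize both sources and verify every record matches the schema."""
    pooled = ([normalize_trial_a(r) for r in rows_a]
              + [normalize_trial_b(r) for r in rows_b])
    assert all(set(r) == set(COMMON_FIELDS) for r in pooled)
    return pooled

records = pool([{"AGE_YRS": "34", "GENDER": 2, "HAMD_0": "21"}],
               [{"age": 51, "sex": "male", "qids_base": "12"}])
```

The point is the shape of the work: one normalizer per source, one schema check at the pooling boundary, so downstream code only ever sees one format.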

Another challenge for the team comes from the unique qualities of ML, which make it difficult to produce estimates of necessary sample sizes for training networks. Traditional statistical methods allow researchers to estimate the minimum sample sizes needed to obtain significant and valid results, but with ML it is trickier to gauge how much data a network will require during training to make accurate predictions. These difficulties in estimating required sample sizes can make it hard to convince regulatory bodies and data collaborators that the ML process demands a try-first approach to gain insight into whether more data is needed. However, at Aifred Health we have managed to collect 26,000 patient records and expect to have 35,000 soon. We continue to acquire valuable data in depression treatment research thanks to the hard work of the Scientific Partnership team led by Sonia Israel.
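One concrete version of that try-first approach is a learning curve: train on growing subsets of the available records and watch whether held-out performance is still improving. A curve that is still rising at 100% of the data suggests more records would help; a plateau suggests diminishing returns. A sketch with a toy stand-in for training (the saturation formula below is invented purely to produce a plausible curve shape):

```python
def learning_curve(train_eval, n_total, fractions=(0.25, 0.5, 0.75, 1.0)):
    """Score a model trained on growing slices of the data; if the curve is
    still rising at the full sample size, more records would likely help."""
    return [(f, train_eval(int(n_total * f))) for f in fractions]

def toy_train_eval(n):
    # Toy stand-in for "train on n records, return held-out accuracy":
    # gains level off as n grows, mimicking diminishing returns.
    return round(0.9 - 0.8 / (1 + n / 500), 3)

curve = learning_curve(toy_train_eval, n_total=26000)
```

In practice `train_eval` would retrain the actual network on each slice; the curve itself is the evidence one can show collaborators about whether additional data collection is worthwhile.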

Despite these challenges, the ML team has produced good preliminary results from their work with data from the STAR*D trial, the largest study evaluating depression treatment to date, and is continuing to improve their prediction accuracy for treatment response using 2,000 patient records from this dataset.

The effective dynamic of the ML team depends on fluid roles for each of its members — while individual team members focus on specific datasets, discussion between team members is critical for moving forward when deciding what changes to make to optimize network performance. It’s an interesting contrast to the idea of an ML engineer working in isolation — the success of the ML team depends on collaboration and human interpretation of network performance. “We value hearing everyone’s opinion during the process of optimizing a network, and there’s a lot of learning and discussion going on,” says Fratila. “It’s important to be open-minded to change and ready to adapt and integrate new ideas.”

There’s an art to manipulating data into ML-friendly formats and tinkering with network architecture, and there’s an art to helping non-computer scientists navigate the twists and turns of AI technology through interpretable communication. Thankfully for the rest of the Aifred Health team, and the rest of the world, our ML team is skilled in both respects. While any member of the team is capable of valuable work on their own, it is in combination that the ML team makes its greatest impact in improving treatment efficacy for people with depression.

--

Aifred Health

Clinical Decision Aid System using machine learning to increase treatment efficacy in psychiatry | aifredhealth.com