A Faster Way to Annotate Transcript Data in PTSD Therapy Sessions

Learnings to improve the process of annotating data.

Albert Lai
Aug 31, 2019 · 7 min read

Deep learning is worth the hype. You can generate faces indistinguishable from real ones and train models to detect fake news better than humans can. That’s what drew me in to learn computer vision and natural language processing several months ago.

Image for post
Denoising images with a U-net: check out my article on this here!

Now fast-forward a couple of months to last June. I was browsing LinkedIn and came across an interesting link that led me to this:

Image for post

I decided to apply, and lo and behold: I was accepted! I was super excited to join an awesome team of 40 other researchers and enthusiasts ready to make a change in the world with AI in only 8 weeks! And so we jumped right in :)

What’s the Challenge?

Image for post

In the kickoff call, Christoph von Toggenburg talked about his exposure to Post-Traumatic Stress Disorder (PTSD). PTSD can develop when someone experiences a severely traumatic event and, instead of the trauma leveling off, it becomes a lasting mental health condition.

Symptoms include panic attacks, anxiety, uncontrollable thoughts, and more, and they can be triggered whenever the person is reminded of the event.

“The difference between trauma and PTSD is that switch in your brain, and it becomes a part of your life. It is something you cannot reverse, but you can deal with the symptoms, and if treated properly, you can get much better” — Christoph

Over the last 20 years, Christoph has been a part of humanitarian work with the UN and Red Cross. He often traveled to war-torn places to help refugees and civilians. He’s seen thousands of victims who have been through traumatic experiences and, unfortunately, could not receive help because therapy services were inaccessible.

Christoph has also experienced PTSD himself; a truck he was riding in while on an emergency mission in the Central African Republic was ambushed. He received treatment for it and experiences almost no symptoms now.

Now, Christoph is starting BEATrauma, an initiative to help people with PTSD all around the world. His vision is a mobile chatbot app that converses with users and produces a risk assessment for PTSD. The assessment would rely on machine learning, and that’s where we come in!

Omdena — Learning ML Through Collaboration!

Omdena is a global platform where AI engineers and enthusiasts from diverse backgrounds collaborate to solve real-world social problems and build meaningful careers.

I joined a group of 40 other enthusiasts, experienced developers, and mentors from around the world, and we were all moved by Christoph’s story. We wanted to make a change and do good with AI. As a team, we began the initial phase of researching PTSD and different methods of therapy in more depth. And boy, were we motivated!

Image for post

We concluded that CBT (cognitive behavioral therapy) was the best fit for our project. In the form we looked at, a therapist talks with the patient about their experiences and gradually “exposes” them to those memories until they finally become comfortable with them. Knowing that we could build a conversational agent with NLP for this purpose, we set our sights on training data.

The Data Problems: Not Annotated, Not Enough

Data is not always easy to find, especially when dealing with sensitive user information like therapy sessions. Our in-house math and data science professor Colton Magnant was able to get his hands on around 1,700 transcripts of therapy sessions, only about 50 of which were for PTSD.

Image for post
YAY DATA!!!!

From there, we split into 2 groups. One was in charge of risk assessment: building a rule-based chatbot in Rasa, with sentiment analysis, to converse with the user, along with a backend classification model trained on transcript data to determine whether the user had PTSD. The other focused on CBT, training a seq-to-seq chatbot for therapy!

I decided to take a step back from NLP and focus on data annotation. Since the transcripts came completely unlabelled, we had to give each one a score between 0 and 1 so that the model could learn which patients had PTSD and which didn’t. Luckily, Alexis Carrillo Ramirez, who has experience with statistics and psychology, was able to guide our team of 7 through reading the transcripts and scoring them!

The Annotation Process

  1. Understand each of the 6 criteria for PTSD, e.g., exposure to actual or threatened death, serious injury, or sexual violence; persistent avoidance of stimuli associated with the traumatic event(s); and more!
  2. Keeping the criteria in mind, read an entire transcript (which can take 45 minutes to 1 hour).
  3. Score each of the 6 criteria as 0, 0.5, or 1, where 0 means the symptom is not displayed at all, 0.5 means it is somewhat displayed, and 1 means it is clearly expressed.
  4. Follow a formula to take in all 6 numbers and spit out a number between 0 and 1 for the risk assessment for PTSD.
  5. Rinse and repeat for the other 49.
Image for post
Criterion A’s description
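
To make steps 3 and 4 concrete, here is a minimal Python sketch of the scoring. The formula we actually followed isn't spelled out in this post, so the unweighted average below is only an assumed placeholder, and the names `risk_score`, `ALLOWED_SCORES`, and `NUM_CRITERIA` are made up for illustration.

```python
ALLOWED_SCORES = {0, 0.5, 1}   # the three values an annotator may assign
NUM_CRITERIA = 6               # one score per PTSD criterion


def risk_score(criterion_scores):
    """Combine six per-criterion scores into one risk value in [0, 1].

    ASSUMPTION: an unweighted mean stands in for the real formula,
    which this post does not describe.
    """
    if len(criterion_scores) != NUM_CRITERIA:
        raise ValueError(
            f"expected {NUM_CRITERIA} scores, got {len(criterion_scores)}"
        )
    for score in criterion_scores:
        if score not in ALLOWED_SCORES:
            raise ValueError(f"invalid score {score!r}; use 0, 0.5, or 1")
    return sum(criterion_scores) / NUM_CRITERIA


# Hypothetical annotation of a single transcript (not real data).
example_annotation = [1, 0.5, 0.5, 0, 1, 0]
print(risk_score(example_annotation))
```

Validating the inputs up front matters more than the formula itself here: with 7 annotators and about 50 transcripts, a typo like 5 instead of 0.5 is easy to make and hard to spot later.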

We faced two problems in our annotation process. The first was that annotating all the data took far too long: between complications and busy schedules, it took around 2 weeks of hard work to finish. The second was that the transcripts were often unclear and difficult to understand.

We brainstormed several solutions to the annotation problem:

  • Determine a bag of words and their embeddings for each criterion, then run LDA (Latent Dirichlet Allocation) on top of them to classify each criterion and completely automate the process
  • Use USE (Universal Sentence Encoder) to compute the cosine similarity between sentences and match those that relate to the same criterion
  • Use GPT-2 to summarize each transcript and capture its main idea, speeding up the annotations
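
The cosine-similarity idea can be sketched in a few lines of Python. To keep the example runnable without downloading a model, plain word counts stand in for Universal Sentence Encoder embeddings, which is a big simplification of what we brainstormed. The helper names and the sample sentence are invented for illustration; the two criterion descriptions are the ones quoted earlier.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts: a toy stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def closest_criterion(sentence, criteria):
    """Return (name, similarity) for the criterion most similar to the sentence."""
    sentence_vec = bag_of_words(sentence)
    scores = {
        name: cosine_similarity(sentence_vec, bag_of_words(description))
        for name, description in criteria.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


criteria = {
    "exposure": "Exposure to actual or threatened death, serious injury, "
                "or sexual violence",
    "avoidance": "Persistent avoidance of stimuli associated with the "
                 "traumatic event(s)",
}

# Invented sentence, not taken from any transcript.
sentence = "I keep avoiding anything associated with the event"
print(closest_criterion(sentence, criteria))
```

Word counts only match exact words, so "avoiding" and "avoidance" don't count as related here. Closing that gap is exactly what a real sentence encoder would be for.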

Creating the Risk Assessment Chatbot

From there, we had to create a classification model that takes in user conversations and determines whether the user has PTSD. However, we didn’t have enough data to train a robust model. Luckily, thanks to a breakthrough with ULMFiT’s transfer learning technique, we have achieved close to 80% accuracy so far, with more improvements to come!

Image for post
Ready to run the advanced models soon!

I Have Learned So Much From This Experience!

When I first joined Omdena, I only understood data from a machine learning perspective. I didn’t know about data engineering, annotation, or the tremendous work it takes to clean data. Back then, I was just grabbing nearly perfectly manicured Kaggle datasets!

Image for post
Gotta love Kaggle :)

Now, I’ve realized that’s not how it works in the real world. Genuine data is messy, difficult to understand, and doesn’t come with documentation. From this challenge, I’ve learned so much about working with data and how to better understand it! Now we’re discussing working on a paper to show our findings and results for data annotation for therapy sessions to the world, which is very exciting :)

I’ve also learned that things don’t always turn out as planned. It’s quite easy to follow a data science course or tutorial and have it work exactly as you’d imagine. However, through working on this tremendous undertaking, I’ve realized that there are always hiccups along the way. We’ve had issues with data and model accuracy, and together they forced us to scrap our ideas for CBT.

Nevertheless, we have still accomplished a ton and we’re almost ready to push out our risk assessment chatbot for BEATrauma! We’re excited and honored to make an impact in the world and I’m proud to be a part of this Omdena challenge!

If you want to join one of their challenges and make an impact, apply here.


Omdena

Building Real-World AI Solutions Collaboratively

Written by Albert Lai

I’m a 17-year-old student who loves technology and life, and trying to get better at both!

Omdena

Omdena is a collaborative platform for building AI solutions to real-world problems through the power of bottom up collaboration.
