Guest Post by Trisha Sengupta (Stanford AI4ALL ‘17), Viansa Schmulbach (Stanford AI4ALL ‘17), and Esther Cao (Stanford AI4ALL ‘17)
AI4ALL Editor’s Note: Trisha, Viansa, and Esther all participated in AI4ALL’s first AI Research Project Fellowship program, where they were paired with a mentor who works in AI to collaborate on an AI research project over the course of 3 months. In this blog post, they explain their project, which uses natural language processing to prioritize emergency dispatch calls. Learn more about AI4ALL’s Research Project Fellowship Program and other mentee projects here.
On January 13, 2017, Anna Fong, a mother of six and grandmother of seventeen, collapsed outside of a jewelry store in San Francisco’s Chinatown. Although an ambulance was immediately called, Anna ended up passing away after waiting for 21 minutes for the ambulance to arrive. This is more than double the city’s targeted waiting time of ten minutes. These extra eleven minutes marked the difference between life and death.
This is not the first time such an issue has occurred. In 2014, various news outlets began reporting on the ambulance shortage in San Francisco. Because of the frequency of calls and the lack of ambulances, individuals facing life-threatening problems sometimes had to wait hours when an ambulance should have been there in minutes. Large volumes of simultaneous calls make it difficult for operators to decide which calls need service first, as resources are limited. Automation can help by ranking calls by urgency and planning dispatch routes around that ranking.
Viansa Schmulbach — one of our group members and one of Anna’s grandchildren — was deeply affected by what happened to her grandmother. In addition to starting a foundation to increase access to emergency healthcare for San Francisco residents, she joined with fellow high school sophomores and Stanford AI4ALL alums Esther Cao and Trisha Sengupta to tackle this problem using AI. Together, we have created a program which uses natural language processing (NLP) to prioritize dispatch calls.
Our AI went through the following steps:
- It used the Medical Priority Dispatch System (MPDS) to create “keywords” and a corresponding rank of importance for each keyword.
- It matched the ambulance call with the closest keyword and assigned the call that keyword’s rank.
- It placed coordinate points of the call on a color-coded map.
Initially, we planned to use call transcripts as the input, as this dataset would be more descriptive. However, due to HIPAA and other privacy laws, transcripts were not available to us; the datasets we could use were in comma-separated values (CSV) format and were not very descriptive. The data contained features including longitude, latitude, description, and address. Unfortunately, these datasets did not contain much natural language to process.
To solve this issue, we decided to use word2vec, a family of algorithms that map each word to a vector, so that the cosine similarity between two words can be calculated from their vectors. If two words tend to appear in similar contexts, their cosine similarity is higher. For example, model.similarity("Russia", "Soviet") would be greater than model.similarity("Russia", "bunny").
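As a minimal sketch of what cosine similarity measures, the snippet below computes it by hand in plain Python. The three-dimensional vectors are made up for illustration only; real word2vec vectors have hundreds of dimensions and are learned from text, and the names `russia`, `soviet`, and `bunny` are just labels for these toy values.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy vectors (assumption): NOT real word2vec embeddings.
russia = [0.9, 0.8, 0.1]
soviet = [0.8, 0.9, 0.2]
bunny = [0.1, 0.2, 0.9]

# Vectors pointing in similar directions score close to 1.
print(cosine_similarity(russia, soviet) > cosine_similarity(russia, bunny))  # prints True
```

The score depends only on the direction of the vectors, not their length, which is why it is a common choice for comparing word embeddings.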
We loaded a word2vec model pretrained on Google News. With this model, we cycled through each word in the ambulance call and found the cosine similarity between it and each word in the dataset of keywords (which were not single words, but a list of phrases). For example, we compared the call “he is having trouble breathing” with the keywords [“aortic aneurysm”, “cardiac arrest”, “respiratory distress”]. We then found the average cosine similarity for each keyword and selected the keyword with the highest average; in this case, the keyword with the highest average would hopefully be “respiratory distress,” which would result in the call being assigned the urgency rank of this keyword.
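The averaging step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project’s actual code: `TOY_VECTORS` is a made-up lookup table standing in for a real pretrained model, and the helper names `average_similarity` and `best_keyword` are hypothetical.

```python
import math

# Made-up 3-dimensional vectors (assumption) standing in for real embeddings.
TOY_VECTORS = {
    "breathing": (0.1, 0.9, 0.1),
    "trouble": (0.3, 0.4, 0.3),
    "respiratory": (0.1, 0.95, 0.05),
    "distress": (0.3, 0.5, 0.3),
    "cardiac": (0.9, 0.2, 0.2),
    "arrest": (0.8, 0.1, 0.3),
    "aortic": (0.3, 0.1, 0.9),
    "aneurysm": (0.2, 0.1, 0.95),
}

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def average_similarity(call_words, phrase, vectors):
    """Mean cosine similarity over every (call word, keyword word) pair."""
    scores = [
        cosine_similarity(vectors[c], vectors[k])
        for c in call_words
        for k in phrase.split()
        if c in vectors and k in vectors  # skip out-of-vocabulary words
    ]
    return sum(scores) / len(scores) if scores else 0.0

def best_keyword(call_words, keywords, vectors):
    """Return the keyword phrase with the highest average similarity."""
    return max(keywords, key=lambda p: average_similarity(call_words, p, vectors))

keywords = ["aortic aneurysm", "cardiac arrest", "respiratory distress"]
call_words = ["trouble", "breathing"]
print(best_keyword(call_words, keywords, TOY_VECTORS))  # prints respiratory distress
```

With a real model, the dictionary lookups would be replaced by that model’s own vector or similarity lookups.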
With word2vec, our initial issue was that normal ambulance calls contain extraneous words. For example, in the phrase “he is having a heart attack,” the words “a” and “he” would sway the cosine averages. For this reason, we used nltk (the Natural Language Toolkit) to extract nouns and verbs and exclude pronouns. We also learned that word2vec is case sensitive, so we lowercased every word with the .lower() function.
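The cleaning step might look something like the sketch below. Note the substitution: instead of part-of-speech tagging with nltk, it uses a small hand-written set of articles and pronouns (`FUNCTION_WORDS`, a hypothetical name) so that the example runs without any downloads. A real tagger would make this decision per word based on context.

```python
# Hypothetical stand-in for part-of-speech filtering (assumption): a tiny
# hand-written list of articles and pronouns to drop from each call.
FUNCTION_WORDS = {
    "a", "an", "the",
    "i", "you", "he", "she", "it", "we", "they",
    "his", "her", "their",
}

def clean_call(text):
    """Lowercase the call, then drop articles, pronouns, and non-words."""
    words = text.lower().split()
    return [w for w in words if w.isalpha() and w not in FUNCTION_WORDS]

print(clean_call("He is having a heart attack"))
# prints ['is', 'having', 'heart', 'attack']
```

Lowercasing comes first so that “He” and “he” are treated as the same word before any filtering or vector lookup.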
Our dataset for keywords came in lists of phrases, not individual words, so we tokenized the list of keywords in order to cycle through every word of every phrase. Before comparing, we checked with the .isalpha() function that each item in the list of keywords was a word and not a symbol (> or <). Once we finally tested the data, it turned out the cosine averages were too close to accurately guess the words. From there, we checked whether the set of words in the call and the set of words in a keyword phrase had any words in common (a non-empty intersection), and if they did, we automatically gave the ambulance call that keyword’s ranking.
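A rough sketch of the tokenizing and overlap check described above. The phrases and letter ranks in `KEYWORD_RANKS` are invented for illustration and are not real MPDS assignments; the function names are hypothetical as well.

```python
# Invented example data (assumption): NOT real MPDS codes or rankings.
KEYWORD_RANKS = {
    "cardiac arrest": "E",
    "chest pain > 35": "D",
    "minor fall": "A",
}

def phrase_tokens(phrase):
    """Split a keyword phrase into lowercase words, dropping symbols and numbers."""
    return {tok for tok in phrase.lower().split() if tok.isalpha()}

def exact_overlap_rank(call_words, keyword_ranks):
    """Return the rank of the first keyword sharing a word with the call, else None."""
    call_set = set(call_words)
    for phrase, rank in keyword_ranks.items():
        if call_set & phrase_tokens(phrase):  # non-empty set intersection
            return rank
    return None

print(phrase_tokens("chest pain > 35"))
print(exact_overlap_rank(["severe", "chest", "pain"], KEYWORD_RANKS))  # prints D
```

This version returns the first matching keyword in dictionary order; picking the highest-priority match instead would be an equally reasonable design choice.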
We used the Medical Priority Dispatch System (MPDS) as the baseline for the keywords along with the word2vec program. MPDS is a system already used to sort ambulance calls; it prioritizes codes using the letters A-E, where A is the lowest priority and E is the highest. Once this baseline was established, we coded a prioritization algorithm and used it to prioritize the calls. We looped through each CSV file, matched each call with keywords found in MPDS codes, and gave it a priority. Once priorities were assigned to each ambulance call, our final step was visualization. The library we used is called gmplot, and it is a simple way of plotting data points on Google Maps. Each priority was color coded and then added to the map. After a few attempts, we were able to create a map with all of the calls placed according to their latitude and longitude and color coded by priority.
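The last two steps, ordering calls by priority and grouping coordinates by color, can be sketched with the standard library alone. Everything specific here is an assumption: the column names, the sample rows, and the color scheme in `PRIORITY_COLORS` are invented, and the plotting call itself is left out so the example runs without a maps library.

```python
import csv
import io

# Hypothetical color scheme (assumption) for priority levels A (lowest) to E (highest).
PRIORITY_COLORS = {"A": "green", "B": "yellow", "C": "orange", "D": "red", "E": "purple"}

# Invented sample rows; coordinates, descriptions, and priorities are made up.
SAMPLE_CSV = """latitude,longitude,description,priority
37.7941,-122.4078,fall,B
37.7749,-122.4194,cardiac arrest,E
37.7599,-122.4148,breathing problems,D
"""

def load_calls(csv_text):
    """Parse CSV text and sort calls so the highest priority (E) comes first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: row["priority"], reverse=True)
    return rows

def group_points_by_color(rows):
    """Map each color to the list of (latitude, longitude) points to plot in it."""
    groups = {}
    for row in rows:
        color = PRIORITY_COLORS[row["priority"]]
        point = (float(row["latitude"]), float(row["longitude"]))
        groups.setdefault(color, []).append(point)
    return groups

calls = load_calls(SAMPLE_CSV)
points = group_points_by_color(calls)
print([call["description"] for call in calls])
# prints ['cardiac arrest', 'breathing problems', 'fall']
```

Each color group in `points` could then be handed to a plotting library such as gmplot, one scatter layer per color.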
We believe this project has the potential to make a real impact.
In the future, we are planning to refine the code (all of which can be found on GitHub) and continue our project. We hope to refine our algorithm using word2vec by feeding it more data. Another goal is to make the map more interactive by plotting a route for the ambulance calls that optimizes factors such as drive time, arrival time, and wait time. We want to make the map more responsive by taking in an actual call transcript and processing it in real time. We are also planning to speak about our project at various technology conferences.
Trisha Sengupta is a rising junior at Lynbrook High School in San Jose, California and 2017 Stanford AI4ALL alumna. Aside from computer science, Trisha has an avid interest in various STEM fields, including biochemistry and biomedical engineering. She won second place at the Synopsys Science Fair in the Biological Sciences and Engineering Section. She is also passionate about the use of AI to solve medical issues. Other than STEM, she is also a part of her high school’s Model UN club and part of their band and swim team. She has also played bassoon for five years and is an alumna of the California State Summer School for the Arts.
Viansa Schmulbach is a rising junior and a Stanford AI4ALL 2017 alum in Northern California. She created a foundation called the Anna Fong Foundation to raise awareness about the ambulance shortage in San Francisco and to increase access to emergency healthcare for San Francisco residents; you can make a donation at their website or by emailing Viansa. Viansa’s main passions include participating in FIRST robotics as the vice captain of her team, studying medicine, and playing the ukulele. She is currently working on an initiative to bring STEM to underprivileged families and schools.
Esther Cao is a rising junior and a Stanford AI4ALL 2017 alum in Northern California.