GRAMMY Debates With Watson (Part 2)

From the lab to music’s greatest stage

IBM Developer
IBM Data Science in Practice
Mar 12, 2021


By Aaron Baughman, Tony Johnson, Elad Venezian, Yoav Katz

The journey of adapting Project Debater from IBM Research to the GRAMMYs created a resilient and component-based system that can summarize opinions on music topics into fluent narrations.

If you haven’t already, check out Part 1 for an overview of GRAMMY Debates with Watson and a description of the first phase, collecting arguments from Twitter.

In this article, we discuss how additional arguments are collected from the web, and how the combined Twitter and web arguments are summarized and their key points highlighted.

After the Twitter argument mining step is complete, we proceed to speech generation.

Phase 2: Collecting Arguments from the Web

[Figure: close-up of the cloud components for this project, including the fine-tuned Debater services (Ingress Node Balancer, Debater Key Point, Debater Topic Argument Quality/Pro & Con), Speech by Crowd feeding the Debater Generator, the Cloudant NLP Store, IBM Cloud Object Storage, the Debater API and Enrollment interfaces, Kafka, and the IBM CDN]
Figure 5. Speech synthesis components of the architecture

As part of the GRAMMY experience, music fans are invited to join the debates on a dedicated submission page. After the submission, users can see how their submission compares to other submissions in terms of argument quality and polarity. They can also see the summary of all of the arguments collected in the previous days.

Arguments are manually reviewed for inappropriate content (such as offensive or hateful comments), and spam is filtered in a dedicated admin user interface.

Managing scale in web submission

The lower half of the architecture focuses on the consumer-facing applications and the general speech synthesis pipeline. Consumers input arguments into the client-facing application. Arguments must be between 8 and 36 words long. When the length restriction is met, the React front end posts the argument to a scaled-out messenger application, which enrolls the argument onto a 10-partition Kafka topic. Asynchronously, the Debater enroll interface, a Python app on IBM Cloud, runs 10 threads, each listening on one partition of the Kafka topic. The app pulls in the argument, checks it for infringement, and then posts it to the Speech by Crowd platform.
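To make that enrollment flow concrete, here is a minimal sketch of both sides of the Kafka hop, assuming the kafka-python client; the topic name, the content check, and the Speech by Crowd call are hypothetical stand-ins, not the production code.

```python
import threading

from kafka import KafkaConsumer, KafkaProducer, TopicPartition

TOPIC = "grammy-arguments"   # hypothetical name for the 10-partition topic
NUM_PARTITIONS = 10


def is_infringing(argument: str) -> bool:
    # Placeholder for the infringement/content check described above.
    return False


def submit_to_speech_by_crowd(argument: str) -> None:
    # Placeholder for the POST to the Speech by Crowd platform.
    print("submitted:", argument)


def enqueue_argument(producer: KafkaProducer, argument: str) -> None:
    """Validate the 8-36 word length rule, then enroll the argument onto the topic."""
    if not 8 <= len(argument.split()) <= 36:
        raise ValueError("argument must be between 8 and 36 words")
    producer.send(TOPIC, argument.encode("utf-8"))


def consume_partition(partition: int) -> None:
    """One of ten consumer threads; each thread listens on a single partition."""
    consumer = KafkaConsumer(bootstrap_servers="kafka:9092")
    consumer.assign([TopicPartition(TOPIC, partition)])
    for record in consumer:
        argument = record.value.decode("utf-8")
        if not is_infringing(argument):
            submit_to_speech_by_crowd(argument)


def start_consumers() -> None:
    for p in range(NUM_PARTITIONS):
        threading.Thread(target=consume_partition, args=(p,), daemon=True).start()
```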

In parallel, the submitted user argument travels through the global throttling solution on IBM Cloud Internet Services to the Debater API interface. IBM Cloud Internet Services has several edge functions that are mapped to subdomains, and every request to the system flows through them. The logic within each function determines a traffic drop percentage to keep the system under the target requests per second; any request above the target receives a 429 response. The remaining traffic flows to three IBM Research API endpoints, which provide immediate pro/con, quality, and key point matching responses for the single argument. The pro/con and quality services run on CPU-based clusters, while the key point analysis endpoint runs on GPU-based clusters. The responses are aggregated and sent back to the GRAMMYs experience.
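As a rough illustration of that real-time fan-out, the sketch below scores one argument by calling three endpoints and aggregating the responses. The base URL, paths, and payload fields are assumptions for illustration, not the documented Debater API.

```python
import requests

DEBATER_BASE = "https://debater-api.example.com"  # hypothetical base URL


def score_argument(argument: str, topic: str) -> dict:
    """Call the pro/con, quality, and key point matching endpoints and aggregate the results."""
    payload = {"argument": argument, "topic": topic}
    pro_con = requests.post(f"{DEBATER_BASE}/pro-con", json=payload, timeout=2).json()
    quality = requests.post(f"{DEBATER_BASE}/quality", json=payload, timeout=2).json()
    key_points = requests.post(f"{DEBATER_BASE}/key-point-match", json=payload, timeout=2).json()
    # The aggregated object is what gets returned to the GRAMMYs experience.
    return {"pro_con": pro_con, "quality": quality, "key_points": key_points}
```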

Each endpoint within the Debater API interface writes a count record to Redis on IBM Cloud. A throttle controller Python app reads the counts from Redis and determines the drop percentage of traffic required to stay under the maximum requests per second. The result is written to IBM Cloud Object Storage, and the IBM Cloud Internet Services edge functions pull the drop percentage from there to decide which requests should receive a 429. This process shields the compute-intensive algorithms.
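A simplified sketch of that control loop, assuming the redis client: read per-endpoint request counts, compute the drop percentage needed to stay under a requests-per-second target, and publish it for the edge functions. The key pattern, RPS target, counting window, and the local-file stand-in for the IBM Cloud Object Storage write are all assumptions.

```python
import json

import redis

MAX_RPS = 100         # hypothetical hard requests-per-second limit
WINDOW_SECONDS = 10   # hypothetical counting window


def compute_drop_percentage(client: redis.Redis) -> float:
    """Fraction of traffic the edge functions should answer with HTTP 429."""
    counts = [int(client.get(key) or 0) for key in client.keys("debater:requests:*")]
    observed_rps = sum(counts) / WINDOW_SECONDS
    if observed_rps <= MAX_RPS:
        return 0.0
    return 1.0 - MAX_RPS / observed_rps


def publish(drop_percentage: float) -> None:
    # In the real system this JSON is written to IBM Cloud Object Storage, where
    # the IBM Cloud Internet Services edge functions pick it up.
    with open("drop_percentage.json", "w") as handle:
        json.dump({"drop_percentage": drop_percentage}, handle)


if __name__ == "__main__":
    publish(compute_drop_percentage(redis.Redis(host="localhost", port=6379)))
```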

Phase 3: Speech Synthesis

Every day at 5 a.m. ET, batch jobs take all of the arguments from Twitter and the website input and invoke the Speech by Crowd Debater pipeline on each topic. The Debater Generator application loops through all of the open topics and pulls the argument set from the Speech by Crowd argument database. Any argument that is marked as spam or inappropriate is not used within the speech generation process. Each of the arguments is assigned to either a pro or con list using the Project Debater polarity classification model. The argument list is supplied as a parameter along with the topic to the speech generation process. The speech generation process can take anywhere from 20 minutes to a few hours depending on how many arguments we are using. The resulting speeches, key points, arguments, language statistics, and grid points are then converted to a JSON form and stored in Cloudant. Any artifacts that have been approved through the Debater Review Tool are then pushed to IBM Cloud Object Storage, which is the origin for our Content Delivery Network. All of the batched data is served through this acceleration tier.
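The daily loop can be pictured roughly as follows; the helper functions are hypothetical stand-ins for the Speech by Crowd argument database, the Project Debater polarity classifier, the speech generation call, and the Cloudant write, not the actual pipeline code.

```python
import json
from datetime import datetime, timezone


# Hypothetical stubs standing in for the services described above.
def fetch_arguments(topic):
    return []                      # Speech by Crowd argument database


def classify_polarity(text, topic):
    return 0.0                     # Project Debater polarity model: >0 pro, <0 con


def generate_speeches(topic, pro, con):
    return {}                      # speech generation (20 minutes to a few hours)


def store_in_cloudant(document):
    print(json.dumps(document)[:120])


def run_daily_batch(open_topics):
    for topic in open_topics:
        arguments = fetch_arguments(topic)
        # Skip anything flagged as spam or inappropriate during review.
        usable = [a for a in arguments if not a.get("spam") and not a.get("inappropriate")]
        pro = [a["text"] for a in usable if classify_polarity(a["text"], topic) > 0]
        con = [a["text"] for a in usable if classify_polarity(a["text"], topic) < 0]
        document = {
            "topic": topic,
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "speeches": generate_speeches(topic, pro, con),
        }
        store_in_cloudant(document)  # approved artifacts later flow to Cloud Object Storage / CDN
```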

Now let’s examine how the speech generation process works.

Debater Pipeline

The Speech by Crowd platform performs several steps to generate speeches. First, the polarity of each argument is classified by a deep learning neural network. Next, we increase the spread of the raw polarity scores by taking the square root of the decimal value, which helps us further remove neutral arguments. The system then removes any irrelevant input text that is not aligned to the topic and has low quality. Next, the system takes the remaining arguments and begins the key point analysis process: it selects approved, short, high-quality sentences as potential key points based on each sentence’s quality assessment, then matches each argument to a key point. The algorithm grades and ranks the prevalence of each key point by counting how many sentences articulate its gist. Finally, the narrative generation process selects the most prevalent key points and corresponding high-quality arguments to formulate a fluent narrative.
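To illustrate the square-root step, here is a tiny sketch, assuming polarity scores in [-1, 1]; the 0.3 neutral threshold is an assumption for illustration.

```python
import math


def spread_polarity(score: float) -> float:
    """Square-root transform of the score's magnitude, keeping its sign.

    For scores in [-1, 1] this stretches the low end of the scale
    (for example 0.04 -> 0.2, 0.25 -> 0.5), which makes a neutral
    cut-off easier to apply.
    """
    return math.copysign(math.sqrt(abs(score)), score)


def keep_argument(score: float, neutral_threshold: float = 0.3) -> bool:
    """Drop arguments whose spread polarity still falls in the neutral band."""
    return abs(spread_polarity(score)) >= neutral_threshold


print(spread_polarity(0.25))   # 0.5
print(keep_argument(0.04))     # False: still effectively neutral
```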

GRAMMY scale

With the volume and variety of data, in addition to consumer traffic, we built several interfaces between the Speech by Crowd platform and the Debater APIs. We had to ensure that the platform could scale to handle hundreds to millions of users from around the world. Next, the real-time feedback on arguments had to achieve a 2-second response time. All the while, the system had to respect a hard requests-per-second limit. These three requirements are challenging and can work against one another. All of the NLP artifacts, such as key points and speeches, had to be produced asynchronously, and their run time depends on the number of arguments, which grows during the event. Finally, the system needed to be scalable enough to handle undefined peak loads.

As shown in Figure 6, the hybrid cloud infrastructure that supports the GRAMMYs project has many diverse parts. The real-time data processing part of the system handles dynamic loads. A total of 24 pods managed by Red Hat OpenShift support the workloads. The system uses only two GPUs in real-time mode so that we can easily scale onto available CPUs when the horizontal scalers sense CPU load. The real-time data flow does not have any caching, and we control load on the origin servers with global throttling.

Batch processing is very different. With fixed workloads, the system’s compute footprint can be managed without the need to handle spikes in traffic. Thirty-three pods, spread out over nine workers, process the data. Note the contrast with the real-time side: we use 12 GPUs for the batch data so that the computation is faster, and because this workload does not need to scale out, the relative scarcity of GPUs compared to CPUs is not a concern. Overall, we have approximately 12 AI and NLP models that ultimately create the speeches.

https://gist.github.com/luxzia/e42304d414173ed0f4f420ab49f88fb2
Figure 6. Depiction of real-time and batch data processing cloud capacity

Now let’s look at the results of a topic.

GRAMMY Debates results

Figures 7–11 show initial results for the topic “Music education should be mandatory in all K-12 schools.” Focusing on the supporting arguments, we can see that many of these opinions are aligned toward brain development and building self-esteem. One of the arguments mentions that music training not only helps children develop fine motor skills, but aids emotional and behavioral maturation as well. Another argument states that music stimulates brain development in children.

In the grid plot in Figure 8, each dot represents an argument. The spread of the arguments shows that the crowd is highly polarized around this topic. In purple, we have high-quality arguments that are against the topic; in gold, you can see the arguments that support mandatory music education. The crowd did not provide many neutral opinions. Notice that most of the arguments are of high quality. Lower-quality arguments are filtered out and not used in the speech.

Figure 8. Argument plot that shows stance versus quality

Now, let’s look at the key points we found that support and contest the notion that music education should be mandatory in all K-12 schools. The algorithms summarized the crowd’s arguments into 10 supporting key points and 6 contesting key points. Each key point is distinct and has its own supporting arguments. In general, the crowd thinks that music helps children develop, is important for education, increases brain capacity, encourages creativity, sharpens one’s ability to listen, and is a good indicator of academic success. One interesting key point that surfaced is that all learners should have access to music education. Among the cons, the crowd’s concerns include that music education costs too much, that it will distract kids from core subjects, that schools lack funding, that schools already have too many courses, and that it will degrade knowledge.

Finally, the key points and arguments were used to construct a fluent and cohesive speech. In Figure 11, you see a supporting speech arguing that music education should be mandatory in all K-12 schools. The initial sentence of each paragraph restates a key point, and the text even gives the percentage of arguments that support that key point. For example, you can see three highlighted themes. The first theme mentions that 21% of all arguments state that music in schools helps children develop better. As we move further down the paragraphs, the number of arguments supporting each key point gets smaller. Eleven percent of the arguments support the notion that music education is important to our schools, with the supporting evidence written as sentences drawn from the crowd’s arguments. Next, 7% of arguments propose that music enhances brain coordination and increases brain capacity. The full speech had six paragraphs that together created a cohesive supporting narration. Below is the speech:

Music education should be mandatory in all K-12 schools

Supporting Speech

Greetings all. The following analysis is based on 320 arguments, contributed by people around the globe, identified as high-quality arguments supporting the notion that Music education should be mandatory in all K-12 schools.

21 percent of the arguments argued that music in schools helps children develop better. Music education has proven results in helping kids be more emotionally stable, as well as giving them a creative outlet. It also helps children in their learning process because it allows them to develop a talent and passion for music. Music stimulates brain development in children, integrating many different subjects, and is a way to connect with other people and relieve stress. It should be included in education since it is of great benefit for the cognitive and intellectual development of children.

Another recurring point, raised in 11 percent of the arguments, is that music education is important in our schools. Music education is an important aspect of providing children with a well-rounded education. It should be a priority in K-12 schools because music has more and more relevance in people’s lives. Music represents one of the universal arts most used by human beings, therefore, it is important to carry out the teaching of this art in schools of any social stratum. All schools should have music and arts education, as it is a fundamental part of human beings, and helps balance and bring joy to the mind, body, and soul.

7 percent of the arguments proposed that music enhances brain coordination and increases brain capacity. Music is an extremely important subject for all children to learn and can lead to better brain development, increases in human connection, and even stress relief. Music helps with logical thinking and is therefore useful. Using a different part of our brains gives us greater control and balance to our STEM focused curriculum. One of the most useful benefits of music education is the increased ability to process situations and find solutions. Music instruction also boosts engagement of brain networks that are responsible for decision making and the ability to focus attention as well as inhibit impulses.

…<truncated>…

To conclude, the above examples reflect the crowd’s opinions, that music education should be mandatory in all K-12 schools. Thank you for joining.

Music + debating

Music and debating go together just like artificial intelligence and humans. The combination of both creates world-class experiences where we can gain a better understanding around all sides of an issue. Learn more at https://www.ibm.com/sports/grammys/.

Join us live!

Join the authors of this post live on March 16, 2021, at 12 p.m. (ET) as we talk about the solution we built for the GRAMMYs and show you how the argument results changed over the course of the two weeks leading up to the show. Sign up at: https://www.crowdcast.io/e/grammys-debates-with-watson-from-the-Lab-to-musics-biggest-night/.

Originally published at https://developer.ibm.com.
