Coronavirus: NLP to the Rescue

SumUp Analytics
Mar 1, 2020


[Newsletter Week 1, 2/28/20]

As the Coronavirus (Covid-19) has spread from a local threat to an official global health emergency, the media attention has grown alongside it. Sifting through the rapidly emerging content to find a through-line is difficult, even for advanced researchers.

SumUp’s analytics platform Nucleus discerns the most emphasized topics from multiple, large bodies of text by using advanced natural language processing (NLP). This newsletter provides a synthesized view of the topics surrounding Coronavirus and allows you (the reader) to draw your own conclusions and dive deeper into the information most important to you.

The first section reveals the most emphasized topics across ‘reference’ sources such as scientific health journals, organizations, and governing bodies. Each topic includes a summary and recommended source, as determined by Nucleus, for further reading.

The second section lists the topics extracted from ‘blog’ sources, which represent mainstream media coverage, so that you can compare them with the reports from health organizations and scientific journals.

These datasets cover the period from 01/15/2020 to 02/27/2020.
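As a rough illustration of what "most emphasized topics" can mean, the sketch below ranks two-word phrases by how many documents in a corpus contain them. This is not the Nucleus algorithm, which this newsletter does not describe. The stopword list, the function names, and the three-sentence corpus are all hypothetical and exist only for this example.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from Nucleus).
STOPWORDS = {
    "the", "a", "an", "of", "in", "on", "to", "and", "for", "from",
    "were", "was", "is", "are", "at", "by", "with", "new",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_phrases(documents, n=3):
    """Rank two-word phrases by the number of documents that contain them."""
    doc_freq = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        # Stopwords are dropped before pairing, so words separated only by
        # stopwords are treated as adjacent (a deliberate simplification).
        # The set ensures each phrase is counted once per document.
        doc_freq.update(set(zip(tokens, tokens[1:])))
    return [(" ".join(pair), count) for pair, count in doc_freq.most_common(n)]


# Hypothetical toy corpus, written for this example only.
reference_docs = [
    "Passengers from the cruise ship tested positive.",
    "Confirmed cases linked to the cruise ship were reported.",
    "Officials reported new confirmed cases.",
]

print(top_phrases(reference_docs, n=2))
```

In this toy corpus, "cruise ship" and "confirmed cases" each appear in two of the three documents, so they rank above every other phrase. Running the same function separately on a 'reference' corpus and a 'blog' corpus would give a crude version of the side-by-side comparison presented in this newsletter.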

If you wish to further explore NLP’s applications, please contact us at

*SumUp Analytics specializes in NLP and has no expertise in health or healthcare. The quality of the output relies heavily on the accuracy and expertise of the input sources listed at the end of this newsletter.

Key topics extracted from official reference sources:

Topic 1: Princess Diamond Cruise Ship

Following the progression of the virus remains a primary topic. The Princess Diamond cruise ship was one of the first hubs for the virus, and it is often mentioned in connection with cases in different countries whose origin can be traced back to the ship.

Suggested reading:

Topic 2: Confirmed Coronavirus Infections

Tracking confirmed cases around the world, particularly outside of China, is the second most important topic in this corpus. There is a notable emphasis on listing every country where new cases have been reported.

Suggested reading:

Topic 3: South Korea, Singapore, Malaysia, Thailand, Philippines

This topic is also about following the progression of the virus, with an emphasis on countries adjacent to the initial epicenter of the virus.

Suggested reading:

Topic 4: Negative Tests

Topic 4 outlines a number of areas where concerns about the virus have led to tests that turned out negative.

Suggested reading:

Topic 5: Disease Control and Prevention

Topic 5 covers the growing number of companies working on control and prevention for the new coronavirus. It also outlines some early research results, such as the origin of the virus in bats (according to the CDC).

Suggested reading:

Topic 6: Wuhan, Hubei Province

This topic covers general matters relating to Wuhan, where the epidemic originated, including evacuations by the Japanese authorities and the higher mortality rate in that region (2.7%) compared with other provinces (as low as 0.4%).

Suggested reading:

Topic 7: Virus Spread Control

This topic mentions measures that various countries have implemented to control the spread of the virus, such as monitoring passenger arrivals and cancelling public events.

Suggested reading:

Topic 8: Acute Respiratory Syndrome

This topic describes the most characteristic Coronavirus symptom: acute respiratory syndrome, which is shared with a number of other viruses.

Suggested reading:

SumUp’s Qualitative Assessment:

To an outside observer looking back, the above topics seem characteristic of coverage of an early-stage epidemic caused by a newly discovered virus: descriptive comments on the mechanics of the virus's spread, early descriptions of the symptoms, and details of early preventive measures. However, there is little emphasis on more substantive elements such as the causes or origins of the epidemic and, more importantly, there is no description of potential cures, not even of early-stage trials of new drugs.

Going forward, this newsletter will review the same sources of information on a regular basis, trying to identify changes in the nature of the subjects mentioned, such as a potential shift from a focus on epidemic dynamics to potential cures.

Key topics extracted from healthcare blogs:

Topic 1: Mike Pence leads Coronavirus response

Topic 2: Disease control and prevention

Topic 3: Trump administration (in relation to response to coronavirus)

Topic 4: South Carolina debate (in relation to mentions of the Coronavirus during the debate)

Topic 5: Health emergency crisis

Topic 6: Princess Diamond cruise ship

Topic 7: Trump taps Pence

Topic 8: Pelosi and Schumer attack Trump’s choice

Qualitative assessment:

The general media seems to discuss politics in the context of the epidemic rather than the epidemic itself. Unfortunately, relatively little is said about the actual virus, even though every topic listed is mentioned in relation to it.


Reference Sources

Global Health Now reference articles:

Pharmaceutical technology coronavirus microsite:

JAMA coronavirus site:

WHO coronavirus reference page:

CDC coronavirus summary page:

CDC coronavirus FAQ page:

LiveScience coronavirus FAQ page:

Johns Hopkins coronavirus summary page:

Global Health Now coronavirus summary page:

Blog sources


