NLP in Finance

O'Reilly Media
Sep 1, 2020 · 5 min read

Editor’s Note: We think this piece is important because natural language processing (NLP) is a versatile solution for a range of problems that affect diverse industry verticals. As foremost experts in NLP, Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta, and Harshit Surana review how NLP can address certain situations in finance, including loan risk assessments, auditing and accounting problems, and financial sentiment analysis. We’d love to hear from you about what you think about this piece.

In this piece, we’ll cover some specific applications of NLP in finance, including loan risk assessments, auditing and accounting problems, and financial sentiment analysis.

Financial Sentiment

Stock market trading relies on information about specific companies. This knowledge informs the decision to buy, hold, or sell a stock. The analysis can draw on companies’ quarterly financial reports, on what analysts say about the companies in their own reports, or on social media.

Social media analysis monitors posts and points out potential trading opportunities. For instance, news that a CEO is resigning often carries negative sentiment, which can hurt the company’s stock price. On the other hand, if the CEO has been performing poorly and markets welcome the resignation, the stock price could rise. Companies that provide this kind of information for trading include Dataminr and Bloomberg.

Financial sentiment analysis is different from regular sentiment analysis. It differs not only in domain but also in purpose. Generally, the purpose is to predict how the markets will react to a piece of news, as opposed to judging whether the news is inherently positive or not. There have been efforts to adapt BERT to the financial domain. One of these is FinBERT.

FinBERT uses a subset of financial news from Reuters to adapt BERT to the domain. For sentiment classification, it uses Financial PhraseBank, which has over 4,000 sentences labeled by people with backgrounds in business and finance. In regular sentiment analysis, a positive label means the text expresses positive emotion. In Financial PhraseBank, a positive label indicates that the company’s stock price is expected to increase based on the news in the sentence. FinBERT reported an accuracy of 0.97 and an F1 of 0.95, a substantial improvement over general-purpose state-of-the-art methods. FinBERT is a library that’s available on GitHub, along with its data. We can build on this library for custom problems and use the pre-trained models for financial sentiment classification.
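To make the accuracy and F1 figures above concrete, here is a minimal sketch of how the two metrics can be computed for a three-class sentiment task. It assumes macro-averaged F1 (the text does not say which averaging is meant), and the gold and predicted labels are made-up toy data, not FinBERT output.

```python
# Toy illustration: accuracy and macro-averaged F1 for three-class sentiment.
# The labels below are invented for the example; they are not FinBERT results.

LABELS = ("positive", "negative", "neutral")


def accuracy(gold, predicted):
    """Fraction of sentences whose predicted label matches the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def macro_f1(gold, predicted, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (an assumed averaging scheme)."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


gold = ["positive", "negative", "neutral", "neutral", "positive", "negative"]
pred = ["positive", "negative", "neutral", "positive", "positive", "neutral"]

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro F1 = {macro_f1(gold, pred):.3f}")
```

Accuracy treats every sentence equally, while macro F1 gives each class the same weight, so a model that neglects a rare class is penalized more by F1 than by accuracy.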

Risk Assessments

Credit risk assessment quantifies the chances of a successful loan repayment. It’s generally calculated from an individual’s past spending and loan repayment history. However, this information is limited in many scenarios, especially in underprivileged communities. It’s estimated that more than half of the world’s population is excluded from financial services. NLP can help alleviate this problem by adding many more data points for assessing credit risk. For example, in business loans, entrepreneurial ability and attitude can be measured using NLP. This approach is used by Capital Float and Microbnk. Similarly, inconsistencies in the data provided by the borrower can be surfaced for more scrutiny. Other more nuanced aspects, such as lenders’ and borrowers’ emotions while applying for a loan, can also be incorporated.

Often in personal loan agreements, various pieces of information have to be captured from loan documents and then fed to credit risk models. The information captured helps in identifying credit risk, and erroneous data extraction from these documents can lead to flawed assessments. Named entity recognition (NER) can improve this extraction.
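As a rough illustration of the extraction step, the sketch below pulls a few fields out of loan text with regular expressions. This is a pattern-based stand-in, not a trained NER model. The field names, the patterns, and the sample agreement are all hypothetical, and real loan documents would need far more robust handling.

```python
import re

# Hypothetical field patterns; a production system would use a trained NER model.
PATTERNS = {
    "borrower": re.compile(
        r"Borrower:\s*(?P<value>[A-Z][\w.'-]*(?: [A-Z][\w.'-]*)*)"
    ),
    "principal": re.compile(
        r"principal (?:amount|sum) of\s*(?P<value>\$[\d,]+(?:\.\d{2})?)", re.I
    ),
    "interest_rate": re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*%\s*per annum", re.I),
    "maturity_date": re.compile(r"due on\s*(?P<value>\d{4}-\d{2}-\d{2})", re.I),
}


def extract_loan_fields(text):
    """Return a dict of field name -> extracted string, or None if not found."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group("value") if match else None
    return fields


# Invented sample text for the example.
sample_agreement = (
    "Borrower: Jane Doe\n"
    "The lender agrees to lend a principal amount of $25,000.00 "
    "at 7.5% per annum, due on 2025-06-30."
)

for field, value in extract_loan_fields(sample_agreement).items():
    print(f"{field}: {value}")
```

Fields that cannot be found come back as None, so they can be routed to a human reviewer instead of being passed silently to a credit risk model.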

Accounting and Auditing

The global firms Deloitte, Ernst & Young, and PwC now have a significant focus on delivering more meaningful, actionable, and relevant audit conclusions and observations on a company’s annual performance. Deloitte, for example, has applied aspects of NLP and ML to areas like contract document reviews and long-term procurement agreements, and has evolved its Audit Command Language into a more efficient NLP application. This is covered in more detail in their report on government data.

In addition, after decades of long, drawn-out ticking and tying of endless day-to-day transactions and paper records like invoices, companies have realized that NLP and ML have a significant advantage in the audit process. This advantage lies in directly identifying, visualizing, and analyzing trends in outliers among transaction types, so that time and effort go into investigating those outliers and their causes. This results in early identification of potentially significant risks and possible fraudulent activity like money laundering, along with potentially value-generating activities that can be emulated across a company and customized for various business processes.
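The idea of surfacing outlier transactions can be sketched with a simple robust statistic. The example below flags amounts whose modified z-score, based on the median and the median absolute deviation, exceeds a threshold. This is one common heuristic chosen for illustration, not the method any audit firm is known to use. The threshold of 3.5 and the invoice amounts are assumptions.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. The default threshold is a common rule of
    thumb, assumed here for illustration.
    """
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        # All deviations are (nearly) identical; nothing can be ranked as unusual.
        return []
    return [
        i
        for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]


# Invented invoice amounts; one is far outside the usual range.
invoices = [120.0, 135.5, 110.0, 128.0, 9800.0, 131.0, 125.0]

for index in flag_outliers(invoices):
    print(f"review transaction {index}: amount {invoices[index]}")
```

Using the median instead of the mean keeps a single very large transaction from distorting the baseline it is compared against. In practice, a flagged transaction is only a starting point for investigation, not evidence of fraud.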


Sowmya Vajjala has a PhD in Computational Linguistics from the University of Tübingen, Germany. She currently works as a research officer at the National Research Council, Canada’s largest federal research and development organization. Her past work experience spans both academia, as a faculty member at Iowa State University, and industry, at Microsoft Research and The Globe and Mail.

Bodhisattwa Majumder is a doctoral candidate in NLP and ML at UC San Diego. Earlier he studied at IIT Kharagpur, where he graduated summa cum laude. Previously, he built large-scale NLP systems at Google AI Research and Microsoft Research, which went into products serving millions of users. Currently, he is leading his university team in the Amazon Alexa Prize for 2019–2020.

Anuj Gupta has built NLP and ML systems at Fortune 100 companies as well as startups as a senior leader. He has incubated and led multiple ML teams in his career. He studied computer science at IIT Delhi and IIIT Hyderabad. He is currently Head of Machine Learning and Data Science at Vahan Inc. Above all, he is a father and husband.

Harshit Surana is founder at DeepFlux Inc. He has built and scaled ML systems at several Silicon Valley startups as a founder and an advisor. He studied computer science at Carnegie Mellon University, where he worked with the MIT Media Lab on common sense AI. His research in NLP has received over 200 citations.
