SlopeGPT: The first payments risk model powered by GPT

Jason Huang
Published in Slope Stories
8 min read · Apr 11, 2023

(Patent pending)

SlopeGPT architecture

Slope is a leading fintech AI company bringing the B2B economy online through its B2B payments platform. At Slope, we’re committed to driving innovation that delivers tangible business results. The bar is especially high when that innovation is delivered in the form of AI. It needs to lead to measurable business impact and be the best tool for the job.

There has been a ton of recent excitement around large language models (LLMs) and generative AI. There’s a good reason for it: LLMs have shown the potential to revolutionize entire industries. But as always, with hype comes noise. When simply incorporating “AI” into your product definition can lead to a real, immediate boost in public interest, it is almost irrational not to do so. The issue is that in the long run, a product built around a technology rather than a problem will always be suboptimal. We believe that the majority of today’s AI products are solutions in search of problems.

One of the most important lessons you learn in the field of AI is when not to use it. This is a lesson we know deeply as a team. And that is precisely why we were so excited to discover what we’re about to share today: an LLM applied to solve a core business use case for all fintechs: risk — in a way that no existing method can.

Why risk? Risk is the lifeblood of any payments company

As Paul Graham told us in office hours, PayPal — and every payments company — is a risk company.

Risk is the essential problem of every fintech. In general, it is both the hardest problem to solve and the one with the greatest business impact when solved. Once risk is contained, every other problem becomes an order of magnitude easier.

For Slope, risk is especially important because lack of trust is one of the main reasons B2B and cross-border payments have not moved online. The issue is only exacerbated by newer, faster payment methods like FedNow, RTP, and same-day ACH; instant settlement makes fraud and risk prevention even more difficult.

Bank Data: An invaluable source for fintechs

Bank data is one of the most commonly used sources by fintechs for managing payments and fraud risk. This data, which contains timestamps, descriptions, amounts, and more, is invaluable for risk assessment because it is real-time, rich, and difficult to falsify.

We can view the transaction data as a time series.

The transaction descriptions have the potential to be particularly informative as they can be used to understand and decompose a business’s cash flow. For example, we can obtain a real-time view into a business’s income statement by categorizing transactions as revenue, expenses, debt payments, etc.

Problem: The diversity and idiosyncrasies of bank transaction data make it difficult to categorize

Transaction descriptions and details are often unstructured and unstandardized. This makes it difficult to identify transaction types and decompose cash flow into its revenue, expense, loan, investment, and other components, a decomposition that is essential for assessing payments risk.

Traditionally, fintechs have understood transaction data by categorizing it through dictionaries, keywords, and other rule-based approaches.

However, there are two issues with rules-based approaches:

  1. The diversity and idiosyncrasies of transaction descriptors make it difficult, if not impossible, for rules to characterize transactions reliably.
  2. No two businesses are the same, and the same transaction description can mean two completely different things for different businesses. For example, ACH inflows can be revenue for some businesses, but for others, they are funding sources.

We have seen other fintechs compensate for the deficiencies of rule-based approaches by manually reviewing each customer’s bank transactions and generating custom rules. Clearly, this is costly and unscalable.

Enter: Large Language Models

LLMs like GPT capture semantic meaning in the form of embeddings, which are numerical representations of a sentence’s meaning. LLMs and the embeddings they generate are a natural solution for the first problem because they have demonstrated the ability to capture concepts and meaning, even when they are expressed in a myriad of different ways using natural language.[1]
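To make this concrete, similarity between embeddings is typically measured with cosine similarity. Below is a minimal sketch; the four-dimensional vectors and the transaction descriptions in the comments are made-up toy values, not real GPT embeddings (which have thousands of dimensions):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Toy 4-dimensional "embeddings"; a real model would produce these.
emb_wire_a = [0.90, 0.10, 0.05, 0.20]   # e.g. "FEDWIRE CREDIT VIA ..."
emb_wire_b = [0.85, 0.15, 0.10, 0.25]   # e.g. "WIRE TYPE: WIRE IN ..."
emb_payroll = [0.05, 0.85, 0.40, 0.10]  # e.g. "GUSTO PAYROLL ..."

# Descriptions with the same meaning should land close together.
same_type = cosine_similarity(emb_wire_a, emb_wire_b)
diff_type = cosine_similarity(emb_wire_a, emb_payroll)
assert same_type > diff_type
```

The point is that closeness is measured in meaning-space rather than character-space, so wildly different descriptors of the same concept can still score as similar.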

Once the semantics of the bank transactions are embedded and captured by the LLM, we can group similar transactions and analyze their behavior to better determine the type of inflow or outflow each group comprises. For example, even though ACH inflows may represent funding sources for one customer and revenue for another, revenue tends to occur consistently with variable amounts, while funding tends to be sporadic in occurrence and somewhat predictable in amount.
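As a sketch of how that grouping might work (the article does not disclose the actual clustering algorithm; this greedy threshold scheme and the toy vectors are assumptions for illustration only):

```python
from math import sqrt

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def cluster_embeddings(embeddings, threshold=0.9):
    """Greedy clustering: each embedding joins the first cluster whose
    seed vector is similar enough, else it starts a new cluster."""
    seeds, clusters = [], []
    for i, emb in enumerate(embeddings):
        for seed, members in zip(seeds, clusters):
            if cosine_similarity(emb, seed) >= threshold:
                members.append(i)
                break
        else:
            seeds.append(emb)
            clusters.append([i])
    return clusters

# Toy embeddings: two wire-like vectors and one payroll-like vector.
embs = [
    [0.90, 0.10, 0.05, 0.20],
    [0.85, 0.15, 0.10, 0.25],
    [0.05, 0.85, 0.40, 0.10],
]
clusters = cluster_embeddings(embs)
assert clusters == [[0, 1], [2]]  # the two wire-like vectors group together
```

In practice a density-based or centroid-based algorithm would likely be used instead of this greedy pass, but the idea is the same: transactions are grouped per customer by embedding proximity, not by string matching.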

The Hypothesis

The clusters generated by the embeddings of an LLM will result in more accurate transaction categories than rule-based methods because 1) the embeddings capture the meaning of inconsistent transaction descriptors, and 2) the clusters will be tailored to each individual business.

Unleashing GPT on our 2.5M transaction dataset

We decided to test that hypothesis on our dataset of 2.5M bank transactions collected over the 18 months we’ve been live. We tested LLMs, including GPT [2] and Google’s BERT [4], as well as traditional similarity metrics such as Levenshtein distance (which focuses on similarity in characters, not in concept the way GPT does). We then benchmarked each method against our current production parser — which is primarily rules-based.
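To see why character-level metrics fall short, here is a stdlib Levenshtein sketch over made-up descriptors: two strings describing the same concept can be far apart in edit distance, while strings with opposite meanings can be nearly identical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete
                           cur[j - 1] + 1,              # insert
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

# Same concept (an incoming wire), very different characters:
far = levenshtein("FEDWIRE CREDIT", "WIRE TYPE: WIRE IN")
# Opposite concepts (money in vs. money out), nearly identical characters:
near = levenshtein("FEDWIRE CREDIT", "FEDWIRE DEBIT")
assert near < far  # edit distance is blind to meaning
```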

Ultimately, we found GPT to be the most performant. For the remainder of this article, we will present findings from GPT.

GPT’s unsupervised clusters give us a higher-resolution picture of a business’s state

The example below demonstrates how clustering GPT embeddings can decompose cash flow into distinct categories, despite varied descriptions. Here, we see how wire transfers are identified in blue.

Example of GPT clusters from transaction data, blue cluster is a cluster of Fedwire credit transfers.

And Amazon revenue inflows are identified in green.

Examples of clusters formed from GPT embeddings on transaction data. The green cluster is a cluster of revenue transactions coming from Amazon.

Next, by analyzing the behavior of each cluster over time, we can better identify the transaction type, even when the semantics are inconsistent across customers. In general, we see that stable and meaningful sources of inflows like revenue will occur regularly.

Single cluster occurrence interval distribution.

Their amounts, however, may vary significantly.

By modeling transaction occurrence and amount, we can estimate the probability of a cluster recurring and its expected amount at any given time.

In this way, we identify which clusters represent sporadic inflows like funding (low probability of recurrence with more predictable amounts) vs. stable inflows that are representative of a business’s health like revenue (high probability of recurrence with variable amounts).
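The occurrence-vs-amount reasoning above can be sketched with a simple heuristic. The coefficient-of-variation thresholds and the sample clusters below are illustrative assumptions, not the actual model:

```python
from statistics import mean, stdev

def cluster_profile(days, amounts):
    """Coefficient of variation (CV) of inter-arrival intervals and of amounts.
    Low interval CV = regular timing; low amount CV = predictable size."""
    intervals = [b - a for a, b in zip(days, days[1:])]
    interval_cv = stdev(intervals) / mean(intervals) if len(intervals) > 1 else float("inf")
    amount_cv = stdev(amounts) / mean(amounts) if len(amounts) > 1 else float("inf")
    return interval_cv, amount_cv

# Hypothetical revenue cluster: roughly biweekly, variable amounts.
rev_icv, rev_acv = cluster_profile(
    [0, 14, 29, 42, 57, 70],
    [9800, 12400, 8100, 15200, 9900, 13700],
)
# Hypothetical funding cluster: two sporadic, identical wires.
fund_icv, fund_acv = cluster_profile([10, 95], [50000, 50000])

assert rev_icv < 0.2    # regular timing, variable amounts: revenue-like
assert fund_acv == 0.0  # sporadic timing, predictable amount: funding-like
```

A single recurrence event gives no usable interval statistics, which is why the sketch falls back to infinity; a production system would model these distributions probabilistically rather than with point CVs.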

Likewise, on the outflow side, we can identify variable costs, which recur and correlate with revenue, vs. fixed costs like payroll and interest expense.

Inflows in each cluster over time.

SlopeGPT: an LLM-powered risk engine

Using the general framework described, we integrated GPT into our transaction processing engine to test if an LLM approach could really outperform today’s state of the art. Below is a visualization of what that looks like.

SlopeGPT architecture

SlopeGPT ingests raw transaction data, feeds it into GPT, which then transforms the transactions into embeddings. The embeddings are clustered at the customer-level and analyzed to determine which cash flow component they comprise (e.g. sales, payroll, etc.). After that, they are used to generate additional features (e.g. seasonality, sales trends, etc.), which are joined with other risk features. These features are then passed into our payments risk model. The final output is a vector of decisions, which can include approve/reject, pricing, and so on.
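The flow above can be sketched end to end. Every component here is a toy stand-in: first-token grouping in place of GPT embeddings and clustering, a net-amount sum in place of real feature generation, and a single-threshold rule in place of the actual risk model.

```python
def embed(description):
    """Stand-in for GPT: key each description by its first token."""
    return description.split()[0]

def cluster(txns):
    """Stand-in for per-customer embedding clustering."""
    groups = {}
    for t in txns:
        groups.setdefault(embed(t["desc"]), []).append(t)
    return groups

def featurize(groups):
    """Stand-in for cash-flow features: net amount per cluster."""
    return {k: sum(t["amount"] for t in v) for k, v in groups.items()}

def decide(features, threshold=0):
    """Stand-in for the risk model: approve if net cash flow is positive."""
    return "approve" if sum(features.values()) > threshold else "reject"

txns = [
    {"desc": "AMAZON MKTPLACE PAYMENT", "amount": 1200},
    {"desc": "AMAZON SETTLEMENT", "amount": 900},
    {"desc": "GUSTO PAYROLL", "amount": -700},
]
decision = decide(featurize(cluster(txns)))
assert decision == "approve"  # net inflow of 1400 in this toy example
```

The structural point is the staging: raw transactions become embeddings, embeddings become per-customer clusters, clusters become features, and features feed a downstream decision.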

SlopeGPT has already been deployed to production alongside our existing risk models. It has already helped us better understand businesses and detect risk signals that were missed by its counterparts. Additionally, we’ve built a real-time alerting system off the back of SlopeGPT that tracks live customer metrics including:

  • The stability of each cashflow source for a business (not all revenues are created equal).
  • The conditional recurring probability of each source, and its expected value.
  • Anomaly detection, e.g. detecting turning points in a business’s operations, such as when a cashflow is expected to happen but doesn’t.
  • And more… stay tuned!
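The anomaly-detection bullet can be illustrated with a simple overdue check; the tolerance factor and the day numbers are assumptions for illustration:

```python
def cashflow_overdue(last_seen_day, today, mean_interval, tolerance=1.5):
    """Alert when a recurring cashflow has not appeared within
    tolerance * its typical interval since it was last seen."""
    return (today - last_seen_day) > tolerance * mean_interval

# A roughly biweekly revenue cluster last seen on day 70:
assert cashflow_overdue(70, 95, 14.0) is True    # 25 days > 21: alert
assert cashflow_overdue(70, 80, 14.0) is False   # 10 days: on schedule
```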

The future of Fintech AI

As a small team, we’re very conscious about investing our time into efforts that have high ROI. That includes knowing when not to use AI for the wrong reasons. The saying “if all you have is a hammer, every problem looks like a nail” applies well to AI, especially given all the attention it has received in recent years.

In our view, B2B payments is one of the domains truly well-suited to be disrupted by the AI models available today. The combination of a rapidly evolving e-commerce landscape, the adversarial and ever-changing nature of fraud, and the vastness of unstructured payments data create opportunities for large unsupervised models like GPT to outperform the status quo.

We also believe our team is uniquely positioned to capture the opportunity. Together, we’ve deployed risk models to solve fraud and credit in fintech, contributed to aeronautics and deep learning research across companies like IBM and Tesla, and built real-time AI systems to solve life-or-death problems in MRI brain imaging and autonomous driving (Lawrence’s first company was in self-driving). And now, we’re operating in a domain dominated by traditional frameworks in an evolving space that will increasingly reward adaptive, real-time systems and punish rigid, asynchronous ones.

Challenges breed innovation: we’re excited for the opportunity to take on and lead the fintech ecosystem with AI — always as a tool, never the end goal.

We are hiring!

We are hiring across risk, engineering, data science, and customer success. If you’re interested in joining Slope, email us at founders@slope.so.

References

[1] OpenAI. GPT-4 Technical Report. 2023.

[2] Neelakantan, Arvind, et al. Text and Code Embeddings by Contrastive Pre-Training. 2022. OpenAI.

[3] Vaswani, Ashish, et al. Attention Is All You Need. 2017.

[4] Devlin, Jacob, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019. Google.

This article was co-authored by Jason Huang, Bryant Chen, Alex Wu, Lawrence Lin Murata (LLM).

Jason Huang, CFA, was previously Staff Data Scientist at SoFi, M.S. at University of Notre Dame, B.S. Aeronautics at Beihang University.

Bryant Chen, Ph.D. UCLA, was Head of Credit Science at Brex and AI researcher at IBM.

Alex Wu was previously Sr. Deep Learning Engineer at Nauto, MLE at DeepScale (acq. Tesla), ML Researcher at UCLA Brain Mapping Center. B.S. UCLA CS.

Lawrence Lin Murata (LLM) previously led AI Platforms and Data Science at Nauto, founded a self-driving car startup Newton (acq. Nauto), and worked on Siri at Apple. B.S. Stanford CS AI (focus on NLP and CV).
