Redefining “libwebp” Vulnerability Scoping with LLMs and Knowledge Graphs

Daniel Alfasi
Oct 2, 2023

The discovery and analysis of vulnerabilities often feel like a race against time. Understanding the full scope of a CVE (Common Vulnerabilities and Exposures) is imperative. It equips security and development teams with the knowledge they need to determine if their product is at risk.

The recently disclosed CVE-2023–4863 originates in libwebp, an open-source library that many web browsers and image editors use to display WebP format images.

Initially, this heap buffer overflow vulnerability was attributed exclusively to Google Chrome. However, its scope expanded as other browsers began reporting similar concerns.

CVE-2023–4863 in the National Vulnerability Database (NVD) website

The NVD team reported the following CPEs (Common Platform Enumeration) associated with CVE-2023–4863:

cpe:2.3:a:google:chrome:*:*:*:*:*:*:*:*

cpe:2.3:o:fedoraproject:fedora:37/38/39:*:*:*:*:*:*:*

cpe:2.3:o:debian:debian_linux:10.0/11.0/12.0:*:*:*:*:*:*:*

cpe:2.3:a:mozilla:firefox:*:*:*:*:*:*:*:*

cpe:2.3:a:mozilla:firefox_esr:*:*:*:*:*:*:*:*

cpe:2.3:a:mozilla:thunderbird:*:*:*:*:*:*:*:*

cpe:2.3:a:microsoft:edge:*:*:*:*:*:*:*:*
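
For readers less familiar with the format, a CPE 2.3 formatted string is a colon-separated list of 11 attributes after the `cpe:2.3` prefix, with `*` acting as a wildcard. The snippet below is a minimal sketch of splitting such a string into named fields. The `parse_cpe` helper is our own illustration, not part of any NVD tooling, and it assumes no escaped colons inside attribute values.

```python
# Attribute order of a CPE 2.3 formatted string, after the "cpe:2.3" prefix.
CPE_FIELDS = ("part", "vendor", "product", "version", "update", "edition",
              "language", "sw_edition", "target_sw", "target_hw", "other")


def parse_cpe(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into its 11 named attributes.

    Simplification: assumes attribute values contain no escaped colons.
    """
    prefix, spec_version, *values = cpe.split(":")
    if (prefix, spec_version) != ("cpe", "2.3") or len(values) != len(CPE_FIELDS):
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, values))


chrome = parse_cpe("cpe:2.3:a:google:chrome:*:*:*:*:*:*:*:*")
print(chrome["part"], chrome["vendor"], chrome["product"], chrome["version"])
```

Here `part` is `a` for applications and `o` for operating systems, which is why the browser entries and the Fedora and Debian entries differ in their third field.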

Is There Room for Improvement?

At DEEPNESS Lab, our research focuses on practical analysis of CVE scope using knowledge graph representation learning and large language models (LLMs).

We use a knowledge graph to map the ecosystem of CVEs and related security entities. This approach helps us capture contextual information about vulnerabilities.

Direct relations between CVE-2023–4863 and its CPEs as specified by NVD
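
As a rough illustration of this representation, the direct relations can be written as (head, relation, tail) triples. This is a sketch only: the relation name `affected_by` is a placeholder we chose (the post specifies only the CPE → CVE direction), and the CPE identifiers are shortened to `vendor:product` for readability.

```python
from collections import defaultdict

CVE = "CVE-2023-4863"
REL = "affected_by"  # placeholder relation name, not taken from the actual graph

# Products are abbreviated to vendor:product; a real graph would keep full CPEs.
triples = [
    ("google:chrome", REL, CVE),
    ("fedoraproject:fedora", REL, CVE),
    ("debian:debian_linux", REL, CVE),
    ("mozilla:firefox", REL, CVE),
    ("mozilla:firefox_esr", REL, CVE),
    ("mozilla:thunderbird", REL, CVE),
    ("microsoft:edge", REL, CVE),
]

# Index (relation, tail) -> heads so "which products link to this CVE?"
# becomes a dictionary lookup.
heads_by_tail = defaultdict(set)
for head, rel, tail in triples:
    heads_by_tail[(rel, tail)].add(head)

print(sorted(heads_by_tail[(REL, CVE)]))
```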

Our primary focus is on the inductive link prediction task, aiming to forecast relationships between previously unseen entities, such as specific vulnerabilities and products.

The inference graph contains CVEs unseen during the training process.
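
One way to picture the inductive setting is a time-based split in which every CVE published after a cutoff is held out of training. The sketch below assumes such a split. The exact cutoff day, the placeholder CVE record, and the `affected_by` relation name are all illustrative assumptions rather than details of our pipeline.

```python
from datetime import date

CUTOFF = date(2022, 7, 31)  # assumed day within the cutoff month

# Publication dates per CVE. The first entry is a made-up placeholder record.
published = {
    "CVE-0000-0001": date(2021, 3, 1),   # placeholder, not a real CVE
    "CVE-2023-4863": date(2023, 9, 12),  # disclosed in September 2023
}

triples = [
    ("google:chrome", "affected_by", "CVE-0000-0001"),
    ("google:chrome", "affected_by", "CVE-2023-4863"),
]

# Triples whose CVE was published by the cutoff form the training graph;
# everything later is only ever seen at inference time.
train = [t for t in triples if published[t[2]] <= CUTOFF]
inference = [t for t in triples if published[t[2]] > CUTOFF]

train_cves = {tail for _, _, tail in train}
unseen_cves = {tail for _, _, tail in inference} - train_cves
print(unseen_cves)
```

Note that the product node appears on both sides of the split, while the CVE node exists only in the inference graph. That is what makes the task inductive rather than transductive.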

We employ multi-modal deep learning algorithms, combining NodePiece for knowledge graph representation with OpenAI’s Ada large language model for textual embedding of each CVE’s description field.
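
To make the multi-modal idea concrete, here is a toy sketch that fuses a structural vector with a text vector by normalizing each and concatenating them. Concatenation is an assumption made for illustration, since this post does not describe the actual fusion step, and the short hand-written vectors stand in for real NodePiece and Ada embeddings, which have far more dimensions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fuse(graph_vec, text_vec):
    """Concatenate the normalized graph view and text view of one node."""
    return l2_normalize(graph_vec) + l2_normalize(text_vec)


graph_vec = [3.0, 4.0]      # stand-in for a structural (graph) embedding
text_vec = [1.0, 0.0, 0.0]  # stand-in for an embedding of the CVE description

fused = fuse(graph_vec, text_vec)
print(len(fused), fused)
```

Normalizing before concatenation keeps either modality from dominating purely because of its scale.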

Deep Dive: Analyzing libwebp CVE-2023–4863

Our model has been trained on NVD CVE data up to July 2022.

To identify products (CPEs) associated with CVE-2023–4863, we evaluate all known CPEs and assess their relevance to this particular CVE.

Zooming in, our focus is on predicting the head entity in the relation: CPE (?) → CVE (CVE-2023–4863).
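
Head prediction of this kind amounts to scoring every candidate CPE against the fixed CVE and sorting by score. The sketch below uses a plain dot product over toy vectors as the scoring function, which is an assumption for illustration and not necessarily the scorer used by our model. All names and vector values are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_heads(cve_vec, cpe_vecs, k=20):
    """Return the k candidate CPEs scoring highest for (?, relation, CVE)."""
    ranked = sorted(cpe_vecs, key=lambda cpe: dot(cpe_vecs[cpe], cve_vec),
                    reverse=True)
    return ranked[:k]


# Toy embeddings; real ones would come from the trained model.
cve_vec = [1.0, 0.5]
cpe_vecs = {
    "google:chrome": [0.9, 0.8],
    "mozilla:firefox": [0.8, 0.6],
    "example:unrelated": [-0.5, 0.1],  # hypothetical unrelated product
}

print(rank_heads(cve_vec, cpe_vecs, k=2))
```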

Among our top predictions, we found several that match the ground truth CPEs previously listed. Notably, we also uncovered additional CPEs not specified by the NVD.

Snippet from the top 20 predictions, based on data learned up to July 2021.
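
Separating predictions that NVD already lists from those it does not is a simple set comparison, sketched below. The `predicted` list is illustrative only and is not our model's actual output; product names are abbreviated to `vendor:product`.

```python
# Products listed by NVD for the CVE, abbreviated to vendor:product.
nvd_listed = {
    "google:chrome", "fedoraproject:fedora", "debian:debian_linux",
    "mozilla:firefox", "mozilla:firefox_esr", "mozilla:thunderbird",
    "microsoft:edge",
}

# Illustrative ranked predictions (NOT the model's real output).
predicted = [
    "google:chrome",
    "canonical:ubuntu_linux",
    "mozilla:firefox",
    "redhat:enterprise_linux",
]

# Predictions that match the ground truth, in ranked order.
confirmed = [p for p in predicted if p in nvd_listed]
# Predictions absent from NVD: candidates that need external verification.
candidates = [p for p in predicted if p not in nvd_listed]

print("confirmed by NVD:", confirmed)
print("to verify elsewhere:", candidates)
```

The second list is the interesting one: each entry is either a false positive or a product that NVD has not yet associated with the CVE, so it has to be checked against independent sources.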

Emerging Vulnerabilities: Products Not Listed in CVE-2023–4863

Based on our analysis, the following products were not specified in CVE-2023–4863 but are affected by it, as verified by other sources:

As indicated by our model’s output, which aligns with Rezilion’s analysis, even operating systems such as Ubuntu Linux and Red Hat Enterprise Linux, as well as internal software libraries that rely on libwebp, may be affected.

Summary

Understanding the full scope of a vulnerability is a race against time. In this post, we explored how large language models (LLMs) and knowledge graphs can help redefine vulnerability analysis.

Using the recently discovered libwebp CVE-2023–4863 as a case study, we predicted the associated vulnerable products, including those not initially specified. The findings are corroborated by analyses from Rezilion and Snyk, which underline the widespread implications for operating systems and software libraries that depend on libwebp.

For more information on our research, see DEEPNESS Lab.

Authors
Daniel Alfasi, Reichman University
Tal Shapira, PhD, The Hebrew University of Jerusalem
Anat Bremler-Barr, Prof, Tel-Aviv University

Our research is partially supported by Red Hat and the Google Cloud Research Credits program.

Written by Daniel Alfasi

Data Scientist @ CyberArk. Researcher @ DEEPNESS Lab. Working on applying graph models (GNNs, KG embeddings) and LLMs to cybersecurity problems.