Redefining “libwebp” Vulnerability Scoping with LLMs and Knowledge Graphs
The discovery and analysis of vulnerabilities often feel like a race against time. Understanding the full scope of a CVE (Common Vulnerabilities and Exposures) is imperative. It equips security and development teams with the knowledge they need to determine if their product is at risk.
The recently disclosed CVE-2023–4863 has its origins in the open-source libwebp. This library is used by many web browsers and image editors to display WebP format images.
Initially, this heap buffer overflow vulnerability was attributed exclusively to Google Chrome. However, its scope expanded as other browsers began reporting similar concerns.
The NVD team reported the following CPEs (Common Platform Enumeration) associated with CVE-2023–4863:
Is There Room for Improvement?
At @DEEPNESS lab, our research focuses on practical analysis of CVE scope using knowledge graph representation learning and LLMs (large language models).
We use a knowledge graph to map the ecosystem of CVEs and related security entities. This approach helps us capture contextual information about vulnerabilities.
Our primary focus is on the inductive link prediction task, aiming to forecast relationships between previously unseen entities, such as specific vulnerabilities and products.
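To make the inductive setting concrete, here is a minimal sketch of a knowledge graph stored as (head, relation, tail) triples. The entity names, the "affects" relation, and the helper functions are hypothetical placeholders, not our actual schema or data. The point is only to show what "previously unseen" means: an entity that appears at prediction time but never in the training triples.

```python
from collections import defaultdict


def build_index(triples):
    """Map each (relation, tail) pair to the set of known head entities."""
    index = defaultdict(set)
    for head, relation, tail in triples:
        index[(relation, tail)].add(head)
    return index


def unseen_entities(train, test):
    """Return entities that occur in test triples but never in training triples."""
    seen = {entity for head, _, tail in train for entity in (head, tail)}
    candidates = {entity for head, _, tail in test for entity in (head, tail)}
    return candidates - seen


# Toy triples with placeholder identifiers (assumptions, not real NVD records).
train = [
    ("cpe:vendor_a:product_x", "affects", "CVE-OLD-0001"),
    ("cpe:vendor_b:product_y", "affects", "CVE-OLD-0001"),
    ("cpe:vendor_a:product_x", "affects", "CVE-OLD-0002"),
]
test = [("cpe:vendor_a:product_x", "affects", "CVE-NEW-0001")]

index = build_index(train)
new_entities = unseen_entities(train, test)
print(new_entities)
```

In this toy case the new CVE is the only unseen entity, so a model cannot rely on an embedding learned for it during training and has to build one from other signals.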
We employ multi-modal deep learning algorithms, combining NodePiece for knowledge graph representation with OpenAI’s Ada large language model for textual embedding of the CVE description field.
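For readers unfamiliar with NodePiece, its core idea is to describe a node by a small set of nearby "anchor" nodes rather than by a dedicated embedding per node. The sketch below is a heavily simplified illustration of that anchor-based tokenization using breadth-first search on a toy graph. It is not the NodePiece library's API, it omits the relational context that NodePiece also uses, and the graph, anchor choice, and function names are all assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adjacency, start):
    """Hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def tokenize(adjacency, node, anchors, k=2):
    """Describe a node by its k nearest reachable anchors (ties broken by name)."""
    dist = bfs_distances(adjacency, node)
    reachable = [(dist[anchor], anchor) for anchor in anchors if anchor in dist]
    return [anchor for _, anchor in sorted(reachable)[:k]]


# Toy undirected graph with placeholder node names (hypothetical).
adjacency = {
    "new_cve": ["product_x"],
    "product_x": ["new_cve", "vendor_a", "old_cve"],
    "vendor_a": ["product_x"],
    "old_cve": ["product_x", "product_y"],
    "product_y": ["old_cve"],
}
anchors = ["vendor_a", "product_y", "old_cve"]

tokens = tokenize(adjacency, "new_cve", anchors, k=2)
print(tokens)
```

Because the vocabulary is the fixed anchor set, a node that was absent during training can still be represented, which is what makes this family of methods a natural fit for inductive tasks.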
Deep Dive: Analyzing libwebp CVE-2023–4863
Our model has been trained on NVD CVE data up to July 2022.
To identify products (CPEs) associated with CVE-2023–4863, we evaluate all known CPEs and assess their relevance to this particular CVE.
Zooming in, our focus is on predicting the head entity in the relation: CPE (?) → CVE (CVE-2023–4863).
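Head prediction of this kind boils down to scoring every candidate CPE against the CVE and sorting. The sketch below shows that ranking step with cosine similarity over toy vectors. Cosine similarity is a stand-in for the model's learned scoring function, and the vectors, CPE names, and rank_heads helper are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_heads(cve_vector, cpe_vectors, top_k=3):
    """Score every candidate CPE against the CVE and return the top_k by score."""
    scored = [(cosine(vector, cve_vector), cpe) for cpe, vector in cpe_vectors.items()]
    scored.sort(reverse=True)
    return [(cpe, score) for score, cpe in scored[:top_k]]


# Toy embeddings (assumptions); real ones would come from the trained model.
cve_vector = [0.9, 0.1, 0.3]
cpe_vectors = {
    "cpe_image_library": [0.8, 0.2, 0.3],
    "cpe_web_browser": [0.7, 0.0, 0.5],
    "cpe_database": [0.0, 1.0, 0.1],
}

ranking = rank_heads(cve_vector, cpe_vectors, top_k=2)
for cpe, score in ranking:
    print(f"{cpe}: {score:.3f}")
```

The top of the ranked list is then compared against the ground truth CPEs, and the remaining high-scoring candidates are the ones worth checking against external sources.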
Among our top predictions, we found several that match the ground truth CPEs previously listed. Notably, we also uncovered additional CPEs not specified by the NVD.
Emerging Vulnerabilities: Products Not Listed in CVE-2023–4863
Based on our analysis, the following products are not listed in the NVD entry for CVE-2023–4863 but are affected by it, as verified by other sources:
- Edge Chromium
The Edge Chromium CPE wasn’t specified in the NVD CPEs (the Microsoft Edge CPE was). This information comes from a Microsoft security notice: https://learn.microsoft.com/en-us/deployedge/microsoft-edge-relnotes-security#september-15-2023
Required by Archlinux, “required by” section: https://archlinux.org/packages/extra/x86_64/libwebp/
Two Qt security advisories: GDI Font Engine & WebP image format: https://www.qt.io/blog/two-qt-security-advisorys-gdi-font-engine-webp-image-format
- Python’s Pillow
As indicated by our model’s output, which aligns with Rezilion’s analysis, even operating systems such as Ubuntu Linux and Red Hat Enterprise Linux, as well as internal software libraries reliant on libwebp, may be affected.
Understanding the full scope of vulnerabilities is a race against time. We delve deep into the potential of leveraging Large Language Models (LLMs) and knowledge graphs to redefine vulnerability analysis.
Using the case study of the recently discovered libwebp CVE-2023–4863, we predict associated vulnerable products, including those not initially specified. The findings are further corroborated by Rezilion’s and Snyk’s analyses, emphasizing the widespread implications even for operating systems and software libraries dependent on libwebp.
For more information on our research, see DEEPNESS Lab.
Daniel Alfasi, Reichman University
Tal Shapira, PhD, The Hebrew University of Jerusalem
Anat Bremler-Barr, Prof, Tel-Aviv University
Our research is partially supported by RedHat and Google Cloud Research Credits program.