Hunting bugs in Accel-PPP with CodeQL

Published in

CSG @ GovTech

13 min readApr 21, 2022

Introduction

A few months ago, my mentor Eugene Lim challenged my teammate Kar Wei and I to find bugs in Accel-PPP, an open-source VPN server. While conducting a source code review on an unfamiliar software was a painstaking process, we successfully discovered three exploitable memory corruption vulnerabilities: CVE-2022–24704, CVE-2022–24705, and CVE-2022–0982.

This article will discuss how I used CodeQL to find vulnerabilities in Accel-PPP and outline my approach to getting started with CodeQL and source code review. I will also be discussing the limitations I encountered with CodeQL.

Picking a target

We started by targeting memory corruption vulnerabilities in C and C++ code from GitHub repositories. To facilitate this process, we used grep.app, a GitHub repository search engine, to search for code patterns across half a million public repositories on GitHub. Using grep.app, we filtered C and C++ repositories with regular expressions. We searched for memcpy function calls whose third argument was a variable named “len”, based on the assumption that it was more likely to be a user-controllable variable. This returned about 500 results.

After analysing these results, Eugene found a bug in Accel-PPP and challenged us to find the same bug and present our findings to him by the end of the week. This was a daunting prospect due to the tight deadline and my unfamiliarity with vulnerability research. I was tempted to give up but pushed myself to step out of my comfort zone. As manually reviewing thousands of lines of code was tedious and time-consuming, I decided to use CodeQL.

CodeQL

CodeQL is a semantic code analysis engine that uses SQL-like queries to analyse source code. One notable use case was when Microsoft used CodeQL queries to detect the presence of code-level indicators of compromise from the SolarWinds supply chain attack.

CodeQL’s data flow analysis feature proved to be especially useful in finding vulnerabilities in Accel-PPP. It could be used to track the flow of untrusted inputs in the codebase, helping me to understand if data was parsed insecurely or used in dangerous functions. CodeQL’s data flow libraries implement data flow analysis on a program or function by modelling its data flow graph, where nodes represent semantic elements that carry values, and edges represent data flows between elements. CodeQL data flow queries specify sources and sinks that act as the start and end nodes in any path through this graph.

For clarity, in a bug-hunting scenario, a “source” is a user input, while a “sink” is where the user input is used.

Thought process

Before writing the queries, I considered the following:

- What are the source functions that I should focus on?

- What functions does a VPN server use to take in user input?

- Will untrusted input flow to the 3rd argument of the memcpy function?

Some possible sources from which the VPN server could receive client information included IP address, username, password, cmdline commands, and packet information. With that in mind, I crafted a flowchart of the CodeQL queries to facilitate my thought process:

Now that we were roughly familiar with the flow of the CodeQL queries, we could dive into querying Accel-PPP.

First query

This query aimed to find code flows from the four possible user-controllable source functions (recv, recvfrom, recvmsg, and read) to memcpy functions.

Explanation of query:

The data flow graph was computed using classes to model the program elements representing the graph’s nodes.

The data flow between the nodes was modelled using predicates to compute the graph’s edges.

- @kind path-problem property: specified path queries that provided information about the identified paths from sources to sinks. Paths could be examined in the Path Explorer View in Visual Studio Code

- dataflow.TaintTracking library provided a framework for implementing solvers for global taint tracking problems

- Dataflow::PathGraph was the path graph module whereby source and sink are nodes on the path graph

- override predicate isSource to specify the source which is a function call to “recvfrom”

- override predicate isSink to specify the sink which is a function call to “memcpy”, argument 3 (*CodeQL argument count starts at 0)

- config.hasFlowPath(source, sink) to connect the source to the sink and find the code flow

/**
 * @kind path-problem
 */
 
 import cpp
 import semmle.code.cpp.dataflow.TaintTracking
 import DataFlow::PathGraph
 
 class Config extends TaintTracking::Configuration {
 Config() { this = “RecvUserInputToSink” }
 override predicate isSource(DataFlow::Node source) {
 source.asExpr().(FunctionCall).getTarget().hasGlobalName(“recvfrom”)
 }
 
 override predicate isSink(DataFlow::Node sink) {
 exists(FunctionCall fc | fc.getTarget().hasName(“memcpy”) and sink.asExpr() = fc.getArgument(2))
 }
 }
 
 from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink //PathNode shows path of sources and sinks
 where cfg.hasFlowPath(source, sink)
 select sink, source, sink, “Taint from recvfrom to memcpy “ + source

Sadly, I did not receive any results after running this query for each source function… :(

As I was not 100% sure that the untrusted input originated from any of the four source functions mentioned, I changed my approach to a more generalised query.

Second Query

This query was like the first but instead allowed all source functions. In this case, more results were likely to be returned for me to conduct a deeper analysis.

- override predicate isSource to specify the source which are all function calls in the program

- override predicate isSink to specify the sink which is a function call to “memcpy”, argument 3 (*CodeQL argument count starts at 0)

/**
 * @kind path-problem
 */
 import cpp
 import semmle.code.cpp.dataflow.TaintTracking
 import DataFlow::PathGraph
 
 class Config extends TaintTracking::Configuration {
 Config() { this = “RecvUserInputToSink” }
 override predicate isSource(DataFlow::Node source) {
 source.asExpr().(FunctionCall).getParent().fromSource()
 }
 
 override predicate isSink(DataFlow::Node sink) {
 exists(FunctionCall fc | fc.getTarget().hasName(“memcpy”) and sink.asExpr() = fc.getArgument(2))
 }
 }
 
 from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink //PathNode shows path of sources and sinks
 where cfg.hasFlowPath(source, sink)
 select sink, source, sink, “Taint from UDF to memcpy “ + source

Running this query returned 136 results and after geeking out for close to a day, I found one seemingly vulnerable result.

Memory Corruption vulnerability in the packet.c file (CVE-2022–24704)

Diving into the opt/src/accel-pppd/radius/packet.c file, there was a mempool_alloc function that was used to allocate memory and assign the ptr pointer to point to the first element of the allocated memory. Next, it initialised a struct element pack->buf to ptr, and a while-loop with a recvfrom function was used to receive packet information from the client and store it in pack->buf.

Following that, the ptr variable is dereferenced and stored in the len variable.

A switch statement with case ATTR_TYPE_IPV6ADDR was used where the len variable was passed in as the 3rd argument in a memcpy function. Additionally, there were no checks before the memcpy function that ensured the argument size did not overflow the destination variable.

On closer inspection, I found that &attr->val.integer could hold a maximum of 16 bytes whereas len could hold a maximum of 255 bytes. Due to the lack of validation, len could be larger than &attr->val.integer, causing a memcpy of more than 16 bytes from ptr to &attr->val.integer. This resulted in an out of bounds write into the attr struct buffer.

Great! Seems like we have found our first bug :)

Summary of Code flow:

attr->val.integer = max 16 bytes

len = max 255 bytes

ptr = mempool_alloc(buf_pool)
pack->buf = ptr
n = recvfrom (fd,pack->buf…)……
len = *ptr — 2
memcpy(&attr->val.integer, ptr, len)

At this point, you may be wondering why the first query of recvfrom -> memcpy did not return this code flow. This was because the second argument in the recvfrom function did not directly correlate to the third argument in the memcpy function as it had been dereferenced before the memcpy function. Since CodeQL global taint tracking libraries do not look for possible aliasing relations that have been established before the source (i.e., before the call to recvfrom), this code flow was missed.

Exploitation

To reproduce this vulnerability, I executed the following evilRadius script to send a large packet to the server. This resulted in a malloc():corrupted top size segfault, demonstrating a buffer overflow vulnerability.

Woohoo, my first bug called for a celebration! But before that, let’s try to find variants of it through variant analysis.

Memory Corruption vulnerability in the packet.c file (CVE-2022–24705)

Variant analysis

Since I was familiar with the code flow of the first bug, finding variants of it using CodeQL was straightforward. This query was like the first but instead used mempool_alloc as the source function.

/**
 * @kind path-problem
 */
 
 import cpp
 import semmle.code.cpp.dataflow.TaintTracking
 import DataFlow::PathGraph
 
 
 class Config extends TaintTracking::Configuration {
 Config() { this = “RecvUserInputToSink” }
 
 override predicate isSource(DataFlow::Node source) {
 source.asExpr().(FunctionCall).getTarget().hasGlobalName(“mempool_alloc”)
 }
 
 override predicate isSink(DataFlow::Node sink) {
 exists(FunctionCall fc | fc.getTarget().hasName(“memcpy”) and sink.asExpr() = fc.getArgument(2))
 }
 }
 
 from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink //PathNode shows path of sources and sinks
 where cfg.hasFlowPath(source, sink)
 select sink, source, sink, “Taint from mempool_alloc to memcpy “ + source

The query returned me 3 results with one of them proving vulnerable. Like case ATTR_TYPE_IPV6ADDR, case ATTR_TYPE_IPV6PREFIX also had the same vulnerability. attr->val.ipv6prefix.prefix could hold a maximum of 16 bytes whereas len can hold a maximum of 255 bytes. Due to the lack of validation, len could be larger than &attr->val.ipv6prefix.prefix causing a memcpy of more than 16 bytes from ptr to &attr->val.ipv6prefix.prefix. This resulted in an overly large memory write into the attr struct buffer.

And there we go, our second bug!

Summary of code flow:

attr->val.ipv6prefix.prefix = max 16 bytes

len = max 255 bytes

ptr = mempool_alloc(buf_pool)
pack->buf = ptr
……n = recvfrom (fd,pack->buf…)
len = *ptr — 2
memcpy(&attr->val.ipv6prefix.prefix, ptr + 2, len — 2)

Even though I had found the second bug, I was not satisfied with the query. I wanted to use recvfrom as the source instead of mempool_alloc but CodeQL could not find the code flow between them…

After a bit of googling, I came across the isAdditionalTaintStep method. In CodeQL, the isAdditionalTaintStep is the taint analysis version for isAdditionalFlowStep. When performing flow and taint analysis, CodeQL only analyses paths that go through user code. Calls into third-party methods are considered black boxes unless you add some additional modelling steps. That’s where the isAdditional{Flow|Taint}Step comes in. You can model paths through library code by implementing these predicates and describing how data goes in and out.

In short, the isAdditionalTaintstep method is used to force two disconnected nodes to connect.

IsAdditionalTaintStep:

Recap on first bug:

Here was my thought process to find this variant with the isAdditionalTaintStep method:

- Catch the 2nd argument in the recvfrom function and find its assignment

- Use isAdditionalTaintStep method to find instances of the ptr variable that had been dereferenced

- Redirect the flow back to memcpy argument 3

However, I encountered a roadblock. There was a “write” to pack->buf followed by a “read” of *ptr, and because CodeQL does not look for aliasing relations that had been established before the source (i.e., before the call to recvfrom), the code flow will be missed.

As mentioned before, the second argument in the recvfrom function does not flow straight into memcpy as it has been dereferenced before the memcpy function and CodeQL’s global taint tracking library did not support this.

One way to solve this issue was to use the isAdditionalTaintStep method to transfer taint between pointers which we know would have the same value at runtime. This could be done by using the semmle.code.cpp.valuenumbering.GlobalValueNumbering library. The predicate globalValueNumber, which is an abstract type representing the value of the expression, converted an expression into a GVN. If two expressions have the same GVN, they would compute the same value.

Using this query, I should be able to find the code flow from the 2nd argument in the recvfrom function, transfer the taint to “len = * ptr -2” and pass the flow back to memcpy.

/**
 * @kind path-problem
 */
 
 import semmle.code.cpp.dataflow.TaintTracking
 import semmle.code.cpp.valuenumbering.GlobalValueNumbering
 import DataFlow::PathGraph
 
 class Conf extends TaintTracking::Configuration {
 Conf() { this = “Conf” }
 
 override predicate isSource(DataFlow::Node source) {
 // The source is a value that’s coming out of `recvfrom`’s second argument.
 source.asDefiningArgument() =
 any(Call | call.getTarget().hasName(“recvfrom”)).getArgument(1)
 }
 
 override predicate isSink(DataFlow::Node sink) {
 // The sink is a value that goes into `memcpy`’s third argument.
 sink.asExpr() = any(Call call | call.getTarget().hasName(“memcpy”)).getArgument(2)
 }
 
 override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) {
 // Transfer flow from any value that’s coming out of a function call, and into expressions
 // which we know will have the same value at that point in the execution.
 globalValueNumber(node1.asDefiningArgument()).getAnExpr() = node2.asExpr()
 }
 }
 
 from Conf conf, DataFlow::PathNode source, DataFlow::PathNode sink
 where conf.hasFlowPath(source, sink)
 select sink.getNode(), source, sink, “recvfrom to memcpy with an additional taint step”

However, after running this query on the Accel-PPP codebase, no results were returned…

After much head scratching, I realised that the ptr variable in “len = * ptr -2” and the pack->buf variable in the recvfrom function had different values at runtime. For instance, there was a line of “ptr += 16” before “len = *ptr -2”. Hence, GlobalValueNumbering did not work in this case.

Running out of options, I sought help from the CodeQL GitHub community. Unfortunately, one of the contributors could not solve it either. :(

Even though this was an edge case, I managed to learn a lot. Hopefully, someone will be able to provide a solution to this problem soon!

Moving on, I wanted to continue exploring the codebase for more vulnerabilities and proceeded to the last step of the flowchart. This time, I queried all source expressions to memcpy functions.

Memory Corruption vulnerability in the telnet.c file (CVE-2022–0982)

After running the following query, close to 1300 results were returned. Imagine having to analyse every single one of them… S̵PAIN…

/**
 * @kind path-problem
 */
 
 import cpp
 import semmle.code.cpp.dataflow.TaintTracking
 import DataFlow::PathGraph
 
 class Config extends TaintTracking::Configuration {
 Config() { this = “RecvUserInputToSink” }
 override predicate isSource(DataFlow::Node source) {
 exists(Expr E |
 source.asExpr() = E
 )
 }
 
 override predicate isSink(DataFlow::Node sink) {
 exists(FunctionCall fc | fc.getTarget().hasName(“memcpy”) and sink.asExpr() = fc.getArgument(2))
 }
 }
 
 from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink
 where cfg.hasFlowPath(source, sink)
 select sink, source, sink, “Taint from source expressions to memcpy “ + source

Fortunately, I realised that many of them were either duplicates or bounds-checking statements before memcpy function calls to ensure a buffer overflow does not occur. This allowed me to quickly skim through the results and filter out possible bugs for deeper analysis.

For more details on filtering out false positives from CodeQL results, check out my fellow intern Kar Wei’s blogpost on this topic.

And that was how I found the third bug!

The third bug was in the /opt/src/accel-pppd/cli/telnet.c file whereby user input cmdline_len was copied into a buffer b->buf without any bound checks.

In the cln_read function, the telnet_input_char function was called, passing in cln and recv_buf[i] as arguments. In this case, cln was a struct that contained a “cmdline_len” element that was user controllable.

In the telnet_input_char function, a memcpy function was called to copy cmdline_len number of bytes which was user-controllable into a buffer, buf. Due to the lack of validation, cmdline_len could be larger than the size of buf, resulting in an out of bounds write. Perfect!

Exploitation

To reproduce this vulnerability, I ran the following evilTelnet script to send a large packet to the server. This resulted in a double free or corruption (out) error, demonstrating that the program’s memory management data structures had been corrupted. Unfortunately, the corruption occurred during the exit process, and I could not write to any sensitive bytes on the heap.

Conclusion

Even though I encountered many rabbit holes during my bug hunting journey, I found it extremely fulfilling and enriching. It allowed me to hone my source code review skills and discover an additional 3rd bug! I am thankful to have a great mentor, Eugene Lim, for guiding Kar Wei and myself through this process. Without his guidance, we would not have been able to find these bugs.

I believe that CodeQL is a good tool for vulnerability research. Still, it could have been better if there were more documentations on how to use many of the “hidden” (e.g., GlobalValueNumbering) libraries. I benefited greatly from blogs written by other researchers and the active CodeQL GitHub community.

If you are interested in CodeQL, check out our variant analysis and visualisation tool, C0deVari4nt. We will be presenting this tool in the upcoming Black Hat Asia Arsenal 2022 so be sure to say hi if you happen to be there!