Extending the Exploration and Analysis of Windows RPC Methods Calling other Functions with Ghidra, Jupyter Notebooks and Graphframes!
A few weeks ago, I was going over some of the research topics on my to-do list, and the one that sounded interesting to work on during the 4th of July weekend was documenting and exploring the relationships between RPC procedures/methods and other external functions, such as Win32 APIs, on a Windows endpoint. Whether you are doing offensive or defensive research, this type of work not only can help you uncover alternative ways to execute code, but also provides additional context to some of the known techniques out there that leverage specific functions to perform an action.
In this post I will show you how I extended previous amazing work by Adam (@_xpn_) and @Sektor7Net (Reenz0h) to enumerate and export relationships between RPC methods and other functions with the help of Ghidra, and how I used Jupyter Notebooks and Graphframes to consume the results in a graph format and search for structural patterns in it.
What is RPC?
RPC is an inter-process communication (IPC) mechanism that enables data exchange and the invocation of functionality that resides in a different process. The different process can be on the same machine, on the local area network (LAN), or across the Internet.
A Client/Server Execution Model
According to OSF's Distributed Computing Environment (DCE) 1.1, the RPC model makes a functional distinction between clients and servers. A client requests a service, and a server provides the service by making resources available to the remote client.
The Microsoft RPC (MRPC) Model
It is an extension of the OSF DCE RPC standard, and this is a basic architecture overview I put together after reading some of the concepts behind it:
Reading about the following concepts helped me get familiar with the architecture shown above and understand the output of tools such as NtObjectManager by James Forshaw:
- Interfaces: An interface is a set of remotely callable operations offered by a server and invoked/requested by clients.
- Interface UUID: An interface universally unique identifier (UUID) that identifies the interface to which the called operation belongs.
- Opnum: An operation number or numeric identifier that is used to identify a specific operation within the interface.
- Procedures: Callable operations also known as methods available via an interface, offered by a server and invoked/requested by clients.
- RPC protocol sequence: A character string that represents a valid combination of an RPC protocol, a network layer protocol, and a transport layer protocol (e.g., ncacn_ip_tcp).
- Endpoints: Depending on the RPC protocol sequence being used, an endpoint could be a port number, a named pipe, or simply a name. When client and server are not on the same machine, the server listens on a port or group of ports. These port numbers are called endpoints in RPC.
- Endpoint Mapper: An RPC service that manages endpoints.
- RPC run-time system: A library of routines and a set of services that handle the network communications that underlie the RPC mechanism.
- Binding: Establishment of a relationship between a client and a server that permits the client to make a remote procedure call to the server.
- Marshal: To encode one or more data structures into an octet stream using a specific RPC transfer syntax for network transmission. The inverse of marshaling is called unmarshaling.
- Stub: Code that converts parameters and results passed between client and server during a remote procedure call.
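To make the marshaling concept above concrete, here is a minimal Python sketch. The byte layout is hypothetical and purely for illustration; real RPC uses an NDR transfer syntax, not this format:

```python
import struct

def marshal_params(job_id: int, command: str) -> bytes:
    """Encode an integer and a string into an octet stream.
    Illustrative layout only: 4-byte little-endian int,
    4-byte length prefix, then UTF-16LE string bytes."""
    data = command.encode("utf-16-le")
    return struct.pack("<II", job_id, len(data)) + data

def unmarshal_params(blob: bytes):
    """The inverse of marshal_params (i.e., unmarshaling)."""
    job_id, length = struct.unpack_from("<II", blob, 0)
    command = blob[8:8 + length].decode("utf-16-le")
    return job_id, command

blob = marshal_params(7, "cmd.exe")
print(unmarshal_params(blob))  # (7, 'cmd.exe')
```

The stub code on each side of a real RPC call performs this kind of encode/decode for every parameter and return value.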
What Did I Want to Learn or Do?
- Document local RPC servers with their respective interfaces and methods for specific Windows versions.
- Document relationships between RPC methods and other functions, automate the process and also use other open source tools to analyze the results in a graph format and identify interesting code execution paths.
1. Documenting Local RPC Servers
Prerequisites:
- A Windows 10 Box (mine was version 1909 from here)
- Install Debugging Tools for Windows 10 from Windows 10 SDK
- In your Windows 10 box (PowerShell console), run the following to cache symbols locally in the c:\symbols directory.
PS> cd 'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64'
PS> .\symchk /s srv*c:\symbols*https://msdl.microsoft.com/download/symbols c:\windows\system32\*.dll
PS> .\symchk /s srv*c:\symbols*https://msdl.microsoft.com/download/symbols c:\windows\system32\*.exe
- Install and import the NtObjectManager Module by James Forshaw (Run this in a PowerShell console in your Windows 10 box)
PS> Install-Module -Name NtObjectManager
PS> Import-Module NtObjectManager
How do I use it?
Every time I read about an RPC interface (e.g. 1ff70682-0a51-30e8-076d-740be8cee98b) in a blog post or MS docs, I get interested in what methods are exposed by the RPC server.
One of my favorite ways to get information about local RPC interfaces is with the NtObjectManager module. First, I collect every local RPC server available in every DLL and EXE in C:\Windows\System32\*:
PS> $lookRPC = Get-ChildItem C:\Windows\System32\* -Include '*.dll','*.exe' | Get-RpcServer -DbgHelpPath 'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\dbghelp.dll'
Next, I filter the results and look for the specific interface (i.e. 1ff70682-0a51-30e8-076d-740be8cee98b) with the following command:
PS> $lookRPC | ? {$_.InterfaceId -eq '1ff70682-0a51-30e8-076d-740be8cee98b'} | fl
InterfaceId : 1ff70682-0a51-30e8-076d-740be8cee98b
InterfaceVersion : 1.0
TransferSyntaxId : 8a885d04-1ceb-11c9-9fe8-08002b104860
TransferSyntaxVersion : 2.0
ProcedureCount : 4
Procedures : {NetrJobAdd, NetrJobDel, NetrJobEnum, NetrJobGetInfo}
Server : UUID: 1ff70682-0a51-30e8-076d-740be8cee98b
ComplexTypes : {Struct_0, Struct_1, Struct_2}
FilePath : C:\Windows\System32\taskcomp.dll
Name : taskcomp.dll
Offset : 293472
ServiceName :
ServiceDisplayName :
IsServiceRunning : False
Endpoints : {[1ff70682-0a51-30e8-076d-740be8cee98b, 1.0] ncalrpc:[LRPC-cec0ea4f85e19dd983]}
EndpointCount : 1
Client : False
Finally, I get to the RPC procedures/methods with the following command:
PS> $lookRPC | ? {$_.InterfaceId -eq '1ff70682-0a51-30e8-076d-740be8cee98b'} | select -ExpandProperty Procedures
Name : NetrJobAdd
Params : {FC_UP - NdrPointerTypeReference - MustSize, MustFree, IsIn, FC_RP - NdrPointerTypeReference - MustSize, MustFree, IsIn, FC_LONG - NdrSimpleTypeReference - IsOut, IsBasetype, IsSimpleRef}
ReturnValue : FC_LONG - NdrSimpleTypeReference - IsOut, IsReturn, IsBasetype
Handle : FC_BIND_GENERIC - NdrSimpleTypeReference - 0
RpcFlags : 0
ProcNum : 0
StackSize : 32
HasAsyncHandle : False
DispatchFunction : 140718573908032
InterpreterFlags : ClientMustSize, HasReturn, HasExtensions
...
I work with a few different versions of Windows in my lab environment. Therefore, I decided to start documenting every single RPC server, interface, and method from different versions in this repository: WinRPCFunctions.
I automated the collection of all that with this PowerShell script to export everything in a specific format (heavy lifting done by NtObjectManager):
Get-RPCMetadata -DbgHelpDllPath 'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\dbghelp.dll' -OutputPath C:\Users\User\Desktop
The script also exports the paths of all the modules where it was able to parse RPC servers to a .txt file (we are going to use this later).
2. Document Relationships between RPC Methods and Other Functions
Now, I was wondering, what other functions can an RPC method call in a Windows endpoint? What about doing it recursively?
RPC Method -> Function -> Function -> Function -> ?
I remembered that Adam (@_xpn_) had already worked on this idea back in August 2019 and shared the amazing post "Analysing RPC With Ghidra and Neo4j" with the community. Thank you, Adam!
Previous Amazing Work
I wanted to understand Adam's approach, so these were my notes:
- Enumerate processes currently running on a Windows endpoint
- Get loaded modules for each process (memory)
- Find RPCRT4.dll and traverse its memory contents (.data section) to find the RPC_SERVER instance
- Retrieve a list of interfaces from the RPC_SERVER instance
- Retrieve a list of RPC methods exposed by each RPC server interface
- Collect information about the module exposing the RPC methods, group all the results by module, and export a JSON file for each one
- Import all identified modules (DLLs and EXEs) to Ghidra
- Iterate over each JSON file (Modules -> RPC Interfaces -> RPC Methods), calculate the offset address for each RPC method in each JSON file, and pass them as arguments to a Python script run in Ghidra
- The Python script finds RPC methods in the imported modules and identifies subsequent calls made by each RPC method (recursively)
- Export the additional relationships among RPC methods and other functions to CSV files and leverage the power of Neo4j to identify, for example, RPC methods that could eventually make a Win32 API call such as LoadLibraryExW (external function)
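The recursive part of the steps above can be sketched in plain Python. Assuming we already have a per-module map of caller to called functions (which is the kind of data the Ghidra scripts extract), a depth-limited walk collects every (caller, callee) relationship reachable from an RPC method. The call map below is made up for illustration:

```python
# Hypothetical call map extracted from a module (caller -> callees).
CALL_MAP = {
    "NetrJobAdd": ["LsapCheckAccess", "memcpy"],
    "LsapCheckAccess": ["LoadLibraryExW"],
}

def walk_calls(method, call_map, max_depth=3, _depth=0, _seen=None):
    """Recursively yield (caller, callee) edges starting from an RPC method."""
    if _seen is None:
        _seen = set()
    if _depth >= max_depth or method in _seen:
        return
    _seen.add(method)
    for callee in call_map.get(method, []):
        yield (method, callee)
        yield from walk_calls(callee, call_map, max_depth, _depth + 1, _seen)

edges = list(walk_calls("NetrJobAdd", CALL_MAP))
```

Each yielded pair becomes one row of the exported relationship data, which is what later turns into graph edges.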
I reached out to Adam with a few questions about it, and he mentioned there was another blog post showing how the identification of RPC_SERVER instances, interfaces, and methods from his research could also be done entirely in Ghidra via a Java script, working on modules on disk.
This was a blog post shared by @Sektor7Net (Reenz0h) titled "RPC_SERVER_INTERFACE parser for Ghidra". Thank you, Reenz0h!
So, How Can I Help?
- Reenz0h's Java script was missing the part that Adam had written in Python to retrieve functions called by all RPC methods recursively and export the results in a graph format. Maybe I could add that?
- Adam: "While exploring calls using Neo4j speeds up the analysis phase (for me at least), there is a tradeoff currently in the amount of time it takes to load our data into the database." Maybe there is an alternative for it?
Extending the Initial Java Code (Just a Little Bit)
Integrating this part of Adam's Python code into the Java script that Reenz0h had written looked pretty straightforward, but in order for me to test it, I needed to install Ghidra and try to replicate the whole process. This was a great opportunity to learn not only about the process and what the code was doing, but also how to use Ghidra. I hope my notes below help you.
Install and Set Up Ghidra
- Download Ghidra (Mine was 9.1.2 PUBLIC).
- Check the minimum requirements and install JDK 11 (Mine was AdoptOpenJDK 11 LTS). Make sure you set JAVA_HOME.
- Go to your Ghidra home directory and double-click on the ghidraRun(.bat) script to launch Ghidra. Next, accept the license.
- Create a new project (File > New Project). I called mine Windows.
Find Modules to Import to Ghidra
I used the .txt file I obtained after running the script I shared earlier when I was exploring local RPC servers: all modules with local RPC functionality.
Import Modules to Ghidra
I used the command-line-based (non-GUI) version of Ghidra known as the Headless Analyzer. We can use it by running the following .bat script available in the support folder inside the Ghidra home directory.
You can import a module to a project in Ghidra with the following command:
PS> $GHIDRA_HOME\support\analyzeHeadless.bat <GHIDRA_PROJECT_PATH> <PROJECT_NAME>.gpr -import module.dll -overwrite
I ended up using this PowerShell script to import every module in my list. (Make sure Ghidra is not running when you use the Headless Analyzer)
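If you prefer Python over PowerShell for the batch import, the same loop can be sketched with the standard library. The Ghidra home and project paths below are assumptions, and the command mirrors the analyzeHeadless invocation shown above:

```python
import subprocess
from pathlib import Path

GHIDRA_HOME = Path(r"C:\ghidra_9.1.2_PUBLIC")        # assumption: your Ghidra home
PROJECT_DIR = Path(r"C:\Users\User\GhidraProjects")  # assumption: your project folder
PROJECT_NAME = "Windows"

def build_import_command(module_path: str) -> list:
    """Build the analyzeHeadless command line for one module."""
    return [
        str(GHIDRA_HOME / "support" / "analyzeHeadless.bat"),
        str(PROJECT_DIR), PROJECT_NAME,
        "-import", module_path,
        "-overwrite",
    ]

def import_modules(list_file: str):
    """Import every module path listed in the .txt file, one at a time."""
    for module in Path(list_file).read_text().splitlines():
        if module.strip():
            subprocess.run(build_import_command(module.strip()), check=True)
```

As with the PowerShell version, make sure Ghidra itself is not running while the Headless Analyzer works.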
After several hours, I launched Ghidra and saw all the modules I had imported under my Windows project.
Ready to Test the Initial Java Script?
- Click on the dragon icon (CodeBrowser)
- On the top options, select the Display Script Manager option
- You will be able to search all scripts that come pre-loaded with Ghidra and also custom scripts that you create. I used the following script to run the initial script on every module: CallAnotherScriptForAllPrograms.java
- Download Reenz0h's initial Java script (RPCparser.java) from here
- Move the script to your $USER_HOME/ghidra_scripts folder. It is created by Ghidra. Mine was C:\users\Roberto\ghidra_scripts
- Open and edit $GHIDRA_HOME\Ghidra\Features\Base\ghidra_scripts\CallAnotherScriptForAllPrograms.java by setting the SUBSCRIPT_NAME variable to RPCparser.java as shown below:
- Save the changes, reload the Display Script Manager in Ghidra, and run CallAnotherScriptForAllPrograms.java (double-click on it)
Everything should work as expected! RPC methods are found…
Extend the Code
Next, we need to extend the initial code to find the functions that each RPC method calls and export the results to a JSON file.
The file that I ended up creating was this one, named FindWinRpcFunctionsMaps.java. I added/modified the following sections:
Ready to Test the Extended Script?
I just followed the steps I took when I tested the initial code:
- Edit CallAnotherScriptForAllPrograms.java to point to my new script
- Reload the Display Script Manager and run the script
- This is the new output for the Java script:
- After processing every module (20-25 mins), it writes everything to a JSON file (AllRPCMaps.json) in your Ghidra home folder
If you want to get an idea of what each record looks like, this is an example:
{
"Module": "c:/Windows/System32/ACPBackgroundManagerPolicy.dll",
"FunctionName": "_guard_dispatch_icall",
"FunctionType": "IntFunction",
"CalledByModule": "c:/Windows/System32/ACPBackgroundManagerPolicy.dll",
"CalledBy": "MVoipSrvNotifyVoipActiveCall",
"Address": "1800092d0"
}
I compressed the whole file, and it is now also available as part of the WinRPCFunctions project. Here is the one for the Windows 10 1909 build.
Analyzing Results (Structured and Graph-Like)
As I mentioned before, Adam used Neo4j to expedite the analysis of the data he obtained with his initial approach. However, loading the data into the Neo4j database was taking him a long time.
Even though there might be a way to improve the data-ingestion aspect of Neo4j, I started to wonder whether there was another practical and flexible way to ingest and analyze the data with similar graphing capabilities.
Enter the Jupyter Notebook Project
It is the evolution of the IPython Notebook library, which was developed primarily to enhance the default Python interactive console by enabling scientific operations and advanced data analytics capabilities.
A Notebook?
Think of a notebook as a document that you can access via a web interface that allows you to save:
- Input (live code)
- Output (evaluated code output)
- Visualizations and narrative text (Tell the story!)
I recently started a project with my brother Jose Luis Rodriguez to build the first Infosec Community Guide to Jupyter Notebooks and share some use cases. I recommend taking a look at it and getting familiar with the use cases and the project overall. For this use case I needed graph-like capabilities.
Enter Graphframes
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. It provides high-level APIs in Scala, Java, and Python. It aims to provide both the functionality of GraphX and extended functionality taking advantage of Spark DataFrames. This extended functionality includes motif finding, DataFrame-based serialization, and highly expressive graph queries.
A DataFrame?
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table.
A Graph Inside of a DataFrame?
The idea is that one could use DataFrames as input to create a graph representation of them. All we need is one DataFrame to define the vertices and another one to describe the edges. Graphframes does the rest for you and provides several features to interact with the data in a graph-like way.
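The vertices/edges split is easy to picture without Spark. Here is a minimal pure-Python sketch of the same idea, using made-up records shaped like the Ghidra output:

```python
# Each record links a caller to a callee, as in the exported call maps.
records = [
    {"CalledBy": "NetrJobAdd", "FunctionName": "memcpy", "FunctionType": "IntFunction"},
    {"CalledBy": "NetrJobAdd", "FunctionName": "LoadLibraryExW", "FunctionType": "ExtFunction"},
]

# Vertices: every unique function name (the "id" column in GraphFrames).
vertices = {r["CalledBy"] for r in records} | {r["FunctionName"] for r in records}

# Edges: deduplicated (src, dst) pairs.
edges = {(r["CalledBy"], r["FunctionName"]) for r in records}
```

GraphFrames does exactly this at DataFrame scale: one table of vertices keyed by `id`, one table of `src`/`dst` edges.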
Ok, but how do I even install all that? That's too much!
Deploying a Jupyter Notebook Server with Apache Spark and Graphframes via Docker
I put together the following Dockerfile to deploy a Jupyter Notebook server with GraphFrames included, and it is now available for you as a Docker image. Simply download it and run it (I used my Mac for everything).
- If you have not installed Docker on your system yet, run the following commands or install Docker Desktop (I use Docker Desktop)
curl -fsSL https://get.docker.com -o get-docker.sh
chmod +x get-docker.sh
./get-docker.sh
- Make sure you allocate at least 4GB of memory to your container. I use Docker Desktop to set that up before running any containers. (Right-click on Docker Desktop > Dashboard > Preferences > Resources)
- Download and Run the Docker image
docker image pull cyb3rward0g/jupyter-graphframes:0.8.0-3.0.0
docker run --rm -it -p 8888:8888 -p 4040:4040 --memory="4g" cyb3rward0g/jupyter-graphframes:0.8.0-3.0.0
- The Jupyter Notebook will start as shown in the image below. Copy the last URL from the output to access the server through your browser:
- Paste it into your favorite browser, and you now have a Jupyter Notebook server with Apache Spark and GraphFrames ready to be used
Ingest JSON File and Create GraphFrame!
We are now ready to ingest and analyze the AllRPCMaps.json file we got after running the FindWinRpcFunctionsMaps.java script.
Create a New Notebook
On the top right side of your screen, click on New > PySpark_Python3
You will get a plain notebook which you can use to run code. You can run every notebook cell with SHIFT+ENTER.
Import Libraries
Import Spark and Graphframes libraries
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from graphframes import *
Create Spark Session
We need to create a Spark session to initialize our analytics engine.
spark = SparkSession \
.builder \
.appName("WinRPC") \
.config("spark.sql.caseSensitive","True") \
.config("spark.driver.memory", "4g") \
.getOrCreate()
Download and Decompress JSON File
! wget https://github.com/Cyb3rWard0g/WinRpcFunctions/raw/master/metadata/win10_10.0.1909/AllRpcFuncMaps.zip
! unzip AllRpcFuncMaps.zip
Read the File as a DataFrame
We are now ready to read our file. I added a %%time command to show you how long it takes to ingest the file into a Spark DataFrame.
%%time
df = spark.read.json('AllRpcFuncMaps.json')
Not bad, right? Works for me.
Expose DataFrame as a SQL Table
To simplify the initial analysis and transformation of the data, I like to query my data with SQL-like queries. They are easy to understand and share with others.
df.createOrReplaceTempView('RPCMaps')
Create GraphFrame: G = (Vertices, Edges)
Next, we need to create the two DataFrames required to build a GraphFrame.
Define Vertices
vertices = spark.sql(
'''
SELECT FunctionName AS id, FunctionType, Module
FROM RPCMaps
GROUP BY FunctionName, FunctionType, Module
'''
)
Define Edges
edges = spark.sql(
'''
SELECT CalledBy AS src, FunctionName AS dst
FROM RPCMaps
'''
).dropDuplicates()
Create GraphFrame!
Finally, we create a GraphFrame by using the two DataFrames we created earlier (vertices and edges) as input.
g = GraphFrame(vertices, edges)
Spark Jobs Dashboard
One tip before you continue running queries! If you browse to localhost:4040, you will get to the Apache Spark UI, where you will be able to monitor the state of your jobs (queries) and executors.
If something is taking longer than you would expect, you can check how many tasks are left to execute and how many resources your query is taking.
Ready to Analyze GraphFrames?
If you want to learn more about all the capabilities and features that GraphFrames provides, I highly recommend reading this document. It goes from basic graph queries to graph algorithms!
One of my favorite features is motif finding.
What is Motif Finding?
- Motif finding refers to searching for structural patterns in a graph.
- GraphFrame motif finding uses a simple Domain-Specific Language (DSL) for expressing structural queries.
- For example, graph.find("(a)-[e]->(b); (b)-[e2]->(a)") will search for pairs of vertices a,b connected by edges in both directions. It will return a DataFrame of all such structures in the graph, with columns for each of the named elements (vertices or edges) in the motif.
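To build intuition for what the DSL is doing under the hood, here is a tiny pure-Python equivalent of the chain motif "(a)-[]->(b); (b)-[]->(c)" over an edge list. The function names in the edges are made up for illustration:

```python
from collections import defaultdict

# Made-up call edges: (src, dst) pairs.
edges = [("RpcOpenKey", "BaseRegOpen"), ("BaseRegOpen", "LoadLibraryExW")]

# Index edges by source so chains can be extended quickly.
out = defaultdict(list)
for src, dst in edges:
    out[src].append(dst)

# Every (a, b, c) where a->b and b->c exist,
# like g.find("(a)-[]->(b); (b)-[]->(c)").
chains = [(a, b, c) for a, b in edges for c in out[b]]
# chains == [("RpcOpenKey", "BaseRegOpen", "LoadLibraryExW")]
```

GraphFrames performs the same join, but as distributed DataFrame operations, and then lets you filter the named columns (a, b, c) with expressions like the ones below.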
Motif: Find a Chain of 3 Vertices
What about a chain of 3 vertices where the first one is an RPC function and the last one is an external function named LoadLibraryExW?
Loads the specified module into the address space of the calling process. The specified module may cause other modules to be loaded.
loadLibrary = g.find("(a)-[]->(b); (b)-[]->(c)")\
.filter("a.FunctionType = 'RPCFunction'")\
.filter("c.FunctionType = 'ExtFunction'")\
.filter("c.id = 'LoadLibraryExW'").dropDuplicates()
Run the graph query and show the first 10 records:
%%time
loadLibrary.select("a.Module","a.id","b.id","c.id").show(10,truncate=False)
You can check the stages being executed, remember?
After a few seconds or minutes, you will get a similar output in a table format.
What if we also filter our graph query by a specific module? What about Lsasrv.dll, an authentication component for all systems?
The LSA Server service, which both enforces security policies and acts as the security package manager for the LSA
loadLibrary = g.find("(a)-[]->(b); (b)-[]->(c)")\
.filter("a.FunctionType = 'RPCFunction'")\
.filter("lower(a.Module) LIKE '%lsasrv.dll'")\
.filter("c.FunctionType = 'ExtFunction'")\
.filter("c.id = 'LoadLibraryExW'").dropDuplicates()
Run the query and show the first 10 records:
%%time
loadLibrary.select("a.Module","a.id","b.id","c.id").show(10,truncate=False)
Interesting results that make you think about how one could potentially use RPC mechanisms to connect to the RPC server in lsasrv.dll and run methods that could eventually call LoadLibraryExW (Matt Graeber's idea).
You can validate the results by going to Ghidra and looking at the lsasrv.dll "function call trees" as shown below. I looked for a non-obvious path:
LsarCreateSecret -> LsapDbDereferenceObject -> LoadLibraryExW
You also might be asking yourself:
"What about looking for the shortest path from one vertex to another instead of specifying the number of hops in the chain?"
Breadth-first search (BFS)
GraphFrames provide several graph algorithms that you could use to perform additional analysis. You can learn more about them here.
Breadth-first search (BFS) finds the shortest path(s) from one vertex (or a set of vertices) to another vertex (or a set of vertices). The beginning and end vertices are specified as Spark DataFrame expressions.
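Conceptually, BFS over our call graph looks like the following pure-Python sketch, which finds a shortest path from any vertex matching a start predicate to any vertex matching an end predicate (the adjacency map is made up for illustration):

```python
from collections import deque

# Made-up adjacency map: caller -> callees.
adjacency = {
    "NetrJobAdd": ["LsapCheckAccess"],
    "LsapCheckAccess": ["LoadLibraryExW"],
}

def bfs(starts, is_target, adjacency, max_path_length=3):
    """Return the shortest path from any start vertex to a target vertex."""
    queue = deque([start] for start in starts)  # each queue item is a path
    while queue:
        path = queue.popleft()
        if is_target(path[-1]):
            return path
        if len(path) > max_path_length:
            continue
        for nxt in adjacency.get(path[-1], []):
            if nxt not in path:  # avoid revisiting vertices on this path
                queue.append(path + [nxt])
    return None

path = bfs(["NetrJobAdd"], lambda v: v == "LoadLibraryExW", adjacency)
# path == ["NetrJobAdd", "LsapCheckAccess", "LoadLibraryExW"]
```

g.bfs does the same thing, except the start and target predicates are Spark DataFrame expressions evaluated against the vertex columns.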
Let's use our initial example:
loadLibraryBFS = g.bfs(
fromExpr = "FunctionType = 'RPCFunction'",
toExpr = "id = 'LoadLibraryExW' and FunctionType = 'ExtFunction'",
maxPathLength = 3).dropDuplicates()
Run the BFS graph algorithm and show the first 10 records:
%%time
loadLibraryBFS.select("from.Module", "e0").show(10,truncate=False)
As you can see, the GraphFrames library is pretty powerful, leveraging Apache Spark DataFrames inside a lightweight Docker container running a Jupyter Notebook server. It has several features that might help you while performing research and querying data in a graph format.
I hope you enjoyed this post! I still have a lot to learn about RPC and Ghidra, but this helped me a lot to get motivated and try it. This was my first time playing with Ghidra and NtObjectManager, and I am glad I was able to learn a little bit about the installation and automation aspects of them.
Finally, I wanted to thank Adam Chester, Ruben, Reenz0h and Matt Graeber for their patience and help answering questions regarding some of the concepts presented in this post. Thank you all so much for your time and contributions to the community!
WinRPCFunctions GitHub Repo:
https://github.com/Cyb3rWard0g/WinRpcFunctions
The notebook was already uploaded to the InfoSec Jupyter Book project.
References
https://blog.xpnsec.com/analysing-rpc-with-ghidra-neo4j/
http://www.powerofcommunity.net/poc2019/James.pdf
https://www.youtube.com/watch?v=2GJf8Hrxm4k
https://winprotocoldoc.blob.core.windows.net/productionwindowsarchives/MS-RPCE/%5BMS-RPCE%5D.pdf
https://docs.microsoft.com/en-us/windows/win32/rpc/the-programming-model
http://slideplayer.com/slide/6444771/
https://www.cs.rutgers.edu/~pxk/417/notes/03-rpc.html
https://en.wikipedia.org/wiki/Marshalling_(computer_science)
https://www.blackhat.com/presentations/win-usa-04/bh-win-04-seki-up2.pdf
https://googleprojectzero.blogspot.com/2019/12/calling-local-windows-rpc-servers-from.html
https://graphframes.github.io/graphframes/docs/_site/user-guide.html#motif-finding