Extending the Exploration and Analysis of Windows RPC Methods Calling other Functions with Ghidra, Jupyter Notebooks and Graphframes!
A few weeks ago, I was going over some of the research topics on my to-do list, and the one that sounded interesting to work on during the 4th of July weekend was documenting and exploring the relationships between RPC procedures/methods and other external functions, such as Win32 APIs, on a Windows endpoint. Whether you are doing offensive or defensive research, this type of work not only can help you uncover alternative ways to execute code, but also provides additional context to some of the known techniques out there that leverage specific functions to perform an action.
In this post I will show you how I extended previous amazing work by Adam (@_xpn_) and @Sektor7Net (Reenz0h) to enumerate and export relationships between RPC methods and other functions with the help of Ghidra, and how I used Jupyter Notebooks and Graphframes to consume the results in a graph format and search for structural patterns in it.
What is RPC?
RPC is an inter-process communication (IPC) mechanism that enables data exchange and the invocation of functionality that resides in a different process. The different process can be on the same machine, on the local area network (LAN), or across the Internet.
A Client/Server Execution Model
According to OSF's Distributed Computing Environment (DCE) 1.1, the RPC model makes a functional distinction between clients and servers. A client requests a service, and a server provides the service by making resources available to the remote client.
The Microsoft RPC (MRPC) Model
It is an extension of the OSF DCE RPC standard, and this is a basic architecture overview I put together after reading some of the concepts behind it:
Reading about the following concepts helped me get familiar with the architecture shown above and understand the output of tools such as NtObjectManager by James Forshaw:
- Interfaces: An interface is a set of remotely callable operations offered by a server and invoked/requested by clients.
- Interface UUID: An interface universally unique identifier (UUID) that identifies the interface to which the called operation belongs.
- Opnum: An operation number or numeric identifier that is used to identify a specific operation within the interface.
- Procedures: Callable operations also known as methods available via an interface, offered by a server and invoked/requested by clients.
- RPC protocol sequence: A character string that represents a valid combination of an RPC protocol, a network layer protocol, and a transport layer protocol (e.g., ncacn_ip_tcp).
- Endpoints: Depending on the RPC protocol sequence being used, an endpoint could be a port number, a named pipe, or simply a name. When client and server are not on the same machine, the server listens on a port or group of ports. These port numbers are called endpoints in RPC.
- Endpoint Mapper: An RPC service that manages endpoints.
- RPC run-time system: A library of routines and a set of services that handle the network communications that underlie the RPC mechanism.
- Binding: Establishment of a relationship between a client and a server that permits the client to make a remote procedure call to the server.
- Marshal: To encode one or more data structures into an octet stream using a specific RPC transfer syntax for network transmission. The inverse of marshaling is called unmarshaling.
- Stub: Code that converts parameters and results passed between client and server during a remote procedure call.
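To make the marshaling concept above concrete, here is a minimal Python sketch. The byte layout is hypothetical and purely for illustration; real RPC uses an NDR transfer syntax, not this format:

```python
import struct

def marshal_params(job_id: int, command: str) -> bytes:
    """Encode an integer and a string into an octet stream.
    Illustrative layout only: 4-byte little-endian int,
    4-byte length prefix, then UTF-16LE string bytes."""
    data = command.encode("utf-16-le")
    return struct.pack("<II", job_id, len(data)) + data

def unmarshal_params(blob: bytes):
    """The inverse of marshal_params (i.e., unmarshaling)."""
    job_id, length = struct.unpack_from("<II", blob, 0)
    command = blob[8:8 + length].decode("utf-16-le")
    return job_id, command

blob = marshal_params(7, "cmd.exe")
print(unmarshal_params(blob))  # (7, 'cmd.exe')
```

The stub code on each side of a real RPC call performs this kind of encode/decode for every parameter and return value.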
What Did I Want to Learn or Do?
- Document local RPC servers with their respective interfaces and methods for specific Windows versions.
- Document relationships between RPC methods and other functions, automate the process and also use other open source tools to analyze the results in a graph format and identify interesting code execution paths.
1. Documenting Local RPC Servers
Prerequisites:
- A Windows 10 Box (mine was version 1909 from here)
- Install Debugging Tools for Windows 10 from Windows 10 SDK
- In your Windows 10 box (PowerShell console), run the following to cache symbols locally in the c:\symbols directory.
PS> cd 'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64'
PS> .\symchk /s srv*c:\symbols*https://msdl.microsoft.com/download/symbols c:\windows\system32\*.dll
PS> .\symchk /s srv*c:\symbols*https://msdl.microsoft.com/download/symbols c:\windows\system32\*.exe
- Install and import the NtObjectManager Module by James Forshaw (Run this in a PowerShell console in your Windows 10 box)
PS> Install-Module -Name NtObjectManager
PS> Import-Module NtObjectManager
How do I use it?
Every time I read about an RPC interface (e.g. 1ff70682-0a51-30e8-076d-740be8cee98b) in a blog post or MS docs, I get interested in what methods are exposed by the RPC server.
One of my favorite ways to get information about local RPC interfaces is with the NtObjectManager module. First, I collect every local RPC server available in every DLL and EXE in C:\Windows\System32\*:
PS> $lookRPC = Get-ChildItem C:\Windows\System32\* -Include '*.dll','*.exe' | Get-RpcServer -DbgHelpPath 'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\dbghelp.dll'
Next, I filter the results and look for the specific interface (i.e. 1ff70682-0a51-30e8-076d-740be8cee98b) with the following command:
PS> $lookRPC | ? {$_.InterfaceId -eq '1ff70682-0a51-30e8-076d-740be8cee98b'} | fl
InterfaceId : 1ff70682-0a51-30e8-076d-740be8cee98b
InterfaceVersion : 1.0
TransferSyntaxId : 8a885d04-1ceb-11c9-9fe8-08002b104860
TransferSyntaxVersion : 2.0
ProcedureCount : 4
Procedures : {NetrJobAdd, NetrJobDel, NetrJobEnum, NetrJobGetInfo}
Server : UUID: 1ff70682-0a51-30e8-076d-740be8cee98b
ComplexTypes : {Struct_0, Struct_1, Struct_2}
FilePath : C:\Windows\System32\taskcomp.dll
Name : taskcomp.dll
Offset : 293472
ServiceName :
ServiceDisplayName :
IsServiceRunning : False
Endpoints : {[1ff70682-0a51-30e8-076d-740be8cee98b, 1.0] ncalrpc:[LRPC-cec0ea4f85e19dd983]}
EndpointCount : 1
Client : False
Finally, I get to the RPC procedures/methods with the following command:
PS> $lookRPC | ? {$_.InterfaceId -eq '1ff70682-0a51-30e8-076d-740be8cee98b'} | select -ExpandProperty Procedures
Name : NetrJobAdd
Params : {FC_UP - NdrPointerTypeReference - MustSize, MustFree, IsIn, FC_RP - NdrPointerTypeReference - MustSize, MustFree, IsIn, FC_LONG - NdrSimpleTypeReference - IsOut, IsBasetype, IsSimpleRef}
ReturnValue : FC_LONG - NdrSimpleTypeReference - IsOut, IsReturn, IsBasetype
Handle : FC_BIND_GENERIC - NdrSimpleTypeReference - 0
RpcFlags : 0
ProcNum : 0
StackSize : 32
HasAsyncHandle : False
DispatchFunction : 140718573908032
InterpreterFlags : ClientMustSize, HasReturn, HasExtensions
...
I work with a few different versions of Windows in my lab environment. Therefore, I decided to start documenting every single RPC server, interface, and method from different versions in this repository: WinRPCFunctions.
I automated the collection of all that with this PowerShell script to export everything in a specific format (heavy lifting done by NtObjectManager):
Get-RPCMetadata -DbgHelpDllPath 'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\dbghelp.dll' -OutputPath C:\Users\User\Desktop
The script also exports the paths of all the modules where it was able to parse RPC servers to a .txt file (we are going to use this later).
2. Document Relationships between RPC Methods and Other Functions
Now, I was wondering, what other functions can an RPC method call in a Windows endpoint? What about doing it recursively?
RPC Method -> Function -> Function -> Function -> ?
I remembered that Adam (@_xpn_) had already worked on this idea back in August 2019 and shared the amazing post "Analysing RPC With Ghidra and Neo4j" with the community. Thank you, Adam!
Previous Amazing Work
I wanted to understand Adam's approach, so these were my notes:
- Enumerate processes currently running on a Windows endpoint
- Get loaded modules for each process (memory)
- Find RPCRT4.dll and traverse its memory contents (.data section) to find the RPC_SERVER instance
- Retrieve a list of interfaces from the RPC_SERVER instance
- Retrieve a list of RPC methods exposed by each RPC server interface
- Collect information about the module exposing the RPC methods, group all the results by module, and export a JSON file for each one
- Import all identified modules (DLLs and EXEs) to Ghidra
- Iterate over each JSON file (Modules -> RPC Interfaces -> RPC Methods), calculate the offset address for each RPC method in each JSON file, and pass them as arguments to a Python script run in Ghidra
- The Python script finds RPC methods in the imported modules and identifies subsequent calls made by each RPC method (recursively)
- Export the additional relationships among RPC methods and other functions to CSV files and leverage the power of Neo4j to identify, for example, RPC methods that could eventually make a Win32 API call such as LoadLibraryExW (external function)
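The recursive part of the steps above can be sketched in plain Python. Assuming we already have a per-module map of caller to called functions (which is the kind of data the Ghidra scripts extract), a depth-limited walk collects every (caller, callee) relationship reachable from an RPC method. The call map below is made up for illustration:

```python
# Hypothetical call map extracted from a module (caller -> callees).
CALL_MAP = {
    "NetrJobAdd": ["LsapCheckAccess", "memcpy"],
    "LsapCheckAccess": ["LoadLibraryExW"],
}

def walk_calls(method, call_map, max_depth=3, _depth=0, _seen=None):
    """Recursively yield (caller, callee) edges starting from an RPC method."""
    if _seen is None:
        _seen = set()
    if _depth >= max_depth or method in _seen:
        return
    _seen.add(method)
    for callee in call_map.get(method, []):
        yield (method, callee)
        yield from walk_calls(callee, call_map, max_depth, _depth + 1, _seen)

edges = list(walk_calls("NetrJobAdd", CALL_MAP))
```

Each yielded pair becomes one row of the exported relationship data, which is what later turns into graph edges.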
I reached out to Adam with a few questions about it, and he mentioned there was another blog post showing how the identification of RPC_SERVER instances, interfaces, and methods from his research could also be done entirely in Ghidra via a Java script, working on modules on disk.
This was a blog post shared by @Sektor7Net (Reenz0h) titled "RPC_SERVER_INTERFACE parser for Ghidra". Thank you, Reenz0h!
So, How Can I Help?
- Reenz0h's Java script was missing the part that Adam had written in Python to retrieve functions called by all RPC methods recursively and export the results in a graph format. Maybe I could add that?
- Adam: "While exploring calls using Neo4j speeds up the analysis phase (for me at least), there is a tradeoff currently in the amount of time it takes to load our data into the database." Maybe there is an alternative for it?
Extending the Initial Java Code (Just a Little Bit)
Integrating this part of Adam's Python code into the Java script that Reenz0h had written looked pretty straightforward, but in order for me to test it, I needed to install Ghidra and try to replicate the whole process. This was a great opportunity to learn not only about the process and what the code was doing, but also how to use Ghidra. I hope my notes below help you.
Install and Set Up Ghidra
- Download Ghidra (Mine was 9.1.2 PUBLIC).
- Check the minimum requirements and install JDK 11 (Mine was AdoptOpenJDK 11 LTS). Make sure you set JAVA_HOME.
- Go to your Ghidra home directory and double-click on the ghidraRun(.bat) script to launch Ghidra. Next, accept the license.
- Create a new project (File > New Project). I called mine Windows.
Find Modules to Import to Ghidra
I used the .txt file I obtained after running the script I shared earlier when I was exploring local RPC servers: all modules with local RPC functionality.
Import Modules to Ghidra
I used the command-line-based (non-GUI) version of Ghidra known as the Headless Analyzer. We can use it by running the following .bat script available in the support folder inside the Ghidra home directory.
You can import a module to a project in Ghidra with the following command:
PS> $GHIDRA_HOME\support\analyzeHeadless.bat <GHIDRA_PROJECT_PATH> <PROJECT_NAME>.gpr -import module.dll -overwrite
I ended up using this PowerShell script to import every module in my list. (Make sure Ghidra is not running when you use the Headless Analyzer)
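If you prefer Python over PowerShell for the batch import, the same loop can be sketched with the standard library. The Ghidra home and project paths below are assumptions, and the command mirrors the analyzeHeadless invocation shown above:

```python
import subprocess
from pathlib import Path

GHIDRA_HOME = Path(r"C:\ghidra_9.1.2_PUBLIC")        # assumption: your Ghidra home
PROJECT_DIR = Path(r"C:\Users\User\GhidraProjects")  # assumption: your project folder
PROJECT_NAME = "Windows"

def build_import_command(module_path: str) -> list:
    """Build the analyzeHeadless command line for one module."""
    return [
        str(GHIDRA_HOME / "support" / "analyzeHeadless.bat"),
        str(PROJECT_DIR), PROJECT_NAME,
        "-import", module_path,
        "-overwrite",
    ]

def import_modules(list_file: str):
    """Import every module path listed in the .txt file, one at a time."""
    for module in Path(list_file).read_text().splitlines():
        if module.strip():
            subprocess.run(build_import_command(module.strip()), check=True)
```

As with the PowerShell version, make sure Ghidra itself is not running while the Headless Analyzer works.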
After several hours, I launched Ghidra and saw all the modules I had imported under my Windows project.
Ready to Test the Initial Java Script?
- Click on the dragon icon (CodeBrowser)
- On the top options, select the Display Script Manager option
- You will be able to search all scripts that come pre-loaded with Ghidra and also custom scripts that you create. I used the following script to run the initial script on every module: CallAnotherScriptForAllPrograms.java
- Download Reenz0h's initial Java script (RPCparser.java) from here
- Move the script to your $USER_HOME/ghidra_scripts folder. It is created by Ghidra. Mine was C:\users\Roberto\ghidra_scripts
- Open and edit $GHIDRA_HOME\Ghidra\Features\Base\ghidra_scripts\CallAnotherScriptForAllPrograms.java by setting the SUBSCRIPT_NAME variable to RPCparser.java as shown below:
- Save the changes, reload the Display Script Manager in Ghidra, and run CallAnotherScriptForAllPrograms.java (double-click on it)
Everything should work as expected! RPC methods are found…
Extend the Code
Next, we need to extend the initial code to find the functions that each RPC method calls and export the results to a JSON file.
The file that I ended up creating was this one, named FindWinRpcFunctionsMaps.java. I added/modified the following sections:
Ready to Test the Extended Script?
I just followed the steps I took when I tested the initial code:
- Edit CallAnotherScriptForAllPrograms.java to point to my new script
- Reload the Display Script Manager and run the script
- This is the new output for the Java script:
- After processing every module (20-25 mins), it writes everything to a JSON file (AllRPCMaps.json) in your Ghidra home folder
If you want to get an idea of what each record looks like, this is an example:
{
"Module": "c:/Windows/System32/ACPBackgroundManagerPolicy.dll",
"FunctionName": "_guard_dispatch_icall",
"FunctionType": "IntFunction",
"CalledByModule": "c:/Windows/System32/ACPBackgroundManagerPolicy.dll",
"CalledBy": "MVoipSrvNotifyVoipActiveCall",
"Address": "1800092d0"
}
I compressed the whole file, and it is now also available as part of the WinRPCFunctions project. Here is the one for the Windows 10 1909 build.
Analyzing Results (Structured and Graph-Like)
As I mentioned before, Adam used Neo4j to expedite the analysis of the data he obtained with his initial approach. However, loading the data into the Neo4j database was taking him a long time.
Even though there might be a way to improve the data-ingestion aspect of Neo4j, I started to wonder whether there was another practical and flexible way to ingest and analyze the data with similar graphing capabilities.
Enter the Jupyter Notebook Project
It is the evolution of the IPython Notebook library, which was developed primarily to enhance the default Python interactive console by enabling scientific operations and advanced data analytics capabilities.
A Notebook?
Think of a notebook as a document that you can access via a web interface that allows you to save:
- Input (live code)
- Output (evaluated code output)
- Visualizations and narrative text (Tell the story!)
I recently started a project with my brother Jose Luis Rodriguez to build the first Infosec Community Guide to Jupyter Notebooks and share some use cases. I recommend taking a look at it and getting familiar with the use cases and the project overall. For this use case I needed graph-like capabilities.
Enter Graphframes
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. It provides high-level APIs in Scala, Java, and Python. It aims to provide both the functionality of GraphX and extended functionality taking advantage of Spark DataFrames. This extended functionality includes motif finding, DataFrame-based serialization, and highly expressive graph queries.
A DataFrame?
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table.
A Graph Inside of a DataFrame?
The idea is that one could use DataFrames as input to create a graph representation of them. All we need is one DataFrame to define the vertices and another one to describe the edges. Graphframes does the rest for you and provides several features to interact with the data in a graph-like way.
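The vertices/edges split is easy to picture without Spark. Here is a minimal pure-Python sketch of the same idea, using made-up records shaped like the Ghidra output:

```python
# Each record links a caller to a callee, as in the exported call maps.
records = [
    {"CalledBy": "NetrJobAdd", "FunctionName": "memcpy", "FunctionType": "IntFunction"},
    {"CalledBy": "NetrJobAdd", "FunctionName": "LoadLibraryExW", "FunctionType": "ExtFunction"},
]

# Vertices: every unique function name (the "id" column in GraphFrames).
vertices = {r["CalledBy"] for r in records} | {r["FunctionName"] for r in records}

# Edges: deduplicated (src, dst) pairs.
edges = {(r["CalledBy"], r["FunctionName"]) for r in records}
```

GraphFrames does exactly this at DataFrame scale: one table of vertices keyed by `id`, one table of `src`/`dst` edges.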
Ok, but how do I even install all that? That's too much!
Deploying a Jupyter Notebook Server with Apache Spark and Graphframes via Docker
I put together the following Dockerfile to deploy a Jupyter Notebook server with GraphFrames included, and it is now available for you as a Docker image. Simply download it and run it (I used my Mac for everything).
- If you have not installed Docker on your system yet, run the following commands or install Docker Desktop (I use Docker Desktop)
curl -fsSL https://get.docker.com -o get-docker.sh
chmod +x get-docker.sh
./get-docker.sh
- Make sure you allocate at least 4GB of memory to your container. I use Docker Desktop to set that up before running any containers. (Right-click on Docker Desktop > Dashboard > Preferences > Resources)
- Download and Run the Docker image
docker image pull cyb3rward0g/jupyter-graphframes:0.8.0-3.0.0
docker run --rm -it -p 8888:8888 -p 4040:4040 --memory="4g" cyb3rward0g/jupyter-graphframes:0.8.0-3.0.0
- The Jupyter Notebook will start as shown in the image below. Copy the last URL from the output to access the server through your browser:
- Paste it into your favorite browser, and you now have a Jupyter Notebook server with Apache Spark and GraphFrames ready to be used
Ingest JSON File and Create GraphFrame!
We are now ready to ingest and analyze the AllRPCMaps.json file we got after running the FindWinRpcFunctionsMaps.java script.
Create a New Notebook
On the top right side of your screen, click on New > PySpark_Python3
You will get a plain notebook which you can use to run code. You can run every notebook cell with SHIFT+ENTER.
Import Libraries
Import Spark and Graphframes libraries
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from graphframes import *
Create Spark Session
We need to create a Spark session to initialize our analytics engine.
spark = SparkSession \
.builder \
.appName("WinRPC") \
.config("spark.sql.caseSensitive","True") \
.config("spark.driver.memory", "4g") \
.getOrCreate()
Download and Decompress JSON File
! wget https://github.com/Cyb3rWard0g/WinRpcFunctions/raw/master/metadata/win10_10.0.1909/AllRpcFuncMaps.zip
! unzip AllRpcFuncMaps.zip
Read the File as a DataFrame
We are now ready to read our file. I added a %%time command to show you how long it takes to ingest the file into a Spark DataFrame.
%%time
df = spark.read.json('AllRpcFuncMaps.json')
Not bad, right? Works for me.
Expose DataFrame as a SQL Table
To simplify the initial analysis and transformation of the data, I like to query my data with SQL-like queries. They are easy to understand and share with others.
df.createOrReplaceTempView('RPCMaps')
Create GraphFrame: G = (Vertices, Edges)
Next, we need to create the two DataFrames required to build a GraphFrame.
Define Vertices
vertices = spark.sql(
'''
SELECT FunctionName AS id, FunctionType, Module
FROM RPCMaps
GROUP BY FunctionName, FunctionType, Module
'''
)
Define Edges
edges = spark.sql(
'''
SELECT CalledBy AS src, FunctionName AS dst
FROM RPCMaps
'''
).dropDuplicates()
Create GraphFrame!
Finally, we create a GraphFrame by using the two DataFrames we created earlier (vertices and edges) as input.
g = GraphFrame(vertices, edges)
Spark Jobs Dashboard
One tip before you continue running queries! If you browse to localhost:4040, you will get to the Apache Spark UI, where you will be able to monitor the state of your jobs (queries) and executors.
If something is taking longer than you would expect, you can check how many tasks are left to execute and how many resources your query is taking.
Ready to Analyze GraphFrames?
If you want to learn more about all the capabilities and features that GraphFrames provides, I highly recommend reading this document. It goes from basic graph queries to graph algorithms!
One of my favorite features is motif finding.
What is Motif Finding?
- Motif finding refers to searching for structural patterns in a graph.
- GraphFrame motif finding uses a simple Domain-Specific Language (DSL) for expressing structural queries.
- For example, graph.find("(a)-[e]->(b); (b)-[e2]->(a)") will search for pairs of vertices a,b connected by edges in both directions. It will return a DataFrame of all such structures in the graph, with columns for each of the named elements (vertices or edges) in the motif.
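To build intuition for what the DSL is doing under the hood, here is a tiny pure-Python equivalent of the chain motif "(a)-[]->(b); (b)-[]->(c)" over an edge list. The function names in the edges are made up for illustration:

```python
from collections import defaultdict

# Made-up call edges: (src, dst) pairs.
edges = [("RpcOpenKey", "BaseRegOpen"), ("BaseRegOpen", "LoadLibraryExW")]

# Index edges by source so chains can be extended quickly.
out = defaultdict(list)
for src, dst in edges:
    out[src].append(dst)

# Every (a, b, c) where a->b and b->c exist,
# like g.find("(a)-[]->(b); (b)-[]->(c)").
chains = [(a, b, c) for a, b in edges for c in out[b]]
# chains == [("RpcOpenKey", "BaseRegOpen", "LoadLibraryExW")]
```

GraphFrames performs the same join, but as distributed DataFrame operations, and then lets you filter the named columns (a, b, c) with expressions like the ones below.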
Motif: Find a Chain of 3 Vertices
What about a chain of 3 vertices where the first one is an RPC function and the last one is an external function named LoadLibraryExW?
Loads the specified module into the address space of the calling process. The specified module may cause other modules to be loaded.
loadLibrary = g.find("(a)-[]->(b); (b)-[]->(c)")\
.filter("a.FunctionType = 'RPCFunction'")\
.filter("c.FunctionType = 'ExtFunction'")\
.filter("c.id = 'LoadLibraryExW'").dropDuplicates()
Run the graph query and show the first 10 records:
%%time
loadLibrary.select("a.Module","a.id","b.id","c.id").show(10,truncate=False)
You can check the stages being executed, remember?
After a few seconds or minutes, you will get a similar output in a table format.
What if we also filter our graph query by a specific module? What about Lsasrv.dll, an authentication component for all systems?
The LSA Server service, which both enforces security policies and acts as the security package manager for the LSA
loadLibrary = g.find("(a)-[]->(b); (b)-[]->(c)")\
.filter("a.FunctionType = 'RPCFunction'")\
.filter("lower(a.Module) LIKE '%lsasrv.dll'")\
.filter("c.FunctionType = 'ExtFunction'")\
.filter("c.id = 'LoadLibraryExW'").dropDuplicates()
Run the query and show the first 10 records:
%%time
loadLibrary.select("a.Module","a.id","b.id","c.id").show(10,truncate=False)
Interesting results that make you think about how one could potentially use RPC mechanisms to connect to the RPC server in lsasrv.dll and run methods that could eventually call LoadLibraryExW (Matt Graeber's idea).
You can validate the results by going to Ghidra and looking at the lsasrv.dll "function call trees" as shown below. I looked for a non-obvious path:
LsarCreateSecret -> LsapDbDereferenceObject -> LoadLibraryExW
You also might be asking yourself:
"What about looking for the shortest path from one vertex to another instead of specifying the number of hops in the chain?"
Breadth-first search (BFS)
GraphFrames provide several graph algorithms that you could use to perform additional analysis. You can learn more about them here.
Breadth-first search (BFS) finds the shortest path(s) from one vertex (or a set of vertices) to another vertex (or a set of vertices). The beginning and end vertices are specified as Spark DataFrame expressions.
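Conceptually, BFS over our call graph looks like the following pure-Python sketch, which finds a shortest path from any vertex matching a start predicate to any vertex matching an end predicate (the adjacency map is made up for illustration):

```python
from collections import deque

# Made-up adjacency map: caller -> callees.
adjacency = {
    "NetrJobAdd": ["LsapCheckAccess"],
    "LsapCheckAccess": ["LoadLibraryExW"],
}

def bfs(starts, is_target, adjacency, max_path_length=3):
    """Return the shortest path from any start vertex to a target vertex."""
    queue = deque([start] for start in starts)  # each queue item is a path
    while queue:
        path = queue.popleft()
        if is_target(path[-1]):
            return path
        if len(path) > max_path_length:
            continue
        for nxt in adjacency.get(path[-1], []):
            if nxt not in path:  # avoid revisiting vertices on this path
                queue.append(path + [nxt])
    return None

path = bfs(["NetrJobAdd"], lambda v: v == "LoadLibraryExW", adjacency)
# path == ["NetrJobAdd", "LsapCheckAccess", "LoadLibraryExW"]
```

g.bfs does the same thing, except the start and target predicates are Spark DataFrame expressions evaluated against the vertex columns.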
Let's use our initial example:
loadLibraryBFS = g.bfs(
fromExpr = "FunctionType = 'RPCFunction'",
toExpr = "id = 'LoadLibraryExW' and FunctionType = 'ExtFunction'",
maxPathLength = 3).dropDuplicates()
Run the BFS graph algorithm and show the first 10 records:
%%time
loadLibraryBFS.select("from.Module", "e0").show(10,truncate=False)
As you can see, the GraphFrames library is pretty powerful, leveraging Apache Spark DataFrames inside a lightweight Docker container running a Jupyter Notebook server. It has several features that might help you while performing research and querying data in a graph format.
I hope you enjoyed this post! I still have a lot to learn about RPC and Ghidra, but this helped me a lot to get motivated and try it. This was my first time playing with Ghidra and NtObjectManager, and I am glad I was able to learn a little bit about the installation and automation aspects of them.
Finally, I wanted to thank Adam Chester, Ruben, Reenz0h and Matt Graeber for their patience and help answering questions regarding some of the concepts presented in this post. Thank you all so much for your time and contributions to the community!
WinRPCFunctions GitHub Repo:
https://github.com/Cyb3rWard0g/WinRpcFunctions
The notebook was already uploaded to the InfoSec Jupyter Book project.
References
https://blog.xpnsec.com/analysing-rpc-with-ghidra-neo4j/
http://www.powerofcommunity.net/poc2019/James.pdf
https://www.youtube.com/watch?v=2GJf8Hrxm4k
https://winprotocoldoc.blob.core.windows.net/productionwindowsarchives/MS-RPCE/%5BMS-RPCE%5D.pdf
https://docs.microsoft.com/en-us/windows/win32/rpc/the-programming-model
http://slideplayer.com/slide/6444771/
https://www.cs.rutgers.edu/~pxk/417/notes/03-rpc.html
https://en.wikipedia.org/wiki/Marshalling_(computer_science)
https://www.blackhat.com/presentations/win-usa-04/bh-win-04-seki-up2.pdf
https://googleprojectzero.blogspot.com/2019/12/calling-local-windows-rpc-servers-from.html
https://graphframes.github.io/graphframes/docs/_site/user-guide.html#motif-finding