Enhancing Cyber Threat Intelligence with TI Mindmap GPT: Integration of Azure OpenAI and advanced features

Multi-Language Support, IOC Extraction, and BYOK Model Integration in TI Mindmap GPT

Antonio Formato
Microsoft Azure
7 min read · Dec 18, 2023

Over the past few months, I have been exploring how Generative AI can support infosec professionals across various scenarios, focusing on how Azure OpenAI-based applications can meet specific needs in cyber threat intelligence. My most recent development is the TI Mindmap GPT tool (previous article), designed to help cyber threat intelligence teams efficiently collate and visualize essential data from diverse sources. The application adopts a ‘Bring Your Own (OpenAI) Key’ model: users supply their own OpenAI or Azure OpenAI keys.

The application accepts URLs of blog posts, threat intelligence articles, or write-ups. It then uses OpenAI or Azure OpenAI to analyze and summarize the content, and converts the summary into Mermaid code to create a mindmap that graphically links the entities, themes, and ideas covered in the content.

In the past two weeks, I have expanded the scope by releasing support for Azure OpenAI, the capability to extract IOCs, and the translation of summaries from infosec write-ups.

Streamlit app: https://ti-mindmap-gpt.streamlit.app/

GitHub repo: https://github.com/format81/TI-Mindmap-GPT

New features powered by Azure OpenAI

Azure OpenAI Support and BYOK Model

The integration between Azure OpenAI and TI Mindmap GPT follows the BYOK (Bring Your Own Key) model: users supply the API key, endpoint, and deployment name of their own Azure OpenAI resource. Requests therefore run against a resource the user controls, which keeps users in charge of their credentials and of where their data is processed.

Why use Azure OpenAI and what are the differences from OpenAI? In summary, Microsoft’s documentation states: “With Azure OpenAI, customers get the security capabilities of Microsoft Azure while running the same models as OpenAI. Azure OpenAI offers private networking, regional availability, and responsible AI content filtering.”

By selecting “Azure OpenAI,” you will be prompted to fill in the following fields:

  • Azure OpenAI API key: You can find your Azure OpenAI API key on the Azure portal.
  • Azure OpenAI endpoint
  • Azure OpenAI deployment name
TI Mindmap GPT — Azure OpenAI variables

Both the Azure OpenAI keys and the endpoint can be found in your Azure OpenAI resource in the Azure portal, under “Keys and Endpoint”.

Azure OpenAI key and Endpoint

The deployment name can be found in the Azure OpenAI Studio portal.

Azure OpenAI deployment name
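
As a minimal sketch of how these three values might be wired together, assuming version 1.x of the `openai` Python package (whose `AzureOpenAI` client accepts `api_key`, `api_version`, and `azure_endpoint`). The `AzureSettings` class, the `validate` and `build_client` helpers, and the default `api_version` are illustrative assumptions, not code from the TI Mindmap GPT repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AzureSettings:
    """Hypothetical container for the three values the app asks for."""
    api_key: str
    endpoint: str          # e.g. https://<resource-name>.openai.azure.com/
    deployment_name: str   # the name given to the model deployment
    api_version: str = "2023-05-15"  # assumption: pick a version your resource supports


def validate(settings: AzureSettings) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not settings.api_key.strip():
        problems.append("API key is empty")
    if not settings.endpoint.startswith("https://"):
        problems.append("endpoint should start with https://")
    if not settings.deployment_name.strip():
        problems.append("deployment name is empty")
    return problems


def build_client(settings: AzureSettings):
    """Create the Azure OpenAI client (requires `pip install "openai>=1.0"`)."""
    # Imported lazily so that validate() can be used without the package installed
    from openai import AzureOpenAI

    problems = validate(settings)
    if problems:
        raise ValueError("; ".join(problems))
    return AzureOpenAI(
        api_key=settings.api_key,
        api_version=settings.api_version,
        azure_endpoint=settings.endpoint,
    )
```

With Azure OpenAI, the deployment name (not the underlying model name) is what gets passed as `model=` to `client.chat.completions.create`, which is why the app asks for it separately.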

Translation of Cybersecurity Write-ups

Cybersecurity write-ups are comprehensive reports that chronicle the procedures employed to identify, investigate, and remediate security incidents. They serve as a valuable resource for security professionals, providing insights into the latest attack methodologies and vulnerabilities. However, most cybersecurity write-ups are authored in English, which can restrict their accessibility to a global audience of security professionals. Translating these write-ups into multiple languages lowers that barrier and helps security professionals worldwide access the knowledge they need to safeguard their organizations from cyberattacks.

To address this need, I have integrated a function into the TI Mindmap GPT Python code that leverages Azure OpenAI to translate the recap of the article, blog post, or write-up into the user’s preferred language. This enhancement ensures that security professionals can access critical threat intelligence regardless of their native language.

You can find this function in the GitHub repository: https://github.com/format81/TI-Mindmap-GPT

def summarise(input_text, client, service_selection, selected_language):
    # NOTE: deployment_name is assumed to be defined at module level
    # (the Azure OpenAI deployment name entered by the user).
    # Combine the selected languages into a string, or default to "English" if none selected
    language = ", ".join(selected_language) if selected_language else "English"
    if service_selection == "OpenAI":
        # OpenAI API call
        response = client.chat.completions.create(
            model="gpt-4-1106-preview",
            messages=[
                {
                    "role": "system",
                    "content": f"You are responsible for summarizing in {language} a threat report for a Threat Analyst. Write a paragraph that will summarize the main topic, the key findings, and all the detailed information relevant for a threat analyst such as detection opportunity iocs and TTPs. Use the title and add an emoji. Do not generate a bullet points list but rather multiple paragraphs."
                },
                {"role": "user", "content": input_text},
            ],
        )
        return response.choices[0].message.content
    elif service_selection == "Azure OpenAI":
        # Azure OpenAI API call: the deployment name takes the place of the model name
        response = client.chat.completions.create(
            model=deployment_name,
            messages=[
                {
                    "role": "system",
                    "content": f"You are responsible for summarizing in {language} a threat report for a Threat Analyst. Write a paragraph that will summarize the main topic, the key findings, and all the detailed information relevant for a threat analyst such as detection opportunity iocs and TTPs. Use the title and add an emoji. Do not generate a bullet points list but rather multiple paragraphs."
                },
                {"role": "user", "content": input_text},
            ],
        )
        return response.choices[0].message.content

Write-up recap translation in Italian

The previous translation example comes from the analysis of the report: Operation Blacksmith: Lazarus targets organizations worldwide using novel Telegram-based malware written in DLang (talosintelligence.com).
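
The two branches of `summarise` differ only in the `model` argument, so the prompt could be built once and shared. The sketch below shows one possible way to do that; `build_summary_messages` and `pick_model` are hypothetical helper names that do not appear in the repository, and passing `deployment_name` explicitly (instead of reading a module-level variable) is also an assumption.

```python
def build_summary_messages(input_text, selected_language=None):
    """Build the chat messages for a summary in the requested language(s)."""
    # Same default as summarise(): fall back to English when nothing is selected
    language = ", ".join(selected_language) if selected_language else "English"
    system_prompt = (
        f"You are responsible for summarizing in {language} a threat report "
        "for a Threat Analyst. Write a paragraph that will summarize the main "
        "topic, the key findings, and all the detailed information relevant "
        "for a threat analyst such as detection opportunity iocs and TTPs. "
        "Use the title and add an emoji. Do not generate a bullet points list "
        "but rather multiple paragraphs."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": input_text},
    ]


def pick_model(service_selection, deployment_name=None):
    """Return the value to pass as `model=` for the selected service."""
    if service_selection == "OpenAI":
        return "gpt-4-1106-preview"
    if service_selection == "Azure OpenAI":
        if not deployment_name:
            raise ValueError("Azure OpenAI needs a deployment name")
        return deployment_name
    raise ValueError(f"Unknown service: {service_selection!r}")
```

A call would then look like `client.chat.completions.create(model=pick_model(...), messages=build_summary_messages(...))`, and an unknown service raises an error instead of silently returning `None`.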

Extraction of IOCs in Table Format

The ability of LLMs to process vast amounts of text data, identify patterns, and extract relevant information makes them well-suited for identifying and extracting Indicators of Compromise (IOCs) from various sources, including blog posts, technical reports, and even social media feeds.

LLMs can effectively classify and label IOCs, categorizing them by type, such as IP addresses, URLs, domain names, or file hashes. This automated labeling significantly reduces the time and effort required for manual classification and helps ensure that IOCs are correctly identified and categorized for further analysis.

In response to a requirement for the TI Mindmap GPT application to extract IOCs (Indicators of Compromise) from write-ups, I utilized Azure OpenAI to accomplish this objective, implementing the extract_iocs Python function.

Extracted IOCs

import pandas as pd

def extract_iocs(input_text, client, service_selection):
    # NOTE: deployment_name is assumed to be defined at module level
    prompt = "You are tasked with extracting IOCs from the following blog post for a threat analyst. Provide a structured, table-like format, with rows separated by newlines and columns by commas with the following columns: Indicator, Type, Description. Extract indicators just if you are able to find them in the blog post provided. With reference to IP addresses, URLs, and domains, remove square brackets. Examples: tech[.]micrsofts[.]com will be tech.micrsofts.com and 27[.]102[.]113[.]93 will be 27.102.113.93\n\n" + input_text
    if service_selection == "OpenAI":
        # OpenAI API call
        response = client.chat.completions.create(
            model="gpt-4-1106-preview",
            messages=[
                {"role": "system", "content": prompt}
            ],
        )
    elif service_selection == "Azure OpenAI":
        # Azure OpenAI API call
        response = client.chat.completions.create(
            model=deployment_name,
            messages=[
                {"role": "system", "content": prompt}
            ],
        )
    else:
        return "Service selection is invalid."

    # Extract and return the response content
    try:
        response_content = response.choices[0].message.content
        # Parse the response content into a DataFrame (first line is the header row)
        data = [line.split(",") for line in response_content.strip().split("\n")]
        df = pd.DataFrame(data[1:], columns=data[0])
        return df
    except Exception as e:
        return f"Failed to extract and parse IOCs: {e}"
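
One caveat with splitting every line on commas is that a description containing a comma produces extra columns, which makes the DataFrame construction fail. The sketch below is one possible, more tolerant parser: it splits on the first two commas only, skips malformed lines, and re-applies the bracket removal in case the model left an indicator defanged. `parse_ioc_table` and `refang` are hypothetical helpers, not functions from the repository, and the `hxxp` handling is an added assumption about common defanging styles.

```python
import re

EXPECTED_COLUMNS = ["Indicator", "Type", "Description"]


def refang(indicator: str) -> str:
    """Undo common defanging: hxxp -> http, [.] -> ., [:] -> :"""
    indicator = re.sub(r"hxxp", "http", indicator, flags=re.IGNORECASE)
    return indicator.replace("[.]", ".").replace("[:]", ":")


def parse_ioc_table(response_content: str) -> list[dict]:
    """Parse 'Indicator, Type, Description' lines into a list of row dicts."""
    rows = []
    for line in response_content.strip().splitlines():
        # Split on the first two commas only, so commas in the description survive
        parts = [part.strip() for part in line.split(",", 2)]
        if len(parts) != 3 or parts == EXPECTED_COLUMNS:
            continue  # skip malformed lines and the header row
        indicator, ioc_type, description = parts
        rows.append(
            {
                "Indicator": refang(indicator),
                "Type": ioc_type,
                "Description": description,
            }
        )
    return rows
```

The result can be handed to pandas with `pd.DataFrame(rows, columns=EXPECTED_COLUMNS)`. Indicators that themselves contain commas (some URLs do) would still be split incorrectly; asking the model for JSON output would be one way around that.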

Review of content relevance

To ensure that the application only processes content relevant to its purpose, I wrote a function that checks whether the text extracted from the input web page relates to the cybersecurity domain.

Large language models (LLMs) have demonstrated remarkable capabilities for labeling and classifying content; the following example is implemented with Azure OpenAI.

Review of content relevance example

def check_content_relevance(input_text, client, service_selection):
    # NOTE: deployment_name is assumed to be defined at module level
    prompt = "Determine if the following text is related to cybersecurity: \n" + input_text
    if service_selection == "OpenAI":
        # OpenAI API call
        response = client.chat.completions.create(
            model="gpt-4-1106-preview",
            messages=[{"role": "system", "content": prompt}]
        )
        return response.choices[0].message.content
    elif service_selection == "Azure OpenAI":
        # Azure OpenAI API call
        response = client.chat.completions.create(
            model=deployment_name,
            messages=[{"role": "system", "content": prompt}]
        )
        return response.choices[0].message.content
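
`check_content_relevance` returns the model's free-text answer. If the application needs a plain yes/no decision to gate further processing, one option is to constrain the answer in the prompt and interpret it conservatively. The sketch below illustrates that idea; `RELEVANCE_PROMPT`, `build_relevance_messages`, and `is_relevant` are hypothetical names, and the one-word answer format is an assumption rather than what the app currently does.

```python
RELEVANCE_PROMPT = (
    "Determine if the following text is related to cybersecurity. "
    "Answer with a single word, YES or NO.\n"
)


def build_relevance_messages(input_text: str) -> list[dict]:
    """Keep the instructions in the system message and the page text in the user message."""
    return [
        {"role": "system", "content": RELEVANCE_PROMPT},
        {"role": "user", "content": input_text},
    ]


def is_relevant(model_reply: str) -> bool:
    """Interpret the reply; anything that is not a clear YES counts as not relevant."""
    words = model_reply.strip().upper().split()
    # Drop trailing punctuation such as "Yes." or "YES," before comparing
    first_word = words[0].strip(".,!:;\"'") if words else ""
    return first_word == "YES"
```

Treating every ambiguous reply as "not relevant" errs on the side of rejecting content, which matches the goal of only processing cybersecurity material.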

Let’s not forget the Mindmap, which is the primary raison d’être of this app. ☺️

Mindmap of Operation Blacksmith: Lazarus targets organizations worldwide using novel Telegram-based malware written in DLang (talosintelligence.com)
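
For readers unfamiliar with Mermaid, a mindmap is plain text in which indentation defines the hierarchy. In the app the Mermaid code is written by the model; the sketch below only illustrates the target format by rendering a small nested dictionary. `to_mermaid_mindmap` is a hypothetical helper, and the example entities are taken from the title of the Talos report mentioned above.

```python
def to_mermaid_mindmap(root: str, tree: dict) -> str:
    """Render a nested dict (label -> children dict or None) as Mermaid mindmap text."""

    def clean(label) -> str:
        # Brackets and parentheses define node shapes in Mermaid, so drop them from labels
        return "".join(ch for ch in str(label) if ch not in "()[]{}").strip()

    lines = ["mindmap", f"  root(({clean(root)}))"]

    def walk(node: dict, depth: int) -> None:
        for label, children in node.items():
            # Each level is indented two more spaces than its parent
            lines.append("  " * depth + clean(label))
            walk(children or {}, depth + 1)

    walk(tree, 2)
    return "\n".join(lines)


example = {
    "Threat actor": {"Lazarus": None},
    "Malware": {"Telegram-based": None, "Written in DLang": None},
}
print(to_mermaid_mindmap("Operation Blacksmith", example))
```

The printed text can be pasted into any Mermaid renderer to display the mindmap.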

TI Mindmap GPT video demo

I hope you find this interesting. If you’ve found this app useful, I invite you to contribute, add a star on GitHub, and follow me on LinkedIn.

Enjoy TI Mindmap GPT https://ti-mindmap-gpt.streamlit.app/

☺️

Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer.
