Writing Your First Pentesting Tool Using LLM [Part 1]: A Basic Port Scanner in Python

Ishtiaque Foysol
8 min read · Jun 15, 2024

An astronaut is working with a server in a space station. Created by: Dall-e 3

A Humble Note to the Readers: This write-up is one of the hands-on parts of an online course titled Python Scripting for Geeky Testers. The course, once taught by the author, is now in limbo due to factors such as ‘work, life and interest balance’. So the author decided to make the class lectures and code public for testing enthusiasts, hoping to get invaluable feedback from them and improve his own scripting skills.

In this module, we’ll rewrite the port scanner script we created in the introductory tutorial and improve its functionality and output.

While the original script works as intended, we can modify it to provide more informative results and offer flexibility.

  • Variable Naming: We’ll begin by assigning descriptive names to variables. This improves code readability and maintainability.
  • Context Managers: To ensure proper resource management, we’ll use context managers for handling socket connections. This eliminates the need for manual socket closure, preventing potential errors.
  • More Explicit Results with a Port Dictionary: To provide more meaningful output, we’ll create a dictionary that maps commonly used ports to their corresponding services. Building such a dictionary manually through web searches can be time-consuming. Here’s where Large Language Models (LLMs) come into play. By interacting with LLM chatbots, we can automate the process of generating this dictionary, streamlining our script development and focusing on core functionality.
  • Add Command-Line Options
  • Add a Banner Grabber
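
To make the context manager point concrete, here is a minimal sketch (the variable names are illustrative and not part of the course script). It shows that a socket opened in a with block is closed automatically once the block ends:

```python
import socket

# A socket created in a `with` block is closed automatically on exit,
# even if an exception is raised inside the block.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as demo_socket:
    inside_fd = demo_socket.fileno()   # a valid descriptor (>= 0) while open

after_fd = demo_socket.fileno()        # -1 once the socket has been closed
print(inside_fd >= 0, after_fd)
```

No explicit close() call is needed, which is the behaviour the scanner relies on.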

First, we change the variable and function names into more readable ones.

#!/usr/bin/env python3 # ... 1

import socket # ... 2

# Uncomment to see where the source code of the socket module lives
#print(socket.__file__)

PORTS = {
    'ftp': 21,
    'ssh': 22,
    'http': 80,
    'https': 443,
    'upnp': 1900,
    'domain': 53
} # ... 3

def scan_known_ports(ip: str, service_name: str): # ... 4
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as my_socket:
        status = my_socket.connect_ex((ip, PORTS[service_name]))
        if status == 0: # ... 5
            print(f'\t[+] Port: {PORTS[service_name]} runs {service_name} service')

IP = input('IP address here: ') # ... 6

print(f'Scanning {IP} for open ports...') # ... 7

for service_name in PORTS: # ... 8
    scan_known_ports(IP, service_name)

Now we will add a new functionality for scanning all the ports of a target.

Before that, have a look at the PORTS dictionary and the scan_known_ports function. We used service names as keys and port numbers as values. Notice the ambiguity here.

A port scanner scans an IP address together with a port number. Our scan_known_ports() function, however, scans an IP address against a service name. Hmmm … this might not seem like an issue now, but it will create ambiguity in the long run as the script grows and becomes more complex. Swapping the keys and values by hand takes time and effort while adding little value to the outcome. Here comes our LLM assistant. Let’s assign the guy some tasks.

Note: This is a very basic example for the sake of this tutorial. LLM assistants come in handy in complex situations, when we feed them small parts of a task, clean up their output and finally add it to our solution.
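
For a dictionary this small, the key and value swap can also be done in plain Python. The following is a minimal sketch, not the approach the tutorial takes; PORTS_BY_SERVICE is a hypothetical name for the service-to-port mapping from the first script:

```python
# Hypothetical starting point: the service -> port mapping from the first script.
PORTS_BY_SERVICE = {
    'ftp': 21, 'ssh': 22, 'http': 80,
    'https': 443, 'upnp': 1900, 'domain': 53,
}

# Swap keys and values so the port number becomes the lookup key.
PORTS = {port: service for service, port in PORTS_BY_SERVICE.items()}

print(PORTS[22])
```

The LLM earns its keep in the next step, where the mapping has to be built from an external page rather than from data we already hold.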

Here are some optimisations we are going to make:

  • Feed the LLM assistant a URL and have it build a larger dictionary.
  • Keep the dictionary in a ports.py file. This makes the code base more structured and readable.
  • Use port numbers as keys and service names as their values.

Let’s give our LLM friend the simple task of generating a Python dictionary.

After some googling, I found a reasonably long list of common ports and their services in the following blog post: https://granneman.com/tech/networking/commonports

Read the ports and services. Read the ports and service names. Make a python dictionary like PORTS ={port_no[int]:service[string]} 
https://granneman.com/tech/networking/commonports

The response is something like this
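
A dictionary in the requested shape might look like the sketch below. The entries are a handful of well-known port assignments picked by hand for illustration; they are not the assistant's actual output and not the full list from the linked page:

```python
# ports.py (illustrative sketch; not the LLM's actual response)
# Maps port number (int) -> service name (str).
PORTS = {
    21: 'ftp',
    22: 'ssh',
    23: 'telnet',
    25: 'smtp',
    53: 'domain',
    80: 'http',
    110: 'pop3',
    143: 'imap',
    443: 'https',
    1900: 'upnp',
}
```

Saved as ports.py next to the scanner, this is what `from ports import PORTS` would pick up.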

Now, have a look at the final outcome of our port scanner

#!/usr/bin/env python3 # ... 1

import socket # ... 2
from ports import PORTS

def scan_known_ports(ip: str, port: int): # ... 4
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as my_socket:
        status = my_socket.connect_ex((ip, port))
        if status == 0: # ... 5
            print(f'\t[+] Port: {port} runs {PORTS[port]}')

IP = input('IP address here: ') # ... 6
print(f'Scanning {IP} for open ports...') # ... 7
# Notice the use of items() to unpack key and value, and the underscore
# to ignore the value that is not needed here
for port, _ in PORTS.items():
    scan_known_ports(IP, port)
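
As a side note on the loop, iterating over a dictionary directly already yields its keys, so unpacking items() and discarding the value is one of two equivalent spellings. A small sketch with a stand-in dictionary (not the real ports.py):

```python
PORTS = {21: 'ftp', 22: 'ssh', 80: 'http'}  # small stand-in for ports.PORTS

keys_via_items = [port for port, _ in PORTS.items()]  # unpack pairs, discard value
keys_direct = [port for port in PORTS]                # a dict iterates over its keys

print(keys_via_items == keys_direct)
```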

Bonus LLM Prompt

I wanted to dig a bit deeper into the difference between Python’s connect() and connect_ex(). So I wrote the following prompt, and the answer was pretty satisfying.

Read the Python Doc here: https://docs.python.org/3/library/socket.html#socket.socket.connect
- Explain the differences between connect() and connect_ex() methods
- Suggest when to use which one

Response

... 
In summary:
Use connect() when a successful connection is essential and you want your program to stop immediately if it fails.
Use connect_ex() when you want to check the connection status and handle successful connections and failures differently within your code.
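
The difference can be observed on localhost. The sketch below is an illustration under a few assumptions: it binds a throwaway listener to a free port on 127.0.0.1, and it assumes nothing else takes over that port once the listener is closed. All variable names are made up for the example:

```python
import socket

# Bind a throwaway listener to a free localhost port.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(('127.0.0.1', 0))
    listener.listen(1)
    open_port = listener.getsockname()[1]

    # While the listener is alive, connect_ex() returns 0 for success.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        open_status = client.connect_ex(('127.0.0.1', open_port))

closed_port = open_port  # the listener is closed now, so nothing listens here

# connect_ex() reports the failure as a non-zero errno value instead of raising.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    closed_status = client.connect_ex(('127.0.0.1', closed_port))

# connect() raises an exception for the same failure.
raised = False
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    try:
        client.connect(('127.0.0.1', closed_port))
    except OSError:
        raised = True

print(open_status, closed_status != 0, raised)
```

For a scanner that probes many ports and expects most of them to fail, the return-code style of connect_ex() keeps the loop free of try/except blocks. Note that, per the Python documentation, connect_ex() can still raise for problems such as "host not found".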

Here is our script after adding argparse to accept command-line arguments and moving the ports and service names into a separate file, which makes the code more readable and maintainable:

#!/usr/bin/env python3 # ... 1
import socket
import argparse
from ports import PORTS

parser = argparse.ArgumentParser(
    prog='portscan.py',
    description='A basic port scanner in Python. It can scan for specific and all the 65535 ports'
)

parser.add_argument('-u', '--url', required=False, type=str, help='input url e.g. 192.168.0.1')
parser.add_argument('-p', '--port', required=False, type=str, help="'all' for all ports or 'kno' for known ports")
arguments = parser.parse_args()


def grab_banner(ip: str, port: int):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as my_socket:
        status = my_socket.connect_ex((ip, port))
        if status == 0:
            response = my_socket.recv(1024)
            print(port, str(response))


def scan_a_port(ip: str, port: int): # ... 4 add option for ports not listed in PORTS
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as my_socket:
        status = my_socket.connect_ex((ip, port))
        if status == 0: # ... 5
            if PORTS.get(port):
                print(f'[+] Port: {port} runs {PORTS[port]}')
            else:
                print(f'[+] Port: {port} runs unknown service/not listed in PORTS dict')
                print(f'Trying Banner Grabbing for ip: {ip} port: {port}')
                grab_banner(ip, port)


if arguments.url and arguments.port == 'all':
    print('[!] Scanning for All Ports. This will take a longer Time ...')
    total_port = 0
    for port in range(1, 65536):
        scan_a_port(arguments.url, port)
        total_port = total_port + 1
    print(f'All {total_port} ports are scanned.')
elif arguments.url and arguments.port == 'kno':
    print('[!] Scanning for Known Ports ...')
    for port, _ in PORTS.items():
        scan_a_port(arguments.url, port)
else:
    parser.print_help()

Now we’ll feed our LLM assistant the following prompt:


optimise the following python port scanner script
- Comment out your optimisations

The output is

#!/usr/bin/env python3

import socket
import argparse
from ports import PORTS

parser = argparse.ArgumentParser(
    prog='portscan.py',
    description='A basic port scanner in Python. It can scan for specific and all the 65535 ports.'
)

parser.add_argument('-u', '--url', required=False, type=str, help='Input URL (e.g., 192.168.0.1)')
parser.add_argument('-p', '--port', required=False, type=str, choices=['all', 'kno'], help='Scan all ports or known ports (kno)')
arguments = parser.parse_args()


def grab_banner(ip, port):
    """
    Grabs service banner information from an open port (if available).

    Args:
        ip: The IP address of the target host.
        port: The port number to check for a banner.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as my_socket:
        try:
            my_socket.settimeout(0.5) # Set a timeout for faster banner grabbing (optional)
            my_socket.connect((ip, port))
            response = my_socket.recv(1024)
            print(f'[+] Port: {port} - Banner: {str(response)}')
        except socket.error:
            pass # Ignore errors if banner grabbing fails ... 1


def scan_a_port(ip, port):
    """
    Checks if a port is open and identifies the service (if known).

    Args:
        ip: The IP address of the target host.
        port: The port number to scan.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as my_socket:
        status = my_socket.connect_ex((ip, port))
        if status == 0:
            service_name = PORTS.get(port)
            if service_name:
                print(f'[+] Port: {port} runs {service_name}')
            else:
                print(f'[+] Port: {port} runs unknown service (not listed in PORTS dict)')
                # Attempt banner grabbing (optional)
                grab_banner(ip, port)


if arguments.url and arguments.port == 'all':
    print('[!] Scanning for All Ports. This will take a longer Time ...')
    for port in range(1, 65536):
        scan_a_port(arguments.url, port)
else:
    if arguments.url and arguments.port == 'kno':
        print('[!] Scanning for Known Ports ...')
        for port in PORTS.keys():
            scan_a_port(arguments.url, port)
    else:
        parser.print_help()
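
argparse can validate an option against a fixed set of values through the choices keyword of add_argument(). A minimal sketch that reuses the option names from the script, with an explicit argument list passed to parse_args() so it can be tried without a command line:

```python
import argparse

parser = argparse.ArgumentParser(prog='portscan.py')
parser.add_argument('-u', '--url', type=str, help='target IP address, e.g. 192.168.0.1')
# `choices` makes argparse reject any value other than 'all' or 'kno'.
parser.add_argument('-p', '--port', type=str, choices=['all', 'kno'],
                    help="'all' for every port, 'kno' for known ports")

# parse_args() accepts an explicit list, which is handy for trying options out.
args = parser.parse_args(['-u', '127.0.0.1', '-p', 'kno'])
print(args.url, args.port)

# An unlisted value makes argparse print a usage error and exit with status 2.
try:
    parser.parse_args(['-u', '127.0.0.1', '-p', 'everything'])
except SystemExit as exit_info:
    rejected_code = exit_info.code
```

With the validation done by argparse, the main block only has to distinguish between the two accepted values.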

Notice: Swallowing an exception without logging anything is a sure way to create ambiguity later.

        ...
        except socket.error:
            pass # Ignore errors if banner grabbing fails ... 1

A pentester keeps a sharp eye on every error message for any scope of exploitation.
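
The same idea applies to the value returned by connect_ex(): a non-zero status is an errno code that says why the connection failed. As a sketch, describe_connect_result below is a hypothetical helper (not part of the tutorial's script) that turns the code into a readable message using the standard errno and os modules:

```python
import errno
import os

def describe_connect_result(status: int) -> str:
    """Turn a connect_ex() return value into a readable message."""
    if status == 0:
        return 'open'
    # errno.errorcode maps numeric codes to names such as 'ECONNREFUSED'.
    name = errno.errorcode.get(status, 'UNKNOWN')
    return f'{name}: {os.strerror(status)}'

print(describe_connect_result(0))
print(describe_connect_result(errno.ECONNREFUSED))
```

A refused connection usually points to a closed port, while other codes may hint at filtering or routing problems, so the distinction is often worth printing.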

So, we are not going to take our LLM assistant’s optimisation as it is; instead, we will optimise the code on our own.

The final script after our tweak is

#!/usr/bin/env python3

import socket
import argparse
from ports import PORTS

parser = argparse.ArgumentParser(
    prog='portscan.py',
    description='A basic port scanner in Python. It can scan for specific and all the 65535 ports.'
)

parser.add_argument('-u', '--url', required=False, type=str, help='input url e.g. 192.168.0.1')
parser.add_argument('-p', '--port', required=False, type=str, help='all or kno which is known')
arguments = parser.parse_args()


def grab_banner(ip, port):
    """
    Grabs service banner information from an open port (if available).

    Args:
        ip: The IP address of the target host.
        port: The port number to check for a banner.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as my_socket:
        try:
            my_socket.settimeout(0.5) # Set a timeout for faster banner grabbing (optional)
            my_socket.connect((ip, port))
            response = my_socket.recv(1024)
            print(f'[+] Port: {port} - Banner: {str(response)}')
        except socket.error as err:
            print(f'[-] Failed to fetch a banner. Error: {err}')


def scan_a_port(ip, port):
    """
    Checks if a port is open and identifies the service (if known).

    Args:
        ip: The IP address of the target host.
        port: The port number to scan.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as my_socket:
        status = my_socket.connect_ex((ip, port))
        if status == 0:
            service_name = PORTS.get(port)
            if service_name:
                print(f'[+] Port: {port} runs {service_name}')
            else:
                print(f'[+] Port: {port} runs unknown service (not listed in PORTS dict)')
                # Attempt banner grabbing
                grab_banner(ip, port)


if arguments.url and arguments.port == 'all':
    print('[!] Scanning for All Ports. This will take a longer Time ...')
    total_port = 0
    for port in range(1, 65536):
        scan_a_port(arguments.url, port)
        total_port = total_port + 1
    print(f'All {total_port} ports are scanned.')
elif arguments.url and arguments.port == 'kno':
    print('[!] Scanning for Known Ports ...')
    for port in PORTS.keys():
        scan_a_port(arguments.url, port)
else:
    parser.print_help()
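
One possible further tweak, which is not part of the script above: the timeout used in grab_banner could also be applied to the port check itself, so that an unresponsive port does not stall a full scan. A minimal sketch, where is_port_open and its 0.5 second default are assumptions made for illustration:

```python
import socket

def is_port_open(ip: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to (ip, port) succeeds within `timeout` seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)  # applies to the connect attempt as well
        return probe.connect_ex((ip, port)) == 0

# Usage against a throwaway local listener.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(('127.0.0.1', 0))
    listener.listen(1)
    local_port = listener.getsockname()[1]
    reachable = is_port_open('127.0.0.1', local_port)

print(reachable)
```

A shorter timeout speeds up the scan but may miss slow hosts, so the value is a trade-off to tune per target.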

Conclusion

In this section, we wrote a very rudimentary port scanner in Python with the help of our LLM assistants.

The tutorial showed how we can keep our focus on solving the problem by letting LLMs generate the rudimentary code that adds little value for the effort it takes to write by hand.

We also saw when to prefer our own optimisations over the LLM’s suggestions.

Ishtiaque Foysol

An avid learner, a parent, a self taught hardcore tester who breaks things to fix them.