How to Solve Captchas Automatically using Artificial Intelligence: Complete Guide

Published in Be Tech! with Santander · 8 min read · Nov 14, 2023

Currently there’s a lot of hype around Artificial Intelligence (AI), and deservedly so; nobody doubts that it is destined to change the world 🌍. All of us have been amazed by the responses to our prompts, the automatic generation of content, the autonomous systems and countless other possibilities that unfold before us as if by magic 🧙.

However, if you’re something of a techie you’d no doubt like to delve a little deeper and understand how it works. Nowadays we can do this easily because we have a lot of tools 🔧 at our disposal to set up our own laboratory.

Automatic captcha verification 🔍

In this entry we will implement a system capable of automatically solving and completing a relatively simple CAPTCHA made up of five digits, each shown as an image.

It consists of a simple PHP website that randomly selects the captcha and loads the corresponding png images extracted from the MNIST (Modified National Institute of Standards and Technology) dataset. This is one of the most popular and widely used resources in the field of machine learning, consisting of a set of 📸 images of handwritten digits (from 0 to 9) and their respective labels.

The website checks the captcha entered by the user and verifies whether it matches the randomly generated one.

Basically, what we want is to be able to complete it automatically but also correctly ✅. To do so, our system will first learn using the MNIST training 🏃 dataset and the corresponding labels (60,000 images), then it will be tested on the MNIST test dataset (10,000 images). Finally, it will recognise the images extracted directly from the web and submit the recognised captcha.

Learning to automatically recognise MNIST images 🧑‍🎓

Let’s see 👀 how to do this step by step with Python. We recommend that you also open your console and run it so that you can check the output at the same time.

1️⃣ Firstly, we import the necessary libraries:

· os to set an environment variable related to the TensorFlow log level.

· numpy to work with matrices and numerical calculations.

· tensorflow and keras to build and train deep learning models.

· matplotlib.pyplot to visualise the results.

$ python3
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import numpy as np
>>> import tensorflow as tf
>>> from tensorflow import keras
>>> import matplotlib.pyplot as plt

2️⃣ We then set the TensorFlow log level:

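TF_CPP_MIN_LOG_LEVEL is set to ‘2’ to hide informational and warning messages from TensorFlow's native code. The setting is most reliable when it is made before tensorflow is imported, so in a script this line belongs above the import.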

>>> os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

3️⃣ We then load and pre-process the MNIST dataset:

· The training and test sets are loaded using keras.datasets.mnist.load_data().

>>> (X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 521s 45us/step

· Normalise the images by dividing the pixel values by 255.0 to scale them in the range 0 to 1.

>>> X_train = X_train / 255.0
>>> X_test = X_test / 255.0
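To see what the division does, here is a minimal sketch on a tiny synthetic array standing in for an MNIST image (fake_image is an assumption for illustration, not real data). MNIST pixels are stored as unsigned 8-bit integers from 0 to 255, and dividing by 255.0 turns them into floating-point values between 0 and 1.

```python
import numpy as np

# Hypothetical stand-in for one MNIST image: uint8 pixels in [0, 255]
fake_image = np.array([[0, 51], [102, 255]], dtype=np.uint8)

scaled = fake_image / 255.0  # true division promotes uint8 to float64

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```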

4️⃣ Next, define the model architecture:

· Create a sequential model using keras.Sequential(), a class in Keras that allows the creation of sequential models, i.e. neural network models where layers are stacked sequentially on top of each other. In this type of model, the output of one layer is connected directly to the input of the next layer.

· The first layer (Flatten) transforms the input image from a 2D matrix to a 1D vector.

· The second layer (Dense) has 128 units with the ReLU activation function.

· The last layer (Dense) has 10 units (corresponding to the 10 possible classes) with the softmax activation function.

>>> model = keras.Sequential([
...     keras.layers.Flatten(input_shape=(28, 28)),
...     keras.layers.Dense(128, activation='relu'),
...     keras.layers.Dense(10, activation='softmax')
... ])
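To make the three layers concrete, the following sketch reproduces the same forward pass in plain NumPy with random, untrained weights. It only illustrates the shapes and operations involved, not how Keras implements them internally; the names W1, b1, W2, b2 and forward are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random, untrained parameters with the same shapes as the Keras model
W1, b1 = rng.normal(0, 0.05, (784, 128)), np.zeros(128)  # Dense(128)
W2, b2 = rng.normal(0, 0.05, (128, 10)), np.zeros(10)    # Dense(10)

def forward(image):
    """Flatten -> Dense(128, relu) -> Dense(10, softmax) for one 28x28 image."""
    x = image.reshape(-1)                # Flatten: (28, 28) -> (784,)
    h = np.maximum(0.0, x @ W1 + b1)     # ReLU keeps positive values only
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()               # softmax: 10 probabilities summing to 1

probs = forward(rng.random((28, 28)))
n_params = W1.size + b1.size + W2.size + b2.size

print(probs.shape)  # (10,)
print(n_params)     # 101770
```

The 101,770 trainable parameters (784×128 + 128 for the hidden layer, 128×10 + 10 for the output layer) should match the total that model.summary() reports for this architecture.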

5️⃣ Compile the model using model.compile().

Specify the optimiser (‘adam’), the loss function (‘sparse_categorical_crossentropy’) and the metrics to be evaluated (‘accuracy’).

>>> model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
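For intuition, sparse categorical cross-entropy for a single sample is the negative logarithm of the probability the model assigned to the true class; "sparse" means the label is given as an integer rather than a one-hot vector. A minimal sketch with made-up probabilities (the values and the helper name sparse_cce are assumptions for illustration):

```python
import math

def sparse_cce(probs, label):
    """Loss for one sample: -log of the probability given to the true class."""
    return -math.log(probs[label])

# Hypothetical softmax output over the ten digits
probs = [0.01, 0.01, 0.02, 0.01, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01]

confident = sparse_cce(probs, 4)  # true digit is 4, which received 0.90
wrong = sparse_cce(probs, 2)      # had the true digit been 2, it received only 0.02

print(round(confident, 4))  # 0.1054
print(round(wrong, 4))      # 3.912
```

The loss reported during training is the average of this quantity over the samples, so confident correct answers push it towards zero while confident mistakes are penalised heavily.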

6️⃣ Train the model using model.fit().

The training set is provided together with the number of epochs (10 in this case); the test set is passed as validation_data so that the loss and accuracy on unseen images are reported after each epoch.

During training, the model adjusts its weights to minimise the loss function.

>>> history = model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
Epoch 1/10
1875/1875 [==============================] - 9s 4ms/step - loss: 0.2571 - accuracy: 0.9267 - val_loss: 0.1478 - val_accuracy: 0.9556
Epoch 2/10
1875/1875 [==============================] - 7s 4ms/step - loss: 0.1143 - accuracy: 0.9659 - val_loss: 0.0977 - val_accuracy: 0.9703
Epoch 3/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0780 - accuracy: 0.9761 - val_loss: 0.0831 - val_accuracy: 0.9728
Epoch 4/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0560 - accuracy: 0.9831 - val_loss: 0.1003 - val_accuracy: 0.9677
Epoch 5/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0446 - accuracy: 0.9867 - val_loss: 0.0705 - val_accuracy: 0.9780
Epoch 6/10
1875/1875 [==============================] - 7s 4ms/step - loss: 0.0354 - accuracy: 0.9892 - val_loss: 0.0736 - val_accuracy: 0.9775
Epoch 7/10
1875/1875 [==============================] - 7s 4ms/step - loss: 0.0269 - accuracy: 0.9920 - val_loss: 0.0757 - val_accuracy: 0.9760
Epoch 8/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0229 - accuracy: 0.9928 - val_loss: 0.0712 - val_accuracy: 0.9779
Epoch 9/10
1875/1875 [==============================] - 7s 4ms/step - loss: 0.0187 - accuracy: 0.9946 - val_loss: 0.0745 - val_accuracy: 0.9785
Epoch 10/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0157 - accuracy: 0.9954 - val_loss: 0.0818 - val_accuracy: 0.9775
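The 1875 at the start of each progress line is the number of batches per epoch, not the number of images. Since no batch_size was passed to model.fit(), Keras uses its default of 32, and the arithmetic can be checked as follows (the helper name steps_per_epoch is ours):

```python
import math

def steps_per_epoch(n_samples, batch_size=32):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

print(steps_per_epoch(60_000))  # 1875 training batches
print(steps_per_epoch(10_000))  # 313 batches for the 10,000 test images
```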

7️⃣ Evaluate the model on the test set using model.evaluate(), calculating the loss and accuracy.

>>> test_loss, test_accuracy = model.evaluate(X_test, y_test)
313/313 [==============================] - 1s 2ms/step - loss: 0.0818 - accuracy: 0.9775
>>> print("Test Loss:", test_loss)
Test Loss: 0.08180703967809677
>>> print("Test Accuracy:", test_accuracy)
Test Accuracy: 0.9775000214576721
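Accuracy is the fraction of test images whose predicted digit matches the label, so the reported value can be translated into counts. The back-of-the-envelope check below also estimates how often all five captcha digits would be right, under the simplifying assumptions that errors are independent and that captcha images are about as hard as test images:

```python
n_test = 10_000
test_accuracy = 0.9775  # value reported by model.evaluate() above

n_correct = round(n_test * test_accuracy)
n_wrong = n_test - n_correct
print(n_correct, n_wrong)  # 9775 225

# Rough chance of getting five digits right in a row (independence assumed)
p_all_five = test_accuracy ** 5
print(round(p_all_five, 3))  # 0.892
```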

8️⃣ Make the predictions for the test set using model.predict().

The predictions are converted to predicted labels using numpy.argmax().

>>> predictions = model.predict(X_test)
313/313 [==============================] - 1s 2ms/step
>>> predicted_labels = np.argmax(predictions, axis=1)
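model.predict() returns one row of ten probabilities per image, and np.argmax with axis=1 picks the index of the largest value in each row, which is the predicted digit. A small sketch with invented probabilities (fake_predictions is illustrative, not model output):

```python
import numpy as np

# Two hypothetical rows of softmax output, one per image
fake_predictions = np.array([
    [0.01, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.90, 0.01, 0.01],
    [0.05, 0.05, 0.70, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02],
])

labels = np.argmax(fake_predictions, axis=1)  # index of the maximum in each row
print(labels)  # [7 2]
```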

9️⃣ We then visualise the first 10 images of the test set along with their true and predicted labels using matplotlib.pyplot.

>>> n = 10 # Number of images to visualize
>>> plt.figure(figsize=(10, 10))
<Figure size 1000x1000 with 0 Axes>
>>> for i in range(n):
...     plt.subplot(2, 5, i+1)
...     plt.imshow(X_test[i], cmap='gray')
...     plt.title(f"True: {y_test[i]}, Predicted: {predicted_labels[i]}")
...     plt.axis('off')
...
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe454d5daf0>
<matplotlib.image.AxesImage object at 0x7fe45dff9a00>
Text(0.5, 1.0, 'True: 7, Predicted: 7')
(-0.5, 27.5, 27.5, -0.5)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe45dfdfac0>
<matplotlib.image.AxesImage object at 0x7fe45dfbd190>
Text(0.5, 1.0, 'True: 2, Predicted: 2')
(-0.5, 27.5, 27.5, -0.5)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe45dfa2250>
<matplotlib.image.AxesImage object at 0x7fe45df678e0>
Text(0.5, 1.0, 'True: 1, Predicted: 1')
(-0.5, 27.5, 27.5, -0.5)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe45df509a0>
<matplotlib.image.AxesImage object at 0x7fe45df14220>
Text(0.5, 1.0, 'True: 0, Predicted: 0')
(-0.5, 27.5, 27.5, -0.5)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe45df87130>
<matplotlib.image.AxesImage object at 0x7fe45decb7f0>
Text(0.5, 1.0, 'True: 4, Predicted: 4')
(-0.5, 27.5, 27.5, -0.5)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe45df35880>
<matplotlib.image.AxesImage object at 0x7fe45defae50>
Text(0.5, 1.0, 'True: 1, Predicted: 1')
(-0.5, 27.5, 27.5, -0.5)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe45deee070>
<matplotlib.image.AxesImage object at 0x7fe45dea3880>
Text(0.5, 1.0, 'True: 4, Predicted: 4')
(-0.5, 27.5, 27.5, -0.5)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe45de9a790>
<matplotlib.image.AxesImage object at 0x7fe454675d00>
Text(0.5, 1.0, 'True: 9, Predicted: 9')
(-0.5, 27.5, 27.5, -0.5)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe45dec7eb0>
<matplotlib.image.AxesImage object at 0x7fe45462c490>
Text(0.5, 1.0, 'True: 5, Predicted: 8')
(-0.5, 27.5, 27.5, -0.5)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe454692640>
<matplotlib.image.AxesImage object at 0x7fe45465ac10>
Text(0.5, 1.0, 'True: 9, Predicted: 9')
(-0.5, 27.5, 27.5, -0.5)
>>>
>>> plt.tight_layout()
>>> plt.show()
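In the output above, nine of the ten digits are predicted correctly and one (a 5 read as an 8) is not. With larger batches it is handy to locate such misses programmatically; a minimal sketch using the ten label pairs printed above:

```python
import numpy as np

# True and predicted labels copied from the ten titles printed above
y_true = np.array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9])
y_pred = np.array([7, 2, 1, 0, 4, 1, 4, 9, 8, 9])

wrong_idx = np.flatnonzero(y_true != y_pred)  # positions where the labels differ
for i in wrong_idx:
    print(f"image {i}: true {y_true[i]}, predicted {y_pred[i]}")
# image 8: true 5, predicted 8
```

With the real arrays, the same expression can be applied to y_test and predicted_labels.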

🔟 Finally, here’s a trick: save your model with pickle so you can use it later whenever you want.

import pickle

with open('modelo.pkl', 'wb') as f:
    pickle.dump(model, f)
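The save-and-reload pattern can be tried without TensorFlow. The sketch below pickles a stand-in object (a plain dictionary, purely hypothetical) to an in-memory buffer and restores it. Two caveats: unpickling can execute code contained in the file, so only load pickles you created yourself; and whether a Keras model can be pickled may depend on the TensorFlow version, in which case Keras' own model.save() method is an alternative.

```python
import io
import pickle

# Hypothetical stand-in for a trained model's state
fake_model = {"name": "mnist-dense", "layers": [784, 128, 10]}

buffer = io.BytesIO()            # in-memory file, so nothing touches the disk
pickle.dump(fake_model, buffer)  # same call as used for modelo.pkl

buffer.seek(0)                   # rewind before reading back
restored = pickle.load(buffer)

print(restored == fake_model)    # True
```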

Testing your model with the captcha website 🧑‍💻

Congratulations! 🎉 If you’ve made it this far and managed to generate your own model by following the above steps, now you can test it yourself.

Basically, we need a script that downloads each image from the web and processes it to obtain the corresponding captcha number. For example:

>>> make_prediction("http://localhost:8000/mnist_png/training/4/36097.png")
1/1 [==============================] - 0s 30ms/step
4

As you can see 👁️, the MNIST image of digit 4 has been recognised correctly. So, without further ado here ⬇️ is the simple script:

import io
import pickle
from urllib.parse import urljoin

import numpy as np
import requests
from bs4 import BeautifulSoup
from PIL import Image

# Load the trained model saved in the previous section
with open('modelo.pkl', 'rb') as f:
    model = pickle.load(f)

# Reuse one session so cookies persist between the page load and the form post
session = requests.Session()

TARGET_IP = "127.0.0.1:8000"
BASE_URL = f"http://{TARGET_IP}/captcha.php"

def get_images():
    """Return the absolute URLs of the captcha digit images."""
    image_urls = []
    page = session.get(BASE_URL)
    soup = BeautifulSoup(page.text, 'html.parser')
    images_container = soup.find('div', class_='captcha-images')
    if images_container:
        for img in images_container.find_all('img'):
            image_url = urljoin(BASE_URL, img['src'])
            image_urls.append(image_url)
            print(image_url)
    return image_urls

def make_prediction(image_url):
    """Download one digit image and return the digit the model predicts."""
    # Load the image from the URL
    response = session.get(image_url)
    img = Image.open(io.BytesIO(response.content))
    np_frame = np.array(img)

    # Scale to [0, 1] and add a batch dimension to match the model input
    image = np.array([np_frame / 255])

    # Make the prediction
    predictions = model.predict(image)
    return np.argmax(predictions[0])

# First we get the image URLs from the web application
image_urls = get_images()

captcha_guess = []

# Process each image to get the predicted value
for image_url in image_urls:
    print(image_url)
    guess = make_prediction(image_url)
    print(guess)
    captcha_guess.append(str(guess))

# Construct a request with our predictions
captcha_string = "".join(captcha_guess)
print(captcha_string)

params = {
    'captcha': captcha_string,
    'submit': 'Verify Captcha'  # value of the form's submit button
}

# Send the request to the form
form_url = f"http://{TARGET_IP}/captcha.php"  # Update the URL if necessary
response = session.post(form_url, data=params)

# Show the server's reply, which says whether the captcha was accepted
print(response.text)
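The link-extraction step in get_images() can be tried offline. The sketch below uses only the standard library (html.parser instead of BeautifulSoup, a deliberate swap to keep it dependency-free) on a fragment modelled on the page's captcha-images block; the class name ImgSrcCollector is hypothetical. Unlike the script, it collects every img tag it is fed rather than only those inside the div.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://127.0.0.1:8000/captcha.php"

class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

page_fragment = """
<div class="captcha-images">
<img src="mnist_png/training/4/14539.png" alt="Captcha Image">
<img src="mnist_png/training/8/35253.png" alt="Captcha Image">
</div>
"""

collector = ImgSrcCollector()
collector.feed(page_fragment)

# Relative paths are resolved against the page URL, as in get_images()
urls = [urljoin(BASE_URL, src) for src in collector.sources]
print(urls[0])  # http://127.0.0.1:8000/mnist_png/training/4/14539.png
```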

Running the complete script against the local site produces output like this:

$ python script.py
http://127.0.0.1:8000/mnist_png/training/4/14539.png
http://127.0.0.1:8000/mnist_png/training/8/35253.png
http://127.0.0.1:8000/mnist_png/training/9/50053.png
http://127.0.0.1:8000/mnist_png/training/0/38177.png
http://127.0.0.1:8000/mnist_png/training/5/51157.png
http://127.0.0.1:8000/mnist_png/training/4/14539.png
1/1 [==============================] - 0s 97ms/step
4
http://127.0.0.1:8000/mnist_png/training/8/35253.png
1/1 [==============================] - 0s 24ms/step
8
http://127.0.0.1:8000/mnist_png/training/9/50053.png
1/1 [==============================] - 0s 22ms/step
9
http://127.0.0.1:8000/mnist_png/training/0/38177.png
1/1 [==============================] - 0s 23ms/step
0
http://127.0.0.1:8000/mnist_png/training/5/51157.png
1/1 [==============================] - 0s 22ms/step
5
48905
Captcha number generated: 48905<br>Value entered by the user: 48905<br>
<!DOCTYPE html>
<html>
<head>
<title>Captcha</title>
<style>
.captcha-images {
display: flex;
justify-content: flex-start; /* Left-align the images */
margin-bottom: -10px; /* Delete the bottom margin */
}
.captcha-images img {
margin-right: 0px; /* Add a right margin between the images */
}
</style>
</head>
<body>
<h1>Captcha verification</h1>
<p>WOOOOOOW, THE CAPTCHA IS CORRECT!!!</p>
<img src="victory.gif" alt="Victory GIF">
<form method="POST" action="">
<div class="captcha-images">
<img src="mnist_png/training/7/17143.png" alt="Captcha Image">
<img src="mnist_png/training/2/10158.png" alt="Captcha Image">
<img src="mnist_png/training/2/30975.png" alt="Captcha Image">
<img src="mnist_png/training/6/7868.png" alt="Captcha Image">
<img src="mnist_png/training/4/53738.png" alt="Captcha Image">
</div>
<p>Enter the captcha number:</p>
<input type="text" name="captcha" pattern="[0-9]{5}" required><br><br>
<input type="submit" name="submit" value="Verify Captcha">
</form>
</body>
</html>
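One detail of make_prediction() worth checking in isolation is the shape handling: the model was trained on batches of 28×28 arrays, so a single image has to be wrapped in a batch of one. A minimal NumPy sketch, with a synthetic array standing in for the downloaded PNG (an assumption; in the script the array comes from PIL):

```python
import numpy as np

# Hypothetical stand-in for np.array(img): one 28x28 greyscale image
np_frame = np.full((28, 28), 255, dtype=np.uint8)

batch = np.array([np_frame / 255])             # as in the script
same = np.expand_dims(np_frame / 255, axis=0)  # equivalent, more explicit

print(batch.shape)                  # (1, 28, 28)
print(np.array_equal(batch, same))  # True
```

If the site served images in another size or colour mode, this is the step where the mismatch would surface as a shape error from model.predict().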


Conclusions 📍

Algorithms, learning models, datasets… as you can see there’s a lot of information here and we could talk at length about many of the things described in this post, but it would be too long for a blog entry. That’s why I encourage you to contact me ☎️ if you have any queries, want to chat about a specific point or topic (I’m still learning too!) or would like me to give you the PHP code of the captcha checker.

📮PS: This article was generated by AI… or was it? 🤷 xD If you’re interested, send your comments and in a future article we’ll see how we can detect content generated automatically by an AI-based language model like ChatGPT.
