Simulating Web requests with custom User-Agent using PHP

Camilo Herrera
Published in winkhosting
Jun 4, 2024


Photo by Philipp Katzenberger on Unsplash

Welcome back to this new installment of interesting content for everyone! (For everyone with glasses and allergies mainly).

Today we are going to create a tool to simulate web requests using a custom User-Agent. It will be useful for the following cases (the ones I considered when creating it):

  • Verify the behavior of a website when receiving a request from certain types of devices, confirm if it performs redirects, and check the HTML content presented.
  • If you have a WAF configured on a website, you can confirm if there are blocking or restriction rules for certain types of User-Agent strings. This can be useful to mitigate some types of attacks or scans for known vulnerabilities.
  • Additionally, if you applied rules in a .htaccess file associated with the User-Agent string on a website, you can test them using this tool.
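To have something to test against, here is a minimal sketch of the kind of User-Agent rule a WAF or .htaccess file expresses, written as plain PHP. It is not part of the tool itself: the function name isBlockedUserAgent() and the pattern list are assumptions made up for this example, and a real rule set would be tuned to your own traffic.

```php
<?php

/**
 * Returns true when the User-Agent contains any of the given substrings
 * (case-insensitive). Hypothetical helper, for illustration only.
 */
function isBlockedUserAgent(string $userAgent, array $blockedPatterns): bool
{
    foreach ($blockedPatterns as $pattern) {
        if ($pattern !== '' && stripos($userAgent, $pattern) !== false) {
            return true;
        }
    }

    return false;
}

// Example patterns only (assumption): names of two well-known scanners.
$blocked = ['sqlmap', 'nikto'];

// When served by a web server, answer 403 to matching clients.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (PHP_SAPI !== 'cli' && isBlockedUserAgent($userAgent, $blocked)) {
    http_response_code(403);
    exit('Forbidden');
}
```

Placing a script like this on a test site gives you a page that should answer 403 when the simulator sends a matching User-Agent and 200 otherwise.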

Alright, now let’s get to the important part, our solution. If you’ve been reading my posts, you’ll notice that I generally follow the same pattern and structure for these kinds of small tools because I’m very eco-friendly and I like to reuse, reduce, and recycle code ;).

With that out of the way, let’s propose the structure and requirements for our solution:

Requirements

As always, we will use a development environment with the following tools:

  • Apache 2.4.x: Our old and well-known web server
  • PHP 8.3: I tested the code with 8.3 and 8.2. It should also work with 8.1, but you will need to test it yourself.
  • cURL extension: Enabled and configured with support for SSL connections (This is very important for HTTPS sites, otherwise you will encounter errors)
  • Bulma CSS: This is the GUI library I usually use, you can visit them at Bulma CSS
  • Your favorite code editor: (VS Code in my case)
  • Highlight.js: I found this library when trying to apply syntax highlighting to the HTML code returned in the request response. It’s quite handy and we will use it to give a touch of elegance to the displayed information. You can visit them at Highlight.js
  • Patience, dedication, willingness to learn, and 50 push-ups daily
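Before going further, it can help to confirm the cURL requirement from the list above. The following is a minimal sketch and not part of the original tool: the function name curlSupportsSsl() is made up for this example, while curl_version() and the CURL_VERSION_SSL constant come from PHP’s cURL extension.

```php
<?php

/**
 * Reports whether the cURL extension is loaded and was built with SSL support.
 * Hypothetical helper name, used only for this check.
 */
function curlSupportsSsl(): bool
{
    if (!extension_loaded('curl')) {
        return false;
    }

    // 'features' is a bitmask; CURL_VERSION_SSL is set when SSL is available.
    $version = curl_version();

    return ($version['features'] & CURL_VERSION_SSL) !== 0;
}

echo curlSupportsSsl()
    ? "cURL with SSL support is available\n"
    : "cURL is missing or was built without SSL support\n";
```

You can run it from the command line or drop it into your web directory; if it reports that SSL is missing, requests to HTTPS sites will fail.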

Now that we have covered the requirements, let’s look at the structure of the files we will use:

File Structure

The files we will use are as follows:

  • Headersim.php: This file contains the class responsible for the logic and functions to make requests with a custom User-Agent and process the content received from a website.
  • index.html: Our graphical interface will be displayed in this file. It will also be responsible for sending the request to the backend with the name of the host to query and the User-Agent text to use.
  • requestmanager.php: An old acquaintance from other posts, this file will receive the request made from index.html, instantiate the Headersim class, and return the result in JSON format.
  • testbench.php: This little friend will be used to perform tests without a web interface. It is not necessary, but it can be useful in other solutions or to debug and execute code without a browser.

Create a directory that is visible through your web server, and in it create the files. In my case, I named the directory “headersim” and its content will be as follows:

/headersim/
│ Headersim.php
│ index.html
│ requestmanager.php
│ testbench.php

Now let’s go through our files one by one to understand their content and purpose.

Headersim.php

Our soul, the engine of our solution, the lead singer, our MVP… well, that’s the idea, it’s responsible for performing the most important operations.

This class has the following attributes:

  • $result: It is an array responsible for storing the results of the performed query. In this case, it will contain the elements “headers”, “body”, “siteTitle”, “host”, “userAgent”, and “httpCode”. I think the names are quite clear, meaning that after making a request, $result will store the headers, body, title, host, User-Agent, and HTTP code returned by the queried site.
  • $headers: A string containing, as its name indicates, the headers returned by the queried site. They generally have the following structure:
HTTP/1.1 200 OK
Date: Wed, 29 May 2024 16:21:30 GMT
Server: Apache
Vary: Accept-Encoding
Content-Length: 3483
Content-Type: text/html; charset=UTF-8
  • $body: This string will contain the HTML response from the website. This will help determine if changing the User-Agent results in a different HTML response, for example.
  • $httpCode: This string will contain the returned HTTP code. Typically, if everything goes well, it will be 200 (OK), but any errors will be reflected in this code.
  • $siteTitle: It will contain the title of the website (if it has one), extracted from $body, and will be the string between the “<title></title>” tags.
  • $host: Stores the host/IP to which the request is made, coming from the form in index.html.
  • $userAgent: Stores the User-Agent to be used in the request, also coming from the form in index.html.
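Because redirects are followed, the headers string can hold more than one block like the one shown above, one per response in the redirect chain. Here is a minimal sketch of how such a string could be split into something easier to inspect. The function name parseHeaderBlocks() and the returned array shape are assumptions for this example; the class itself stores the headers as a plain string.

```php
<?php

/**
 * Splits a raw header string into one entry per response (one per redirect hop).
 * Each entry holds the status line and the header fields, keyed by lowercase name.
 * Hypothetical helper, not part of the Headersim class.
 */
function parseHeaderBlocks(string $rawHeaders): array
{
    $blocks = [];

    // Header blocks are separated by an empty line.
    foreach (preg_split('/\r?\n\r?\n/', trim($rawHeaders)) as $block) {
        $lines = preg_split('/\r?\n/', trim($block));
        $statusLine = array_shift($lines);

        if ($statusLine === null || $statusLine === '') {
            continue;
        }

        $fields = [];
        foreach ($lines as $line) {
            // Split "Name: value" on the first colon only.
            $parts = explode(':', $line, 2);
            if (count($parts) === 2) {
                $fields[strtolower(trim($parts[0]))][] = trim($parts[1]);
            }
        }

        $blocks[] = ['status' => $statusLine, 'fields' => $fields];
    }

    return $blocks;
}

// Sample input: a redirect followed by the final response.
$sample = "HTTP/1.1 301 Moved Permanently\r\nLocation: https://example.com/\r\n\r\n"
    . "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n";

foreach (parseHeaderBlocks($sample) as $hop) {
    echo $hop['status'], "\n";
}
```

With a structure like this you can check, for example, whether a given User-Agent triggers a redirect and where it points.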

Now, the methods within the class:

  • __construct(): The constructor of our class. Here we define the default values of our attributes.
  • sendRequest(): This method is responsible for executing the request to a website. It receives two parameters: $host and $userAgent. $host is the address or IP of the site to be queried, and $userAgent is the User-Agent string to be used.
  • getHeaders(): This function receives the same parameters as sendRequest() but specifically requests the headers of the queried website.
  • getBody(): It also receives the same parameters as sendRequest() but returns the body of the response, i.e., the HTML returned by the site when queried.
  • extractSiteTitle(): This function extracts the title of the website from the HTML returned by getBody(). It does this using a regular expression to capture the text between the “<title></title>” tags.

And that’s it! Now let’s see how the PHP file looks when implemented.

<?php

/**
 * Class Headersim
 *
 * This class is responsible for sending HTTP requests to a given host,
 * retrieving headers and body content, and extracting the site title.
 */
class Headersim
{
    /**
     * @var array $result Array to store the results of the HTTP request.
     */
    private array $result;

    /**
     * @var string $headers String to store the headers of the HTTP response.
     */
    private string $headers;

    /**
     * @var string $body String to store the body of the HTTP response.
     */
    private string $body;

    /**
     * @var string $httpCode String to store the HTTP response code.
     */
    private string $httpCode;

    /**
     * @var string $siteTitle String to store the extracted site title.
     */
    private string $siteTitle;

    /**
     * @var string $host String to store the host for the HTTP request.
     */
    private string $host;

    /**
     * @var string $userAgent String to store the User-Agent for the HTTP request.
     */
    private string $userAgent;

    /**
     * Headersim constructor.
     * Initializes the class properties.
     */
    public function __construct()
    {
        $this->result = array();
        $this->headers = "";
        $this->body = "";
        $this->httpCode = "";
        $this->siteTitle = "";
        $this->host = "";
        $this->userAgent = "";
    }

    /**
     * Sends an HTTP request to the specified host with the given User-Agent.
     *
     * @param string $host The host to send the request to.
     * @param string $userAgent The User-Agent string to use for the request.
     * @return array The result of the HTTP request, including headers, body, site title, host, User-Agent, and HTTP code.
     */
    public function sendRequest(string $host, string $userAgent): array
    {
        $this->host = $host;
        $this->userAgent = $userAgent;

        $this->result["headers"] = $this->getHeaders($host, $userAgent);
        $this->result["body"] = $this->getBody($host, $userAgent);
        $this->result["siteTitle"] = $this->extractSiteTitle($this->body);
        $this->result["host"] = $this->host;
        $this->result["userAgent"] = $this->userAgent;
        $this->result["httpCode"] = $this->httpCode;

        return $this->result;
    }

    /**
     * Retrieves the headers of the HTTP response from the specified host.
     *
     * @param string $host The host to send the request to.
     * @param string $userAgent The User-Agent string to use for the request.
     * @return string The headers of the HTTP response.
     */
    public function getHeaders(string $host, string $userAgent): string
    {
        $curl = curl_init();

        // Disable certificate verification, so self-signed or invalid certificates are accepted.
        curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
        curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 60);
        // Include the response headers in the output and skip the body.
        curl_setopt($curl, CURLOPT_HEADER, true);
        curl_setopt($curl, CURLOPT_NOBODY, true);
        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($curl, CURLOPT_CUSTOMREQUEST, "GET");
        curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($curl, CURLOPT_URL, $host);

        // curl_exec() returns false on failure; store an empty string in that case.
        $response = curl_exec($curl);
        $this->headers = is_string($response) ? $response : "";
        $this->httpCode = (string) curl_getinfo($curl, CURLINFO_HTTP_CODE);
        curl_close($curl);

        return $this->headers;
    }

    /**
     * Retrieves the body of the HTTP response from the specified host.
     *
     * @param string $host The host to send the request to.
     * @param string $userAgent The User-Agent string to use for the request.
     * @return string The body of the HTTP response.
     */
    public function getBody(string $host, string $userAgent): string
    {
        $curl = curl_init();

        // Disable certificate verification, so self-signed or invalid certificates are accepted.
        curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
        curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 60);
        // Return only the body, without the response headers.
        curl_setopt($curl, CURLOPT_HEADER, 0);
        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($curl, CURLOPT_CUSTOMREQUEST, "GET");
        curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($curl, CURLOPT_URL, $host);

        // curl_exec() returns false on failure; store an empty string in that case.
        $response = curl_exec($curl);
        $this->body = is_string($response) ? $response : "";
        $this->httpCode = (string) curl_getinfo($curl, CURLINFO_HTTP_CODE);
        curl_close($curl);

        return $this->body;
    }

    /**
     * Extracts the site title from the body of the HTTP response.
     *
     * @param string $body The body of the HTTP response.
     * @return string The extracted site title.
     */
    private function extractSiteTitle(string $body): string
    {
        $this->siteTitle = "NOT FOUND";

        $arrMatches = array();

        // "i" ignores case, "s" lets "." span line breaks, and the lazy ".*?"
        // stops at the first closing tag instead of the last one.
        if (preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $body, $arrMatches) === 1) {
            $this->siteTitle = trim(strip_tags($arrMatches[1]));
        }

        return $this->siteTitle;
    }
}
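The class above queries the site twice, once for the headers and once for the body. As an alternative, here is a minimal sketch of a single-request variant that asks cURL for headers and body together and then splits them using the header size that cURL reports. The function names splitRawResponse() and fetchOnce() are made up for this example and are not part of the Headersim class. Note that this sketch also leaves certificate verification at cURL’s defaults, which differs from the class.

```php
<?php

/**
 * Splits a raw cURL response (fetched with CURLOPT_HEADER enabled) into
 * headers and body, using the header size reported by cURL.
 */
function splitRawResponse(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

/**
 * Hypothetical single-request variant: one GET returns headers, body and code.
 */
function fetchOnce(string $url, string $userAgent): array
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true,  // headers are prepended to the body
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 60,
        CURLOPT_USERAGENT      => $userAgent,
    ]);

    $raw = curl_exec($curl);
    if ($raw === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($curl));
    }

    // CURLINFO_HEADER_SIZE covers every header block received, redirects included.
    $parts = splitRawResponse($raw, curl_getinfo($curl, CURLINFO_HEADER_SIZE));
    $parts['httpCode'] = (string) curl_getinfo($curl, CURLINFO_HTTP_CODE);

    return $parts;
}
```

A call such as fetchOnce("https://winkhosting.co", "MyTestAgent/1.0") would return an array with the keys “headers”, “body”, and “httpCode”. The trade-off is that the whole body is always downloaded, even when you only care about the headers.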

index.html

This file contains our user interface. In summary, it captures the parameters of the query in two text fields, “host” and “userAgent”, the latter being a textarea. It also allows the user to click on the “Send Request” button, which sends the information to the requestmanager.php file via a POST request.

The result of the query is returned to the file and displayed in the controls within the HTML code. The interface will look like this:

Now let’s see the content of the file. You’ll also notice that we included the highlight.js library to highlight the syntax of the HTML code returned in the response and our beloved Bulma CSS.

<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>User-Agent Simulator</title>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bulma@0.9.4/css/bulma.min.css">

    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/styles/default.min.css">
    <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/highlight.min.js"></script>

    <script type="module">
        window.addEventListener('load', (event) => {

            document.querySelector(".sendRequest").addEventListener('click', (event) => {

                // Show the loading state and prevent a second click while the request runs.
                event.currentTarget.classList.add('is-loading');
                event.currentTarget.disabled = true;

                // Hide the result and error panels left over from a previous request.
                document.querySelector(".result").parentElement.classList.add("is-hidden");
                document.querySelector(".error").parentElement.classList.add("is-hidden");

                const payload = JSON.stringify({
                    "host": document.querySelector(".host").value,
                    "userAgent": document.querySelector(".userAgent").value
                });

                fetch('requestmanager.php', {
                        method: 'POST',
                        headers: {
                            'Content-Type': 'application/json',
                        },
                        body: payload,
                    })
                    .then(response => response.json())
                    .then(data => {

                        document.querySelector(".result").parentElement.classList.remove("is-hidden");

                        console.log(data)
                        console.log(data.headers)
                        console.log(data.body)

                        document.querySelector(".headers").innerText = data.headers;
                        document.querySelector(".siteTitle").innerText = data.siteTitle;
                        document.querySelector(".resHost").innerText = data.host;
                        document.querySelector(".resUserAgent").innerText = data.userAgent;
                        document.querySelector(".httpCode").innerText = data.httpCode;
                        document.querySelector(".body").textContent = data.body;

                        // Clear the marker left by highlight.js so the new content is highlighted again.
                        document.querySelector(".body").removeAttribute("data-highlighted");

                        hljs.highlightAll();

                    })
                    .catch((error) => {
                        document.querySelector(".error").parentElement.classList.remove("is-hidden");
                        document.querySelector(".error").innerText = error;
                        console.error('Error:', error);
                    }).finally(() => {
                        // Restore the button whether the request succeeded or failed.
                        document.querySelector(".sendRequest").classList.remove('is-loading');
                        document.querySelector(".sendRequest").disabled = false;
                    });
            });

        });
    </script>
</head>

<body>
    <section class="section">
        <div class="columns">
            <div class="column">
                <div class="field">
                    <label class="label">Host</label>
                    <div class="control">
                        <input class="input host" type="text" placeholder="Hostname/IP">
                    </div>
                    <p class="help">Type the hostname/IP</p>
                </div>
                <div class="field">
                    <label class="label">User-Agent</label>
                    <div class="control">
                        <textarea class="textarea userAgent" placeholder="User-Agent string"></textarea>
                    </div>
                    <p class="help">Type the User-Agent string that you want to use for the request and click the Send Request button</p>
                </div>
                <div class="field">
                    <p class="control">
                        <button class="button is-black sendRequest">
                            Send Request
                        </button>
                    </p>
                </div>
            </div>
        </div>
        <div class="columns is-hidden">
            <div class="column result">
                <div class="columns">
                    <div class="column">

                        <article class="message">
                            <div class="message-header">
                                <p>HTTP Response Code</p>
                            </div>
                            <div class="message-body">
                                <div class="columns">
                                    <div class="column httpCode"></div>
                                </div>
                            </div>
                        </article>

                        <article class="message">
                            <div class="message-header">
                                <p>Host</p>
                            </div>
                            <div class="message-body">
                                <div class="columns">
                                    <div class="column resHost"></div>
                                </div>
                            </div>
                        </article>

                        <article class="message">
                            <div class="message-header">
                                <p>Site Title</p>
                            </div>
                            <div class="message-body">
                                <div class="columns">
                                    <div class="column siteTitle"></div>
                                </div>
                            </div>
                        </article>

                        <article class="message">
                            <div class="message-header">
                                <p>User-Agent</p>
                            </div>
                            <div class="message-body">
                                <div class="columns">
                                    <div class="column resUserAgent"></div>
                                </div>
                            </div>
                        </article>

                        <article class="message">
                            <div class="message-header">
                                <p>Headers</p>
                            </div>
                            <div class="message-body">
                                <div class="columns">
                                    <div class="column headers"></div>
                                </div>
                            </div>
                        </article>

                        <article class="message">
                            <div class="message-header">
                                <p>Body HTML</p>
                            </div>
                            <div class="message-body">
                                <div class="columns">
                                    <div class="column">
                                        <pre><code style="min-width: 100%; width: 0px; overflow: auto;" class="language-html body"></code></pre>
                                    </div>
                                </div>
                            </div>
                        </article>

                    </div>
                </div>
            </div>
        </div>
        <div class="columns">
            <div class="column is-hidden">
                <div class="notification is-danger error has-text-centered"></div>
            </div>
        </div>
    </section>
</body>

</html>

Let’s continue with the next file, requestmanager.php.

requestmanager.php

This file serves as the intermediary between our Headersim class and the requests coming from the interface in index.html.

Its operation is quite simple: it includes the Headersim.php file, decodes the parameters received in the POST request from index.html, executes the sendRequest() function of the object, and finally captures the received response and returns it in JSON format to be displayed in index.html.

Here’s the content of the file:

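<?php

// Includes the class definition to be used for our request.
require("Headersim.php");

// Tells the browser that the response is JSON.
header("Content-Type: application/json; charset=utf-8");

// Decodes the parameters received from the index.html file and stores them in the $paramsFetch array.
$paramsFetch = json_decode(
    file_get_contents("php://input"),
    true
);

// Falls back to empty strings when the payload is invalid or a parameter is missing,
// so that sendRequest() always receives strings.
$host = (is_array($paramsFetch) && is_string($paramsFetch["host"] ?? null)) ? $paramsFetch["host"] : "";
$userAgent = (is_array($paramsFetch) && is_string($paramsFetch["userAgent"] ?? null)) ? $paramsFetch["userAgent"] : "";

// Instantiates our class.
$headerSim = new Headersim();

// Sends the host name and User-Agent to be used in the request.
$result = $headerSim->sendRequest($host, $userAgent);

// Returns the response in JSON format and ends the execution.
// JSON_INVALID_UTF8_SUBSTITUTE keeps json_encode() from failing on pages that are not valid UTF-8.
$jsonResponse = json_encode($result, JSON_INVALID_UTF8_SUBSTITUTE);
echo $jsonResponse;
exit;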
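Since the host value comes straight from a form, you may want to check it before handing it to cURL. The following is a minimal sketch of one way to do that, under a few assumptions of mine: the function name normalizeTarget() is hypothetical, only http and https targets are accepted, and a bare host name gets “http://” prepended.

```php
<?php

/**
 * Returns a normalized URL when $host looks like a usable http(s) target,
 * or null otherwise. Hypothetical helper, not part of the original files.
 */
function normalizeTarget(string $host): ?string
{
    $host = trim($host);
    if ($host === '') {
        return null;
    }

    // Assumption: a value without a scheme is treated as plain HTTP.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $host)) {
        $host = 'http://' . $host;
    }

    // Only web schemes are allowed; this rejects file://, ftp:// and similar.
    $scheme = strtolower((string) parse_url($host, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return null;
    }

    return filter_var($host, FILTER_VALIDATE_URL) !== false ? $host : null;
}

var_dump(normalizeTarget('example.com'));
var_dump(normalizeTarget('file:///etc/passwd'));
```

In requestmanager.php, the value returned by a helper like this could be passed to sendRequest(), and a null result could be answered with an error message instead of making the request.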

And finally, let’s talk about the file testbench.php.

testbench.php

This file is useful for testing your code before having a fully functional web interface, or if you want to execute code during a debugging process using only the command line on your PC.

Its content is similar to that shown in requestmanager.php, but we do not receive parameters via POST since it is used from the command line. Instead, literal values for the parameters are simply written like this:

<?php

// Includes the class definition to be used for our request.
require("Headersim.php");

// Instantiates our class.
$headerSim = new Headersim();

// Executes the request.
$result = $headerSim->sendRequest("https://winkhosting.co", "Googlebot/2.1 (+http://www.googlebot.com/bot.html)");

// Displays the response content.
print_r($result);
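The test bench is also a handy place to compare how a site answers two different User-Agent strings. Here is a minimal sketch of such a comparison. The function name diffResults() is an assumption for this example, and the sample arrays only imitate the shape returned by sendRequest(); they are not real responses. The “headers” element is left out by default because headers such as Date change on every request.

```php
<?php

/**
 * Lists which fields differ between two result arrays shaped like the
 * one returned by sendRequest(). Hypothetical helper for the test bench.
 */
function diffResults(
    array $first,
    array $second,
    array $fields = ['httpCode', 'siteTitle', 'body']
): array {
    $changed = [];

    foreach ($fields as $field) {
        if (($first[$field] ?? null) !== ($second[$field] ?? null)) {
            $changed[] = $field;
        }
    }

    return $changed;
}

// Made-up sample data standing in for two sendRequest() results.
$asBrowser = ['httpCode' => '200', 'siteTitle' => 'Home', 'body' => '<p>Welcome</p>'];
$asScanner = ['httpCode' => '403', 'siteTitle' => 'Home', 'body' => '<p>Forbidden</p>'];

print_r(diffResults($asBrowser, $asScanner));
```

In a real test you would fill both arrays by calling sendRequest() twice with the same host and two different User-Agent strings.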

Now, if everything goes well and you copied the code correctly, you can perform queries simulating a User-Agent by accessing the URL where your files are located, in my case http://localhost/headersim/index.html.

And that’s it! Don’t forget that at winkhosting.com we are much more than Hosting!

P.S. Don’t forget the 50 push-ups!
