[Toy Browser] HTTP Request and Response Parse
Introduction: [Toy Browser] Introduction
Keywords: HTTP request and response, Finite-state machine, response parse
As discussed in the previous article, the first step in creating a toy browser is to establish a connection with the server and parse its response. Specifically, the toy browser should request a response from the server and extract the HTML file from it.
Therefore, before getting started, we need to prepare a server. I used Node.js to create a simple one.
The server puts an HTML file into the body of a response and sends it back to the client whenever the client asks for one.
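A minimal sketch of such a server, assuming Node's built-in `http` module. The page content and the names `PAGE`, `handleRequest`, and `startServer` are hypothetical and are not taken from the original server code.

```javascript
const http = require("http");

// Hypothetical page content; the real HTML file is not shown in this article.
const PAGE = "<html><body><h1>Hello from the toy server</h1></body></html>\n";

// Answer every request with the same HTML page, whatever the method or path.
function handleRequest(request, response) {
  response.setHeader("Content-Type", "text/html");
  // The status line and headers are fixed here, before the body is known,
  // and no Content-Length is set, so Node is expected to send the body
  // with Transfer-Encoding: chunked to HTTP/1.1 clients.
  response.writeHead(200);
  response.end(PAGE);
}

// Create the server and start listening on the given port.
function startServer(port) {
  const server = http.createServer(handleRequest);
  server.listen(port);
  return server;
}
```

Calling `startServer(8088)` would listen on the port used in the rest of this article.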
Now I can start creating the toy browser.
To establish the TCP connection with the server, we need:
- Host: the host of the server. In this case, it’s localhost (127.0.0.1)
- Port: the port that the server listens to. Here, it’s 8088.
Some information must be present in an HTTP request.
- HTTP method: In this case, I use GET.
- Path: the path of the requested page/data. Here, it’s the root (/).
- HTTP protocol version: In this case, I use 1.1 (it’s old; newer versions, up to HTTP/3, now exist).
In the header, there are 2 fields that must not be empty.
- Content-Type: the type of the request body. In this case, I use application/json.
- Content-Length: the length of the request body, in bytes.
Since our server sends back the same response whatever the request is, the request body is not important. If you do want to send a request body, it must be in JSON format. The first version of the toy browser is as follows.
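A minimal sketch of how such a request could be represented and turned into text. The option names and the `toString` method are assumptions, not the original code. The sketch also adds a `Host` header, which HTTP/1.1 requires even though it is not listed above.

```javascript
// Hypothetical Request class; option names and defaults are assumptions.
class Request {
  constructor(options) {
    this.method = options.method || "GET";
    this.host = options.host;
    this.port = options.port || 80;
    this.path = options.path || "/";
    this.headers = { ...options.headers };
    this.body = options.body || {};

    // HTTP/1.1 requests must carry a Host header.
    if (!this.headers["Host"]) {
      this.headers["Host"] = `${this.host}:${this.port}`;
    }
    if (!this.headers["Content-Type"]) {
      this.headers["Content-Type"] = "application/json";
    }
    // Only JSON bodies are supported in this sketch.
    this.bodyText = JSON.stringify(this.body);
    // Content-Length counts bytes, not characters.
    this.headers["Content-Length"] = Buffer.byteLength(this.bodyText);
  }

  // Serialize as: request line, header lines, empty line, body.
  toString() {
    const headerLines = Object.keys(this.headers)
      .map((name) => `${name}: ${this.headers[name]}`)
      .join("\r\n");
    return `${this.method} ${this.path} HTTP/1.1\r\n${headerLines}\r\n\r\n${this.bodyText}`;
  }
}
```

For example, `new Request({ host: "127.0.0.1", port: 8088 }).toString()` starts with the request line `GET / HTTP/1.1`.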
In this version, I created a class for the request. An instance of this class contains all the basic information needed by an HTTP request.
The class Request has a method, send. It returns a Promise, which means this function is asynchronous: the caller can wait for the server’s response.
First, this function creates a connection with the server. Then it turns the instance into a string and sends it to the server. After that, it waits for the response. If the response is received successfully, the function returns it; if not, it prints the error. Finally, the function closes the connection. The raw response from the server consists of a status line, the headers, and the body.
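The steps above can be sketched with Node's built-in `net` module. The helper name `sendRaw` and its parameters are hypothetical. This sketch collects data until the server closes the connection, so it assumes the server does close it (for example because the request carries `Connection: close`, or after the server's keep-alive timeout). A fuller version would let a parser decide when the response is complete.

```javascript
const net = require("net");

// Hypothetical helper: send the request text over TCP and
// resolve with the raw response text.
function sendRaw(host, port, requestText) {
  return new Promise((resolve, reject) => {
    let received = "";
    const connection = net.createConnection({ host, port }, () => {
      // Connected: write the serialized request.
      connection.write(requestText);
    });
    connection.setEncoding("utf8");
    // Collect every piece of the response as it arrives.
    connection.on("data", (chunk) => {
      received += chunk;
    });
    // The server closed its side: the response is complete.
    connection.on("end", () => resolve(received));
    // On failure, print the error, reject, and break the connection.
    connection.on("error", (error) => {
      console.error(error);
      reject(error);
      connection.destroy();
    });
  });
}
```

A call such as `sendRaw("127.0.0.1", 8088, requestText).then(console.log)` would print the raw response text.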
We can see that some of the information, such as the HTTP status, is not very useful here. The HTML file is the only thing we need. So, as the next step, we need to parse this response and extract the HTML.
(The client code is available here, and the server code is here.)
There are 2 pieces of information that matter in the response.
- Transfer-Encoding: It’s an HTTP header. It indicates how the response body is encoded. In this case, it’s chunked.
- Response body: It contains the information we want. Here, it’s the HTML file.
To get these 2 parts, I created a ResponseParser class. This class has a method, receiveChar, which parses the response using a finite-state machine.
The class has an attribute, current, which represents the current parse state. The current state tells us what kind of information the parser is reading.
I defined 8 states:
- STATUS_LINE and STATUS_LINE_END: These 2 states are used to read the HTTP status line. In this case, the status line is HTTP/1.1 200 OK.
- HEADER_NAME, HEADER_SPACE, HEADER_VALUE: These 3 states read an HTTP header. For example, Transfer-Encoding: chunked.
- HEADER_LINE_END: It indicates that the parser has finished reading one HTTP header.
- HEADER_BLOCK_END: It indicates that the parser has finished reading all the headers.
- BODY: It shows that the parser is reading the response body.
The response parser reads the characters of the response one by one. The current state changes according to each character. The parser starts in the STATUS_LINE state.
As a result, all the headers are saved in an attribute, headers, which is a JavaScript object. Before parsing the response body, we need to check how the body is encoded. As mentioned before, we should check the Transfer-Encoding header. To keep the toy browser simple, it can only decode a chunked body.
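A minimal sketch of this state machine. The state names, receiveChar, current, and headers follow the description above. The `receive` helper, the `statusLine` and `body` attributes, and the transition details are assumptions. The sketch expects each header to be written as `Name: value` and simply stores the still-encoded body, leaving the decoding to a separate body parser.

```javascript
// State names follow the list above; the numeric values are arbitrary.
const STATUS_LINE = 0, STATUS_LINE_END = 1, HEADER_NAME = 2, HEADER_SPACE = 3,
  HEADER_VALUE = 4, HEADER_LINE_END = 5, HEADER_BLOCK_END = 6, BODY = 7;

class ResponseParser {
  constructor() {
    this.current = STATUS_LINE;
    this.statusLine = "";
    this.headers = {};
    this.headerName = "";
    this.headerValue = "";
    this.body = ""; // raw body characters, still encoded
  }

  // Feed a whole string, one character at a time.
  receive(text) {
    for (const char of text) this.receiveChar(char);
  }

  receiveChar(char) {
    switch (this.current) {
      case STATUS_LINE:
        if (char === "\r") this.current = STATUS_LINE_END;
        else this.statusLine += char;
        break;
      case STATUS_LINE_END:
        if (char === "\n") this.current = HEADER_NAME;
        break;
      case HEADER_NAME:
        if (char === ":") this.current = HEADER_SPACE;
        else if (char === "\r") this.current = HEADER_BLOCK_END; // empty line
        else this.headerName += char;
        break;
      case HEADER_SPACE:
        if (char === " ") this.current = HEADER_VALUE;
        break;
      case HEADER_VALUE:
        if (char === "\r") {
          // One header is complete: store it and reset the buffers.
          this.headers[this.headerName] = this.headerValue;
          this.headerName = "";
          this.headerValue = "";
          this.current = HEADER_LINE_END;
        } else {
          this.headerValue += char;
        }
        break;
      case HEADER_LINE_END:
        if (char === "\n") this.current = HEADER_NAME;
        break;
      case HEADER_BLOCK_END:
        if (char === "\n") this.current = BODY;
        break;
      case BODY:
        this.body += char;
        break;
    }
  }
}
```

After the header block has been read, `parser.headers["Transfer-Encoding"]` can be compared with `"chunked"` to decide how to decode the body.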
As the next step, we need to create a Chunked Body Parser.
(The code of ResponseParser is available via the link.)
Similar to the Response Parser, the Chunked Body Parser also uses a finite-state machine. The response body is composed of several contents (chunks), and each content has only two parts: the content length and the content itself. So this parser has fewer states than the previous one. They are as follows:
- LENGTH, LENGTH_END: They parse the content length.
- READING_TRUNK: It reads the content itself.
- NEW_LINE, NEW_LINE_END: They prepare to read the next content.
The parser starts in the LENGTH state. It reads the content length, which is written in hexadecimal, and saves it as a number. While the parser reads the content, this length decreases until it is equal to 0. That means the whole content has been read. After that, the parser looks for the next content.
In this case, 2a8 is the length of the HTML file in hexadecimal (680 in decimal), and 0 shows that there is no further content in the response body.
In the end, all the contents are saved in the attribute, content, of the class.
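A minimal sketch of this parser. The state names and the content attribute follow the description above. The `length` and `isFinished` attributes are assumptions. The sketch counts characters rather than bytes, so it assumes an ASCII-only body, and it ignores anything after the zero-length content.

```javascript
// State names follow the list above; the numeric values are arbitrary.
const LENGTH = 0, LENGTH_END = 1, READING_TRUNK = 2, NEW_LINE = 3, NEW_LINE_END = 4;

class ChunkedBodyParser {
  constructor() {
    this.current = LENGTH;
    this.length = 0; // characters still expected in the current content
    this.content = [];
    this.isFinished = false;
  }

  receiveChar(char) {
    if (this.isFinished) return;
    switch (this.current) {
      case LENGTH:
        if (char === "\r") {
          // A length of 0 marks the end of the body.
          if (this.length === 0) this.isFinished = true;
          this.current = LENGTH_END;
        } else {
          // Accumulate the hexadecimal digits into a number.
          this.length = this.length * 16 + parseInt(char, 16);
        }
        break;
      case LENGTH_END:
        if (char === "\n") this.current = READING_TRUNK;
        break;
      case READING_TRUNK:
        this.content.push(char);
        this.length -= 1;
        if (this.length === 0) this.current = NEW_LINE;
        break;
      case NEW_LINE:
        if (char === "\r") this.current = NEW_LINE_END;
        break;
      case NEW_LINE_END:
        if (char === "\n") this.current = LENGTH;
        break;
    }
  }
}
```

Once `isFinished` is true, `parser.content.join("")` gives the decoded body. The length line `2a8` accumulates to the same value as `parseInt("2a8", 16)`, which is 680.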
(The code of Chunked Body Parser is available here.)
To sum up, in this step, we created the first version of the toy browser.
The toy browser can send a request to the server and receive its response.
To parse the response, we created the Response Parser. It captures the useful information in the HTTP headers and the response body.
A response body can be encoded in many different formats. To keep it simple, the toy browser can only parse a chunked response body. We created the Chunked Body Parser to deal with it.
These 2 parsers use a finite-state machine to parse the information and save the results in their attributes. In the end, the client calls the response() function to get the parsed response. It contains both the headers and the body of the HTTP response.