PHP — How it works
Introduction to how the PHP programming language works
PHP is still a widely used programming language, as evidenced by the popularity statistics of major PHP frameworks such as Laravel and Symfony.
I selected the topic of how PHP works as the first topic because I believe it is essential to have a conceptual understanding of how the language works, before progressing to more advanced topics (which I also plan to write about).
History of the PHP programming language
The name PHP stands for “PHP: Hypertext Preprocessor”, which makes it a recursive acronym: the expansion of the acronym contains the acronym itself. However, at the beginning (in the 90s) PHP was introduced as a collection of scripts named PHP/FI (Personal Home Page/Forms Interpreter).
Since then, PHP has evolved into a mature language with a sizable share of the web technology market. Along the way, it has undergone significant changes. For instance, the transition from PHP 4 to PHP 5 substantially reworked object-oriented programming support, and the transition from PHP 5 to PHP 7 introduced scalar type declarations for arguments as well as return type declarations.
But rather than drag this story out, I’ll now move on to how PHP works.
How the PHP programming language works
Processing a PHP file can be divided into four primary stages that follow one after the other:
- Lexical analysis (tokenization).
- Syntax analysis (parsing).
- Compilation.
- Execution.
The conceptual flow of PHP code interpretation is illustrated in the diagram below.
In the diagram, the light blue rectangles represent the stages, while the light green rhombuses represent the data produced by each stage.
Lexical analysis (tokenization) is the process of dividing source code into logical units (called lexemes / tokens). Examples of lexemes are programming language keywords (e.g. if, for, function). The objective of this analysis is to convert the code into a form more accessible for further processing, while omitting things that are irrelevant from the point of view of interpretation.
A simple example is the representation of lexemes (tokens) for a basic PHP script:
<?php
$x = 0;
$x++;
echo $x;
?>
This corresponds approximately to the following sequence of tokens (the whitespace tokens for line breaks are omitted):
T_OPEN_TAG ('<?php')
T_VARIABLE ('$x')
T_WHITESPACE (' ')
=
T_WHITESPACE (' ')
T_LNUMBER ('0')
;
T_VARIABLE ('$x')
T_INC ('++')
;
T_ECHO ('echo')
T_WHITESPACE (' ')
T_VARIABLE ('$x')
;
T_CLOSE_TAG ('?>')
The process of converting code to tokens can be viewed in the PHP interpreter source file. A full list of tokens is available in the PHP documentation.
PHP itself can also display tokens. This is well illustrated in the PHP documentation, which includes a simple script that splits a given code string into PHP tokens and displays them.
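To make the tokenization step concrete, here is a minimal sketch that lists the tokens of the example script using the built-in token_get_all() and token_name() functions. The helper name describeTokens() is my own and purely illustrative; the exact set of tokens may differ slightly between PHP versions.

```php
<?php
// The example script from above, kept in a nowdoc so it is treated as plain text.
$code = <<<'PHP'
<?php
$x = 0;
$x++;
echo $x;
?>
PHP;

/**
 * Hypothetical helper: returns one human-readable line per token.
 */
function describeTokens(string $code): array
{
    $lines = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Named tokens are arrays of the form [id, text, line number].
            [$id, $text] = $token;
            // json_encode() makes whitespace such as "\n" visible.
            $lines[] = sprintf('%s (%s)', token_name($id), json_encode($text));
        } else {
            // Single-character tokens such as "=" or ";" are plain strings.
            $lines[] = $token;
        }
    }
    return $lines;
}

echo implode("\n", describeTokens($code)), "\n";
```

Unlike the simplified list above, this output also includes the T_WHITESPACE tokens for line breaks. On PHP 8 and newer, PhpToken::tokenize() offers an object-based alternative to token_get_all().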
The next stage is syntax analysis (parsing). In essence, it consists of constructing a hierarchical structure known as the Abstract Syntax Tree (AST) from the tokens extracted in the previous stage. While the tree is being built, syntax rules are verified, which allows syntax errors to be caught at an early stage. The result of this stage, the AST, is used in the next stage for code compilation.
For example, in the case of the script presented above, the AST would consist of three top-level nodes representing value assignment, increment, and display, respectively.
To visualize this case, I used the https://phpast.com/ tool, which gives a good representation of what the AST looks like for a given script:
In the presented image you can clearly see three nodes. The first node (index 0) corresponds to the assignment of the value 0 to the variable $x. The second node (index 1) increments the value of the variable $x. The last node (index 2) displays the value of the variable $x.
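As a small illustration of syntax rules being verified before anything runs, the sketch below passes two code strings to eval() and catches the ParseError that PHP 7+ throws for invalid syntax. The helper name tryRun() is hypothetical, and eval() is used here only for demonstration, not as a recommended practice.

```php
<?php
/**
 * Hypothetical helper: tries to run a code string and reports
 * whether it parsed, plus whatever output it produced.
 */
function tryRun(string $code): array
{
    ob_start();
    try {
        // eval() expects code without the opening <?php tag.
        eval($code);
        $status = 'executed';
    } catch (ParseError $e) {
        // Thrown during parsing, so no statement of $code has run yet.
        $status = 'parse error';
    }
    $output = ob_get_clean();

    return ['status' => $status, 'output' => $output];
}

$valid  = 'echo "hello";';
$broken = 'echo "hello"; $x = ;'; // the assignment is missing its right-hand side

var_dump(tryRun($valid));
var_dump(tryRun($broken));
```

Note that the broken snippet produces no output at all: even though its first statement is valid, the whole string is parsed before any of it is executed.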
Once the analysis phase is completed, the constructed AST is used in the next stage of PHP script processing, which is compilation.
The penultimate stage of processing a PHP script is compilation, during which the AST created in the preceding stage is compiled into OPcodes. OPcodes are low-level instructions intended to be executed by the Zend Engine virtual machine. When the OPcache extension is enabled, the compiled OPcodes are stored in shared memory, so the next time the script runs, the earlier stages can be bypassed and the OPcodes retrieved directly from the OPcache.
More details about OPcache and how to properly configure it are available in the PHP documentation.
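To check whether OPcodes are actually being cached in a given environment, a sketch along these lines may help. It relies on the opcache_get_status() function provided by the OPcache extension; the helper name opcacheSummary() is my own. Keep in mind that OPcache is typically disabled for the command line unless opcache.enable_cli is set, so the result can differ between CLI and web requests.

```php
<?php
/**
 * Hypothetical helper: returns a one-line summary of the OPcache state.
 */
function opcacheSummary(): string
{
    if (!function_exists('opcache_get_status')) {
        return 'OPcache extension is not loaded';
    }

    // Passing false skips the per-script details, which are not needed here.
    $status = opcache_get_status(false);
    if ($status === false || empty($status['opcache_enabled'])) {
        return 'OPcache is loaded but not enabled in this environment';
    }

    $stats = $status['opcache_statistics'];
    return sprintf(
        'OPcache enabled: %d cached scripts, %d hits, %d misses',
        $stats['num_cached_scripts'],
        $stats['hits'],
        $stats['misses']
    );
}

echo opcacheSummary(), "\n";
```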
The final step in the interpretation process is the execution of the previously generated OPcodes on the Zend Engine virtual machine. As the script executes, it produces its result. In a simple case, this may be some displayed information, but given that PHP is mainly used to build web applications, the result will most often be HTML code or a structured data format such as JSON or XML, which is used for data interchange in web applications.
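As a rough sketch of the last point, the earlier example script can be adapted to emit JSON instead of plain text. The function name renderResponse() and the shape of the payload are assumptions made purely for illustration.

```php
<?php
/**
 * Hypothetical helper: wraps a value in a small JSON document.
 */
function renderResponse(int $x): string
{
    // JSON_THROW_ON_ERROR makes encoding failures raise an exception
    // instead of silently returning false.
    return json_encode(['x' => $x], JSON_THROW_ON_ERROR);
}

$x = 0;
$x++;

echo renderResponse($x), "\n"; // prints {"x":1}
```

In a real web request, the script would usually also send a matching header, for example header('Content-Type: application/json'), before producing the body.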
Conclusion
PHP is a language that has evolved over many years, which is why it holds a well-established position and offers a wide range of features out of the box. Because of this, and because of PHP’s low entry barrier, understanding how the language works is often overlooked, even though it is fundamental. Knowledge of how PHP works is essential for writing efficient and secure code.
If you found this article helpful and would like to see more content like this, please consider showing your support by clapping and following to stay updated. Your support is greatly appreciated and serves as a strong source of motivation.
Thank you for reading! 🙌🏻
P.S. This is my first article, and I would greatly appreciate any feedback, whether positive or constructive.