How do websites really work?

A lot of people struggle with technology when they’re starting on their marketing journey. Don’t let that person be you! In this article I’m going to go “under the hood” and give you a newbie-friendly explanation of how websites actually work. Understanding this stuff will make your life easier when it comes to building websites and funnels. So let’s start learning!

Why you need technical skills

OK so I published an article on the biggest affiliate marketing myths. And while one of them was “you need to be a technical expert”, another of the myths was “you don’t need any technical skills”.

While you don’t need to be a professional web developer or computer science major to get started, I think you really should be building up your technical skills on your marketing journey. It will make it easier for you to investigate and fix things. You don’t want to be relying on expensive consultants all your life.

Being able to jump in and muck around with CSS to improve or fix your website isn’t just efficient from a time and money perspective. It’s also a great feeling! And this stuff isn’t actually as hard as you might think.

Except Javascript programming. That’s actually hard. But we won’t be going there in this article.

What is the Internet and the World Wide Web anyway?

So what is the internet anyway? It’s just a bunch of computers and networks that are all hooked up together on a giant network, sharing information. You can think of it as consisting of clients, which request information, and servers, which respond with that information. But what about the World Wide Web?

There are lots of different types of information being shared on the internet, and they use different protocols. A protocol is just a language for machines on the internet to talk to each other.

For example, email uses its own protocols, such as SMTP for sending mail and POP or IMAP for retrieving it. So email is part of the internet, but it’s not the same as the web. The World Wide Web is the exchange of information on the internet over a protocol called HTTP, which stands for Hyper Text Transfer Protocol.

Servers on the web respond to HTTP requests, by serving resources to clients. And those resources are usually Hypertext documents (which are just documents that can link to other documents), aka web pages, and the static files those Hypertext documents need to work (image files, javascript files, stylesheet files).

You can’t understand how the web works without understanding how HTTP works. Pretty much everything that happens on the web happens over HTTP (or its twin brother, HTTPS).

What is HTTP?

HTTP is a simple but brilliant protocol for exchanging information over the web. Basically, a client makes an HTTP request for a resource, and a server returns a response. That response might include the resource that was requested, or it might return some kind of error code. This forms the “request response cycle” (see diagram below).

[Figure: The basic request-response cycle of the web]
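If you’d like to see the request-response cycle as code, here is a minimal sketch in Python using only the standard library. It starts a tiny web server on your own machine, makes one request to it as a client, and returns what came back. The names HelloHandler and request_once are made up for this example, and a real web server does far more than this.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answer every GET request with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body><p>Hello from the server!</p></body></html>"
        self.send_response(200)  # the status code: 200 OK
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # the resource itself

    def log_message(self, format, *args):
        pass  # keep the demo quiet instead of logging each request


def request_once(path="/hello"):
    """Start a local server, make one request as a client, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client side: send the request, then read the response.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", path)
        response = connection.getresponse()
        status = response.status
        body = response.read().decode("utf-8")
        connection.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(request_once())
```

Notice that the client never “goes” anywhere. It sends a request, and the page comes back to it as a response.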

Many people think and talk about “going to a webpage” or “visiting a website”. This is an easy way to think and talk about things, but it is completely wrong. Clients don’t go to web pages, web pages go to them.

More specifically, a client requests a resource (such as this article), and a server returns them that resource. The client effectively downloads it, and their browser renders it (shows it on a screen). This is a subtle but important change in mindset that will help you understand a lot of what is to follow.

So clients make HTTP requests to servers. The HTTP request might be for a file, like a PDF or a sound file. In the old days of the internet, it would often be for an HTML document, which is known informally as a web page. But people these days don’t request HTML files very often.

What happened to HTML?

In the very early days of the web, people would write an HTML document (using Notepad or something) and put it on their web server. And clients would request it (say make an HTTP request for www.example.com/somepage.html), and their computer would download it and the browser would render and show it.

But this wasn’t very good. Every time something changed, the HTML would need to be edited, re-created and saved. People quickly wanted to build websites with dynamic content, that would be updated in real-time. For example, if you want a website that shows the current temperature, you don’t want to have to go in and edit the HTML every five minutes when the weather changes.

You’d want a computer to figure out the temperature, probably using an API (we’ll get to those in another article), and dynamically generate the HTML page whenever anyone requests it.

So nowadays, people usually just request a general resource, not a file. So they would make a request to www.example.com/weather, and the server would figure out the weather, construct an HTML document, and send it to the client as a response. Cool huh!
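To make that concrete, here is a rough sketch in Python of a server building a page on the fly. Everything in it is an assumption for illustration: get_current_temperature is a hypothetical stand-in for a real weather lookup, and the readings are invented.

```python
from html import escape


def get_current_temperature(city):
    """Hypothetical stand-in for a real weather lookup (for example an API call)."""
    fake_readings = {"Sydney": 24.5, "London": 11.0}  # invented numbers
    return fake_readings.get(city)


def render_weather_page(city):
    """Build the HTML document at request time instead of reading a saved file."""
    temperature = get_current_temperature(city)
    if temperature is None:
        message = f"Sorry, no reading for {escape(city)}."
    else:
        message = f"It is currently {temperature:.1f} °C in {escape(city)}."
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "  <head><title>Weather</title></head>\n"
        f"  <body><p>{message}</p></body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(render_weather_page("Sydney"))
```

Nobody ever saved that HTML to a file. It only exists because somebody asked for it.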

Web servers versus application servers

So we then saw the rise of web applications. A web application is just a piece of technology that generates web content in response to a request for that content.

Wordpress is a famous example of a web application. This web page (HTML document) you’re reading right now was generated by the Wordpress web application. If you right click on it and choose View Source, you can see the full HTML document that was sent to your computer.

Now I didn’t write that HTML file. It was generated in a tiny fraction of a second by a web application called Wordpress, in response to the request for this URL.

Wordpress is an application written in a programming language called PHP. It stores a lot of information (including the actual words of all your articles) in a database. The database technology it uses is called MySQL.

The particular combination that a Wordpress site uses (Linux operating system, Apache web server, MySQL database, PHP application) is very common for web applications. It is known in tech circles as the “LAMP Stack” (LAMP stands for Linux Apache MySQL PHP). Stack just means a combination of technologies working together.

We still have web servers

So we now have web applications, but web servers are still around. Their job is to handle HTTP (and HTTPS) requests. So now you can see the basic architecture of the web. Clients make HTTP requests, which go to a web server, which passes them to a web application to process.

The job of the web server is just to handle HTTP requests. It makes sure the requests are valid, it makes sure the address is a real one that exists, and it handles some other things like caching that I won’t go into here.

There are three popular web servers: Apache, Nginx (pronounced “Engine X”), and IIS (Internet Information Services, which runs on Windows). Apache and Nginx are most often run on Unix / Linux. Apache has long been one of the most popular, especially on shared hosting. This website runs on Apache and yours probably does too, if you have one.

On Apache, you can configure some of the fundamental settings of your web server in a file called .htaccess. When your web server gets an HTTP request, it quickly checks your settings in .htaccess to see if it needs to do a redirect or anything, then passes it on to the right web application (Wordpress for this site) to handle.

The web application does what it needs to do (in my example, it produces this web page) and hands it to the web server, which then returns it to the client.

So this is what the request-response cycle looks like now, with separate web and application servers:

[Figure: Request response cycle with web and app servers]
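Here is a small Python sketch of that hand-off. It uses WSGI, which is the convention Python web servers use to talk to Python web applications. That is an assumption for illustration only: Apache and Wordpress talk to each other through PHP, not WSGI, but the division of labour is the same idea. The names blog_application and handle_like_a_web_server are made up.

```python
from html import escape
from wsgiref.util import setup_testing_defaults


def blog_application(environ, start_response):
    """The 'web application': builds a page for whatever path was requested."""
    path = environ.get("PATH_INFO", "/")
    page = f"<html><body><h1>You asked for {escape(path)}</h1></body></html>"
    body = page.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def handle_like_a_web_server(path):
    """Play the 'web server' role: describe the request, pass it on, collect the reply."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal fake request
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(blog_application(environ, start_response))
    return captured["status"], body.decode("utf-8")


if __name__ == "__main__":
    print(handle_like_a_web_server("/weather"))
```

If you want to see it running for real, the standard library’s wsgiref.simple_server.make_server can play the web server role and serve blog_application on a port of your choosing.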

Don’t forget HTTPS

I’ve been talking about HTTP, but it actually comes in two flavours: plain old HTTP, and HTTPS. HTTPS stands for Hyper Text Transfer Protocol Secure, and is basically an encrypted form of HTTP.

The problem with HTTP is that it isn’t secure, at all. Everything happens in “plain text”. So if you make an HTTP request for www.example.com/login and include with that request “by the way my user id is JohnSmith and my password is SecretPassword”, then anyone intercepting and reading that request will see the information “by the way my user id is JohnSmith and my password is SecretPassword”. That is obviously disastrous!

So people decided to come up with a secure version of HTTP, which sends everything encrypted instead of in plain text. So if someone intercepted that message sent over HTTPS, all they would see is “sdif8j34f87s7h7dy7$37hrs78y5eYSDFIUHsdhfoiuaskjdf” or similar.

Which is complete gobbledygook! The client and server agree on a secret encryption key at the beginning of the session, and people spying on that conversation can’t decrypt it, even if they read every message passed between them, including the conversation where they agree on the secret key! If that sounds impossible, it works using something called public-key cryptography (also known as public/private key cryptography), which is super clever and interesting.
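If you’re curious how two parties can agree on a secret while someone listens to every message, here is a toy sketch in Python of one classic technique from this family, called Diffie-Hellman key exchange. I’m not claiming this is exactly what your browser does. The numbers are far too small to be secure, the function names are made up, and real HTTPS adds certificates and much more on top.

```python
import secrets

# Public values that anyone, including an eavesdropper, is allowed to see.
# Toy sizes only: real systems use far larger numbers or elliptic curves.
PRIME = 2_147_483_647  # 2**31 - 1, which is a prime number
BASE = 7


def make_key_pair():
    """Pick a private number (never sent) and compute the public number (sent openly)."""
    private = secrets.randbelow(PRIME - 3) + 2  # somewhere in 2 .. PRIME - 2
    public = pow(BASE, private, PRIME)
    return private, public


def shared_secret(my_private, their_public):
    """Combine my private number with the other side's public number."""
    return pow(their_public, my_private, PRIME)


if __name__ == "__main__":
    client_private, client_public = make_key_pair()
    server_private, server_public = make_key_pair()
    # Only the two public numbers travel over the network.
    client_view = shared_secret(client_private, server_public)
    server_view = shared_secret(server_private, client_public)
    print(client_view == server_view)
```

Both sides end up with the same number without ever sending it. An eavesdropper sees the two public numbers, but working backwards from them to the secret is the hard part (with properly large numbers).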

When you go to a website and see a padlock in the address bar, you are on a secure session and your requests and responses are all encrypted (you can also tell by the https: instead of http: at the start of the website address).

The good thing about HTTPS is that apart from the encryption, it works exactly the same as HTTP. It just takes all the existing HTTP requests and responses, and runs them through a layer of encryption before it sends them. Then, they are decrypted at the other end and turned into plain old HTTP, which everyone understands.

So all the request format, headers, status codes and everything is exactly the same for HTTPS. It just has an extra layer of security over the top.

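You should move your website over to HTTPS as soon as possible because Google likes HTTPS and hates HTTP (it uses HTTPS as a ranking signal, and Chrome flags plain HTTP sites as “Not secure”). To do that, you need to install an SSL Certificate. SSL stands for Secure Sockets Layer, which is basically that extra layer of encryption that HTTP runs through. The modern version is called TLS, but everyone still says SSL.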

Once you’ve installed your SSL certificate, you can configure your website to redirect all http:// requests to https:// requests. Then your site will basically be on HTTPS. That configuration is often done in cPanel.
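In practice your host or web server does this redirect for you, but the logic is simple enough to sketch in Python. The function name https_redirect is made up for this example. It just shows what the server decides: if the request came in on http://, answer with a 301 and the https:// version of the same address.

```python
from urllib.parse import urlsplit, urlunsplit


def https_redirect(url):
    """Return (status, headers) for an http:// URL, or None if it is already https://."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return None  # nothing to do, serve the page as normal
    secure_url = urlunsplit(
        ("https", parts.netloc, parts.path, parts.query, parts.fragment)
    )
    return 301, {"Location": secure_url}


if __name__ == "__main__":
    print(https_redirect("http://www.example.com/login?next=/home"))
    print(https_redirect("https://www.example.com/login"))
```

The browser sees the 301 and the Location header, and quietly makes a second request to the secure address.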

HTTP responses

When you make an HTTP request and it successfully goes over the internet and reaches the destination host, it will be picked up by a web server. That web server will look at the request, first to make sure that it is a valid HTTP request. It will check the .htaccess file to see if it needs to do anything unusual, like redirect it.

If the request was for a file (like an image or PDF), and the client is allowed to access that file, it will send back the file, with a response code. Otherwise, it will send it to the relevant web application to process. And then send back a response, with a response code.

Learning the HTTP response codes is a really important and useful thing to do. They take the form of a three digit number.

Unfortunately, there are a lot of them. Fortunately, there are only a handful of common ones and lots of very rare ones. And better news, the first digit of the number tells you almost all you need to know.

Basically, there are response codes that start with 2 or 3, which are the good ones. And there are responses that start with 4 or 5, which are bad ones. More specifically:

  • Responses that start with 2 are “OK” type responses: the server is successfully able to return the resource that the client requested. The best one is 200 OK (which means everything is fine).
  • Responses that start with 3 are “redirect” type responses: the server is able to send the client to another address to get that resource (there are a few different types of redirects, the most common one is 301 Moved Permanently).
  • Responses that start with 4 are request i.e. client-side errors: the client made a bad request. This could be a request for a resource that doesn’t exist (the famous 404 “page cannot be found” error), a resource the client isn’t allowed to access (403 Forbidden), or a few others.
  • Responses that start with 5 are response i.e. server-side errors: the client made a perfectly good request, but the server was unable to provide a good response. This could be because it is overloaded and just can’t handle the request (503 Service Unavailable, which you sometimes see when a server gets smashed with a huge amount of traffic), or the application that is supposed to handle the request failed (the dreaded 500 Internal Server Error, which you don’t ever want to see, trust me).
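The “first digit tells you almost everything” rule is easy to turn into code. Here is a small Python sketch that uses the standard library’s list of status codes. The describe function and the family labels are my own naming, not anything official.

```python
from http import HTTPStatus

# The first digit of the code tells you which family it belongs to.
FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirect",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Turn a numeric status code into a readable one-line summary."""
    family = FAMILIES.get(code // 100, "unknown family")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # not one of the codes Python knows about
    return f"{code} {phrase} ({family})"


if __name__ == "__main__":
    for code in (200, 301, 403, 404, 500, 503):
        print(describe(code))
```

For example, describe(404) gives “404 Not Found (client error)”.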

Seeing the HTTP response codes in action

To understand this stuff better, you really need to peek under the hood and see it in action. Go to your favourite website (any website will do) in Google Chrome, then hit F12. The Chrome Developer Tools will open. This is a very powerful piece of software that comes free with Chrome and is invaluable for investigating problems with websites.

Go to the Network tab in the Developer Tools, then hit Control R or F5 to refresh the page. As the website loads, you will see a big list of network requests appearing on the Network tab. These are all the HTTP/S requests that your browser is making!

For each request, you can see what type of request it is (most requests will be GET, which just means “I want that thing, send it to me!”, but some might be POST, which means “I’m sending you some information”).
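Here is a quick Python sketch of the difference between the two. It only builds the requests and never sends them, so you can run it safely. The URLs and the form field are made-up examples.

```python
from urllib.parse import urlencode
from urllib.request import Request

# A GET request: "I want that thing, send it to me!"
get_request = Request("https://www.example.com/weather")

# A POST request: "I'm sending you some information" (it carries a body).
form_data = urlencode({"email": "reader@example.com"}).encode("ascii")
post_request = Request("https://www.example.com/subscribe", data=form_data)

if __name__ == "__main__":
    print(get_request.get_method())   # GET
    print(post_request.get_method())  # POST
    print(form_data)                  # b'email=reader%40example.com'
```

The only thing that changed is that the second request has some data attached, which is why Python treats it as a POST.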

You will also see the response status code. They should mostly be 200s, but who knows. When I did this on my own website, I just got a 403 (forbidden) for a POST request to some metrics call made by the jquery javascript file. I might have to look into that! See how useful this tool and knowledge is? Now I know I have a problem with my jquery metrics call (luckily that doesn’t sound like something I urgently need to fix, but it is odd).

You can also see the file type, the Initiator (did the request come from the parent HTML file? Or maybe from a Javascript file or something?), the file size of the resource, and the time it took to load. This is all really helpful in figuring out problems with your site speed and behaviour.

[Figure: This is Chrome Dev Tools, looking at my website via the Network tab. You can see some of the HTTP requests my homepage makes when sent to a client]

If you want to learn more about this stuff and really get under the hood, there is a lot more you can do in Chrome Dev Tools. You can go to the Application tab and play around with the cookies or local storage (feel like impersonating someone and hacking into a site? Have a try!), or go to the Console tab to look at Javascript errors (or write and run your own Javascript programs, though this is not for the faint-hearted).

But an extremely useful feature is the Elements tab, which you can access quickly by right-clicking on any part of a website and choosing Inspect. There you can see the actual thing you’ve selected in the HTML file itself, plus loads of information about it (what styles it is inheriting, how much space it is taking up, and so on). Hours of fun!

What a web page really is

Fundamentally, a web page is an HTML document. HTML is HyperText Markup Language, which is a markup language (not a programming language) that describes the content and layout of a page. However, there are other important components of a web page: styling and behaviour. These form the trinity of HTML, CSS and JS.

What HTML, CSS and JS really are

So HTML is a markup language. An HTML document is the core of a web page. It has the words that you would see on a page, plus some tags that describe the overall containers and layout, such as headings, paragraphs, and so on. These are contained in opening and closing tags.

So to start a paragraph in HTML, you would use an opening paragraph tag (which is <p>), then the words in the paragraph, then a closing paragraph tag (which is </p>). Now you know some HTML!
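To see how a program reads those tags, here is a short Python sketch using the standard library’s HTML parser. It walks through a snippet of HTML and records each opening tag, piece of text, and closing tag it meets. The TagLogger class and list_tags function are made-up names for this example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every opening tag, closing tag and piece of text, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        if data.strip():  # skip whitespace between tags
            self.events.append(("text", data.strip()))


def list_tags(html):
    """Return the list of events found in a piece of HTML."""
    logger = TagLogger()
    logger.feed(html)
    logger.close()
    return logger.events


if __name__ == "__main__":
    for event in list_tags("<h1>My blog</h1><p>Now you know some HTML!</p>"):
        print(event)
```

Your browser does a much more sophisticated version of this every time it renders a page.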

But there is only so much HTML can do. Web pages also need to look pretty, which is what CSS is for. CSS stands for Cascading Style Sheets and is the best way to do the styling for your web pages.

People used to define styles (colours, fonts, and so on) in the HTML itself, which you can still do. But it was difficult and risky to do it that way on a big site. If you wanted to change the font of your headings, you would have to change the HTML on hundreds or thousands of pages.

So people came up with stylesheets. These are a simple text file that describes the styles used on your pages. So you can specify the font of your headings, the colour of your footer bar, and so on. And that stylesheet was used and referenced by most or all of the pages on your website. Now you could change the look and feel of your entire website by just changing one word in one file! Very cool and powerful.

JS is the other part of the holy trinity of websites. JS is short for Javascript, which is a programming language used by web browsers. Every browser in the world supports Javascript and every website (pretty much) in the world uses Javascript, to some extent.

[Figure: The holy trinity of web files: HTML, CSS and JS]

Javascript makes websites dynamic. Not in the sense that Wordpress dynamically generates web pages in response to a request, but dynamic in that a web page once loaded in a browser can dynamically change in response to things that the client does (clicking on buttons, dragging things around, etc). It’s difficult to imagine websites without Javascript.

In fact, some websites like Gmail are built almost entirely with javascript. Only a tiny framework of HTML goes to the client, and all the rest of the content is dynamically generated and pulled down and rendered in Javascript.

Websites like this are called Single Page Applications (SPAs) and are way beyond the scope of this guide.

Other resources

There are of course other resources, such as images and videos and sound files which can be downloaded over the web. These are usually referenced by an HTML file. When the initial HTML document is sent to the client, it will usually contain references to other files: at least one CSS file, at least one Javascript file, at least one image file, and so on.
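Here is a rough Python sketch of how a program could find those references inside an HTML document. It only looks for three common cases (stylesheets, scripts, and images), and the sample page and file names are invented for this example. A real browser handles many more.

```python
from html.parser import HTMLParser

SAMPLE_PAGE = """
<html>
  <head>
    <link rel="stylesheet" href="/style.css">
    <script src="/app.js"></script>
  </head>
  <body>
    <img src="/logo.png" alt="Logo">
    <p>Hello!</p>
  </body>
</html>
"""


class ResourceFinder(HTMLParser):
    """Collect the extra files a page refers to, in the order they appear."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "stylesheet":
            if attributes.get("href"):
                self.resources.append(("css", attributes["href"]))
        elif tag == "script" and attributes.get("src"):
            self.resources.append(("js", attributes["src"]))
        elif tag == "img" and attributes.get("src"):
            self.resources.append(("image", attributes["src"]))


def find_resources(html):
    """Return (kind, address) pairs for each referenced file."""
    finder = ResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.resources


if __name__ == "__main__":
    print(find_resources(SAMPLE_PAGE))
```

Each address in that list becomes its own HTTP request, which is why one page load shows up as many rows in the Network tab.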

The browser then makes HTTP (or HTTPS) requests for those other resources. Those are brand new requests that go to the web server and follow the request response cycle. However, there is a difference. Those requests are known as requests for “static” files. You’ll hear tech nerds talk about “statics” a lot. I’ll explain what’s going on here.

Remember how I said that most websites are dynamic, and people don’t request an actual file (even an HTML file), but instead a general resource? For instance, the homepage for my blog is https://citizenaffiliate.com — if you make a request for that, what will the webserver do?

Poor Apache doesn’t have a clue what to show. My homepage is dynamic — it has different things on it every day, depending on what blog posts I have made, what comments people have added, and so on. Wordpress needs to generate that HTML file based on a whole bunch of things.

So Apache says “I don’t know how to handle this, you figure it out” and throws the request to Wordpress. Now the PHP code in Wordpress says “ah this is a request for the homepage, I know how to put that together”, and grabs stuff from the database, your settings, widgets, plugins, and so on. And then it puts together a big HTML document and sends that to Apache, which returns it to the client.

Now the client has an HTML file and it can start rendering the page, but it will include references to files, like images, stylesheets, and javascript files.

Static files and CDNs

Now those files aren’t dynamic at all, they are static! They don’t change. My CSS stylesheet file doesn’t change, it is exactly the same day in and day out. It might change once every six months or something if I update my theme settings. Same with my javascript file. And my images, they never change at all!

So why does Wordpress need to be involved here? It doesn’t. There are no calculations or database lookups or anything required to serve responses to those requests for static resources. So Apache will usually just take care of that.
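The decision the web server makes can be sketched in a few lines of Python. This is a simplification with made-up names: I’m deciding purely on the file extension, whereas a real web server typically checks whether the file actually exists and follows whatever rules you’ve configured.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed list of "static" file types for this sketch.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".svg", ".pdf"}


def route(url_path):
    """Decide who should answer: the web server itself, or the web application."""
    path = urlsplit(url_path).path  # drop any ?query=string part
    extension = PurePosixPath(path).suffix.lower()
    if extension in STATIC_EXTENSIONS:
        return "static"   # web server sends the file straight back
    return "dynamic"      # hand over to the application to build a page


if __name__ == "__main__":
    for example in ("/", "/weather", "/style.css?ver=1.2", "/images/logo.PNG"):
        print(example, "->", route(example))
```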

Or in the case of common javascript libraries that are used by websites all over the world, they are often hosted somewhere else, like on Google’s servers. (An HTML document for your website can have references to CSS or JS files on other servers, just like it can have references to images on Wikipedia or whatever).

In fact, some smart people have figured out that it makes sense for all your static files to be served elsewhere. Why have people connect to your slow web server for those files, when they could be served by some big expensive cloud server? Or if you had a network of big expensive cloud servers all over the world, you could be even smarter and serve them from whichever one is closest to the client that requested them.

That is what a CDN (Content Delivery Network) does! It takes care of serving all the static resources, so all your website has to do is construct and return the HTML page with the dynamic content.
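Here is a toy Python sketch of the “serve it from the closest server” idea. The three server locations are invented, and I’m using straight-line distance on the globe as the measure of “closest”. Real CDNs generally decide this with network-level tricks and live measurements rather than a simple distance calculation, so treat this as an illustration only.

```python
import math

# Hypothetical edge servers: name -> (latitude, longitude)
EDGE_SERVERS = {
    "Sydney": (-33.87, 151.21),
    "London": (51.51, -0.13),
    "New York": (40.71, -74.01),
}


def distance_km(a, b):
    """Great-circle distance between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371 * math.asin(math.sqrt(h))  # 6371 km = Earth's mean radius


def nearest_edge(client_location):
    """Pick the edge server with the shortest distance to the client."""
    return min(
        EDGE_SERVERS,
        key=lambda name: distance_km(client_location, EDGE_SERVERS[name]),
    )


if __name__ == "__main__":
    print(nearest_edge((-37.81, 144.96)))  # a client in Melbourne
    print(nearest_edge((48.86, 2.35)))     # a client in Paris
```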

In fact, some CDNs go even further and can pre-construct the HTML page for your website. Then they can serve that page to clients instead of Wordpress going and figuring it out. All you have to do is link your Wordpress site to the CDN so that it updates the CDN whenever you write a new post or approve a new comment.

When that happens, your HTML pages will have changed, so the CDN requests them from your site again, gets the new HTML pages, and stores them in its cache.
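That store-it, serve-it, refresh-it loop can be sketched in Python like this. PageCache is a made-up class, not how any particular CDN works. The build_page function you pass in stands for the slow trip back to your own website.

```python
class PageCache:
    """Keep ready-made pages, and only ask the origin site when we have to."""

    def __init__(self, build_page):
        self._build_page = build_page  # the slow path: ask the real website
        self._pages = {}
        self.origin_requests = 0

    def get(self, path):
        """Serve from the cache if possible, otherwise fetch and remember."""
        if path not in self._pages:
            self.origin_requests += 1
            self._pages[path] = self._build_page(path)
        return self._pages[path]

    def purge(self, path):
        """Forget a page, for example after a new post or comment."""
        self._pages.pop(path, None)


if __name__ == "__main__":
    posts = ["First post"]
    cache = PageCache(lambda path: "<ul>" + "".join(f"<li>{p}</li>" for p in posts) + "</ul>")
    cache.get("/")
    cache.get("/")                # served from the cache, no origin request
    posts.append("Second post")
    cache.purge("/")              # the site tells the CDN something changed
    print(cache.get("/"), cache.origin_requests)
```

The second visitor never touches your website at all, which is where the speed comes from.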

This can make for lightning fast websites. In fact, my website is hosted on an advanced CDN that does exactly that (notice how crazy fast it loads? Cool huh!). They’re called Peakhour, check them out (I’m not an affiliate of theirs and I don’t earn any commissions from them, they’re just a good company I like).

Summary

So that is an introduction to how websites actually work. Obviously this is a huge topic and this article only scratched the surface, but hopefully, it was helpful! If you have any more questions about this topic (unless you’re a web developer, you probably do), please leave them in the comments!

This article was originally published on Citizen Affiliate.