Tricks with forward proxies: HAProxy, Squid and a bit more.

Hello there! In this short article I want to pull together a few pieces of experience with forward proxies.

Outside the enterprise segment, the word “proxy” today most often means a reverse proxy. You will find plenty of documentation on configuring Nginx, HAProxy or even Apache in reverse proxy mode for load balancing and so on. But some kinds of tasks, such as SEO or web crawling, still require sending a lot of requests to specific web servers. And you can be sure that the “victims” of your interest will not like this: very soon your IP address will be banned in one way or another.

This is where external HTTP proxies come in. There are plenty of proxy services that will help you convert your money into some number of proxy IPs.

OK, once we have a pool of external proxy IPs, we want to use them. Very often the application you want to use accepts only one proxy address in its configuration, while you have hundreds or thousands of IPs. In this case you need to run a separate program that listens on one specific address and spreads your outgoing requests across that list of proxy IPs.

And, as I said, this is where forward proxies come in.

  • HAProxy

A well-known modern project with the word “proxy” in its name, which we can also use as a forward proxy. It may not be as powerful as the canonical Squid, but it is faster and lighter. Here is a short example of how to use HAProxy as a forward proxy.

Most external proxy services will sell you proxy IPs protected with basic auth, so you need to encode the login and password you were given in base64. HAProxy will then use the result for authentication.

# echo -n "admin:admin" | base64 
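
As a minimal sketch of that step, the snippet below builds the full header value from a login and a password and decodes it back as a sanity check. The variable names are only illustrative, and `base64 -d` assumes GNU coreutils (other platforms may spell the decode flag differently).

```shell
#!/bin/sh
# Illustrative credentials; replace with the pair issued by your proxy provider.
proxy_user="admin"
proxy_pass="admin"

# printf avoids the trailing newline that would otherwise be encoded too.
token=$(printf '%s:%s' "$proxy_user" "$proxy_pass" | base64)

# This is the value that goes into the Proxy-Authorization header.
header_value="Basic $token"
echo "$header_value"

# Sanity check: decoding must give back login:password (GNU coreutils flag).
decoded=$(printf '%s' "$token" | base64 -d)
echo "$decoded"
```

The printed header value is what ends up in the “http-request set-header” line of the HAProxy config.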

We will also enable the HAProxy statistics page on port 9999 to check how things are going.

Example haproxy.conf for HAProxy:

# Forward HAProxy config

# Process-wide settings
global
maxconn 256

# Defaults inherited by the sections below
defaults
mode http
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms

listen stats
bind :9999
stats enable
stats hide-version
stats uri /stats
stats auth admin:admin123

frontend proxy_in
# Listen on the port that is published from the container and used by the clients
bind :8888
http-request set-header Proxy-Authorization "Basic YWRtaW46YWRtaW4="
use_backend proxies_out

backend proxies_out
cookie SERVERID insert indirect nocache
option httpclose
option forwardfor header X-Client
balance roundrobin
mode http
server ip-1 x.x.x.1:3128
server ip-2 x.x.x.2:3128
server ip-3 x.x.x.3:3128
server ip-4 x.x.x.4:3128
server ip-100 x.x.x.100:3128

As you can see, the configuration file is quite simple and has three main sections. “listen stats” serves the web statistics page. The “frontend” section holds the internal address where HAProxy listens for requests, plus the basic authentication header. The “backend” section lists the external proxy IPs, with “balance roundrobin” to spread the load evenly across them.
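
With hundreds of IPs you probably don't want to type the “server” lines by hand. Here is a rough sketch of generating them from a plain list of “ip:port” entries. The function name and the input format are my own assumptions, and the sample addresses come from the documentation range 192.0.2.0/24.

```shell
#!/bin/sh
# Turn a list of "ip:port" lines (stdin) into HAProxy "server" lines (stdout).
# Blank lines and lines starting with "#" are skipped.
gen_servers() {
    n=0
    while IFS= read -r addr; do
        case "$addr" in
            ''|'#'*) continue ;;
        esac
        n=$((n + 1))
        # The ip-N naming follows the config above.
        printf 'server ip-%d %s\n' "$n" "$addr"
    done
}

# Example run with placeholder addresses.
servers=$(printf '%s\n' '192.0.2.1:3128' '' '# spare' '192.0.2.2:3128' | gen_servers)
echo "$servers"
```

The output can be pasted (or appended by a script) under the “backend proxies_out” section.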

Now let's run it in Docker and test:

# docker run -d -v /opt/haproxy.conf:/usr/local/etc/haproxy/haproxy.cfg --name haproxy -p 8888:8888 -p 9999:9999 haproxy

Check that HAProxy has started and is listening on the specified ports:

# docker ps 
83ef26b94689 haproxy "/docker-entrypoint.…" 14 seconds ago Up 13 seconds 0.0.0.0:8888->8888/tcp, 0.0.0.0:9999->9999/tcp haproxy

Then run a short command a few times and see how it works:

# http_proxy="http://your-server-ip:8888" wget -O - -q
# http_proxy="http://your-server-ip:8888" wget -O - -q
# http_proxy="http://your-server-ip:8888" wget -O - -q

It works perfectly: each new request goes out through the next external proxy IP, “round robin” in action. It is also a very light and fast solution, especially when deployed with Docker.
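
To see the rotation in numbers rather than by eye, the responses can be tallied. This is only a sketch: it assumes the target URL is some service that answers with the caller's IP as plain text (IP_ECHO_URL below is a hypothetical placeholder, not a real address), and the function name is mine.

```shell
#!/bin/sh
# Count how often each exit IP appears in a list of responses (one IP per line).
tally_exits() {
    awk 'NF { count[$0]++ } END { for (ip in count) print count[ip], ip }' | sort -k2
}

# Offline example with placeholder addresses.
tally=$(printf '%s\n' 192.0.2.1 192.0.2.2 192.0.2.1 192.0.2.3 | tally_exits)
echo "$tally"

# Against the real proxy it would look roughly like this (IP_ECHO_URL is an assumption):
# for i in 1 2 3 4 5 6; do
#     http_proxy="http://your-server-ip:8888" wget -O - -q "$IP_ECHO_URL"; echo
# done | tally_exits
```

With round robin and healthy upstream proxies, the counts should come out roughly equal.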

Open “your_server:9999/stats” in a browser to see statistics on how the proxy IPs are being used; you can also check the status of the external proxy IPs there.
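
The same statistics are also available as CSV if you append “;csv” to the stats URI, which is handy for scripts. Below is a small sketch that prints the status of each server line. The function name is my own, the sample data is a made-up and heavily trimmed version of the real output, and the columns are looked up by name from the header line rather than by position.

```shell
#!/bin/sh
# Print "proxy server status" for every server row of HAProxy stats CSV (stdin).
server_status() {
    awk -F, '
        NR == 1 {
            # The header line starts with "# "; strip it and locate the columns by name.
            sub(/^# /, "")
            for (i = 1; i <= NF; i++) {
                if ($i == "pxname") px = i
                if ($i == "svname") sv = i
                if ($i == "status") st = i
            }
            next
        }
        # Skip the FRONTEND/BACKEND summary rows and keep individual servers only.
        NF && $sv != "FRONTEND" && $sv != "BACKEND" { print $px, $sv, $st }
    '
}

# Offline example with trimmed, made-up sample data.
sample='# pxname,svname,scur,status,
stats,FRONTEND,0,OPEN,
proxies_out,ip-1,0,UP,
proxies_out,ip-2,0,DOWN,
proxies_out,BACKEND,0,UP,'
report=$(printf '%s\n' "$sample" | server_status)
echo "$report"

# Against the real stats page from the config above:
# curl -s -u admin:admin123 "http://your-server-ip:9999/stats;csv" | server_status
```

Keep in mind that the status column is only meaningful for servers that have health checks enabled, for example with the “check” keyword on the “server” lines.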

OK, that's all for HAProxy for now. Let's move on and look at another solution.

  • Squid

The oldest and most powerful proxy solution, with an extremely large number of configuration options. It needs somewhat more system resources but, as I said, that is compensated by its capabilities.

Let's do the same with Squid. Here is an example squid.conf:

http_port 3128
hierarchy_stoplist cgi-bin ?           
acl QUERY urlpath_regex cgi-bin \?
no_cache deny QUERY
cache deny all
visible_hostname a
auth_param basic children 5
auth_param basic realm Squid proxy-caching web server
auth_param basic credentialsttl 2 hours
acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
acl localnet src 172.16.0.0/12 # RFC1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
acl localnet src fc00::/7 # RFC 4193 local private network range
acl localnet src fe80::/10 # RFC 4291 link-local (directly plugged) machines
acl smtp port 25
acl smtps port 465
http_access deny CONNECT smtp
http_access deny CONNECT smtps
http_access allow localhost manager
http_access deny manager
http_access allow localnet
http_access allow localhost
client_persistent_connections off
server_persistent_connections off
# Remove identifying headers
request_header_access Cache-Control deny all
request_header_access Via deny all
request_header_access X-Forwarded-For deny all
# List of external proxies
cache_peer x.x.x.1 parent 3128 0 round-robin no-query login=admin:admin
cache_peer x.x.x.2 parent 3128 0 round-robin no-query login=admin:admin
cache_peer x.x.x.3 parent 3128 0 round-robin no-query
cache_peer x.x.x.100 parent 3128 0 round-robin no-query login=admin:admin
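
The “cache_peer” lines can be generated from a list as well. A minimal sketch, assuming an input format of my own invention: one “ip:port” per line, optionally followed by “login:password” for the peers that need authentication. The function name is also just an illustration.

```shell
#!/bin/sh
# Turn "ip:port [login:password]" lines (stdin) into Squid cache_peer lines (stdout).
# Blank lines and lines starting with "#" are skipped.
gen_peers() {
    while read -r addr login; do
        case "$addr" in
            ''|'#'*) continue ;;
        esac
        host=${addr%:*}    # part before the last colon
        port=${addr##*:}   # part after the last colon
        line="cache_peer $host parent $port 0 round-robin no-query"
        # Only peers with credentials get a login= option.
        if [ -n "$login" ]; then
            line="$line login=$login"
        fi
        printf '%s\n' "$line"
    done
}

# Example run with placeholder addresses from the documentation range.
peers=$(printf '%s\n' '192.0.2.1:3128 admin:admin' '192.0.2.3:3128' | gen_peers)
echo "$peers"
```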

And let's start it:

# docker run -d --name squid -p 3128:3128 -v /opt/squid.conf:/etc/squid/squid.conf sameersbn/squid:3.5.27

Check that the Squid container is OK:

root@kube3:/opt# docker ps 
7522778ba7a5 sameersbn/squid:3.5.27 "/sbin/" 2 hours ago Up 2 hours 0.0.0.0:3128->3128/tcp squid

And test it out:

# http_proxy="" wget -O - -q
# http_proxy="" wget -O - -q
# http_proxy="" wget -O - -q

Everything works fine here too: now you have a single IP that spreads all your requests across the configured external proxy IPs. And we're ready to buy all the tickets for that rock concert.

Also, don't forget to configure basic auth for Squid or HAProxy if you run it on a public IP, to prevent unauthorized access.

OK, that's all for now. In the next part of the article we'll configure our own forward proxies using a pool of IPv4 addresses.

Good luck.