Building an L7 Web Proxy on Google Cloud Platform (GCP) with Squid Proxy and ClamAV

Hassene BELGACEM
Google Cloud - Community
7 min read · May 17, 2023

Securing internet access is crucial for cloud-based companies. Users of Google Cloud Platform (GCP) now have the option to use the Secure Web Proxy service for access control and website filtering — a topic we’ll cover more deeply in a later piece. Today, however, our focus is on applying open-source technologies, specifically Squid and ClamAV, as a cost-effective and dependable way to build a secure web proxy in the cloud.

In this article, we will provide step-by-step instructions on constructing a layer 7 (application) web proxy on Google Cloud from scratch by leveraging the Squid and ClamAV open-source tools and Compute Engine services. But let’s start with the fundamentals.

What is Squid Proxy?

Squid is an open-source, high-performance proxy server and web cache daemon. It is primarily used as an intermediary between clients and the internet, improving performance and security by caching and serving frequently requested web content, reducing bandwidth usage, and decreasing latency.
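To make this concrete, here is a minimal, hypothetical squid.conf sketch illustrating Squid’s ACL-based access control; the port, client range, and domain below are placeholder assumptions, not values used elsewhere in this article:

# Hypothetical minimal squid.conf (illustration only)
http_port 3128                         # port Squid listens on
acl internal_net src 10.0.0.0/8        # placeholder client range
acl blocked_sites dstdomain .example.com
http_access deny blocked_sites         # deny listed domains first
http_access allow internal_net         # then allow internal clients
http_access deny all                   # default: deny everything else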

What is ClamAV eCAP?

ClamAV is a popular open-source antivirus engine for detecting trojans, viruses, malware, and other malicious threats. It is typically used in various situations including email scanning, web scanning, and endpoint security.

eCAP is an acronym for “Extensible Content Adaptation Protocol”. It is an architecture that allows HTTP and similar protocols to be extended and modified for content adaptation and filtering services. It is used by proxies and other intermediaries that manipulate protocol messages on their way between clients and servers.

“ClamAV eCAP” refers to the integration of ClamAV with the eCAP protocol: an eCAP-capable proxy calls ClamAV as an adaptation service to perform virus scanning on the traffic passing through it. In our case, it is used to scan all downloaded files for viruses with ClamAV.
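As a rough sketch, the wiring in squid.conf looks something like the following; the adapter path and service URI here are assumptions, so check what your build actually installs:

# Hypothetical squid.conf excerpt loading the ClamAV eCAP adapter
loadable_modules /usr/local/lib/ecap_clamav_adapter.so
ecap_enable on
# Scan response bodies (downloads) before they reach the client
ecap_service clamav_resp respmod_precache ecap://e-cap.org/ecap/services/clamav?mode=RESPMOD
adaptation_access clamav_resp allow all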

Design

Layer 7 Web Proxy

The diagram illustrates a network architecture featuring a central hub VPC (Virtual Private Cloud) network connected to multiple spoke VPC networks. The connection between these networks is established using VPC Network Peering, which requires no additional configuration, and this design supports both ingress and egress traffic. If you need an egress-only environment, you can replace peering with PSC (Private Service Connect).

Within the hub VPC network, a “Web Proxy” is hosted in a managed instance group, which is strategically placed behind an internal TCP/UDP load balancer to efficiently distribute network traffic. Additionally, a basic client virtual machine is employed for testing purposes within this network architecture.

Limitations

  • As this solution is not managed, you’ll need to shoulder the responsibility for system and software updates. This means staying aware of new patches or updates that need to be applied, as well as managing the general upkeep of your systems.
  • In terms of integration with Google Cloud services, the scope is limited since this is a third-party solution. Managing the ‘allow’ and ‘block’ lists for websites is not a straightforward task, and may require more effort and technical understanding than typical Google Cloud services.
  • The scalability of this solution is also a critical factor to consider. While autoscaling is a feature you can build (a basic example is shown in Step 4 below), it must be designed with high availability and system capacity in mind. This is especially true for large-scale systems, where there is a potential to reach the limits of simultaneous clients.

How to build this design?

In the following sections, we will walk through the build procedure step by step, elaborating on each part in enough detail to give a comprehensive picture of how to achieve the target design.

  • Step 0: We will start by setting the necessary environment variables; this will simplify the subsequent installation steps.
export PROJECT_ID="your-project-id"
export REGION="your-region" # ex: europe-west3

export HUB_NETWORK_NAME="hub-network"
export HUB_SUBNET_NAME="hub-subnet"
export SPOKE_NETWORK_NAME="spoke1-network"
export SPOKE_SUBNET_NAME="spoke1-subnet"

export TEMPLATE_NAME="l7-web-proxy-tmpl"
export MIG_NAME="l7-web-proxy-mig"
export LOAD_BALANCER_NAME="l7-web-proxy-lb"
  • Step 1: To begin, it is necessary to establish the prescribed network structure, consisting of a central Hub network and at least one connected Spoke network.
# Create a Hub custom Network and its subnet
gcloud compute networks create $HUB_NETWORK_NAME \
--project=$PROJECT_ID \
--subnet-mode=custom
gcloud compute networks subnets create $HUB_SUBNET_NAME \
--project=$PROJECT_ID \
--network=$HUB_NETWORK_NAME \
--range=192.168.0.0/24 --region=$REGION

# Create a Spoke custom Network and its subnet
gcloud compute networks create $SPOKE_NETWORK_NAME \
--project=$PROJECT_ID \
--subnet-mode=custom
gcloud compute networks subnets create $SPOKE_SUBNET_NAME \
--project=$PROJECT_ID \
--network=$SPOKE_NETWORK_NAME \
--range=10.0.1.0/24 --region=$REGION

# Delete default internet gateway Route
ROUTE_NAME=$(gcloud compute routes list --project=$PROJECT_ID --filter="network:$SPOKE_NETWORK_NAME AND nextHopGateway:default-internet-gateway" --format="value(name)")
gcloud compute routes delete $ROUTE_NAME --project=$PROJECT_ID --quiet
Create networks
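To confirm that the spoke network no longer has a direct path to the internet, you can list its remaining routes; only subnet routes should be left:

gcloud compute routes list \
--project=$PROJECT_ID \
--filter="network:$SPOKE_NETWORK_NAME"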
  • Step 2: Next, we need to establish network peering between the Hub network and each of the Spoke networks.
# Hub to spoke 1
gcloud compute networks peerings create hub-to-spoke \
--project=$PROJECT_ID \
--network=$HUB_NETWORK_NAME --peer-network=$SPOKE_NETWORK_NAME \
--auto-create-routes
gcloud compute networks peerings create spoke-to-hub \
--project=$PROJECT_ID \
--network=$SPOKE_NETWORK_NAME --peer-network=$HUB_NETWORK_NAME \
--auto-create-routes

The result should look something like this:

Create peering
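You can also verify the peering state from the command line; both peerings should report ACTIVE:

gcloud compute networks peerings list \
--project=$PROJECT_ID \
--network=$HUB_NETWORK_NAME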
  • Step 3: This is the pivotal stage where the core operational setup happens. You will need to create a cloud-init file (for example, named startup.yml), which will contain the necessary installation instructions:
    1 — First, update your system. This is a fundamental step to ensure you’re working with the latest security patches and software enhancements.
    2 — Next, install Squid 5. This version offers HTTPS decryption, which significantly enhances visibility into encrypted traffic for security analysis and malware detection.
    3 — Then, install ClamAV and its accompanying ClamAV Squid adapter. This combination ensures that your proxy can scan for and defend against malicious software, enhancing the overall security of web navigation.
    4 — Once ClamAV is set up, update the Squid configuration to permit access from the internal network. This step is critical for the proxy to function properly within your existing network.
    5 — Finally, restart the Squid proxy so that all updates and installations take effect and the proxy operates with the new configuration.
#cloud-config
runcmd:
- add-apt-repository universe
- apt update
# Install Squid 5 with HTTPS Decryption
- curl -L https://raw.githubusercontent.com/belgacem-io/gcp-secure-web-proxy/main/modules/gcp_squid_proxy/files/squid.sh | bash
# Install clamav and clamav squid Adapter
- curl -L https://raw.githubusercontent.com/belgacem-io/gcp-secure-web-proxy/main/modules/gcp_squid_proxy/files/clamav.sh | bash
- systemctl restart squid
- sysctl -p
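Once an instance boots, you can SSH in and check that both services came up. The service names below assume the standard Ubuntu packages; adjust them if the install scripts register different units:

# Run on a proxy instance
systemctl status squid --no-pager          # Squid proxy
systemctl status clamav-daemon --no-pager  # ClamAV scanning daemon
ss -tlnp | grep 3128                       # Squid listening on its port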
  • Step 4: Create a Compute Engine instance template, making sure to specify the network and to pass the cloud-init file as metadata. Then create a managed instance group based on that template.

Note: This installation is only compatible with Ubuntu 20.04 LTS.

# Create the instance template
gcloud compute instance-templates create $TEMPLATE_NAME \
--project=$PROJECT_ID \
--region=$REGION \
--machine-type=e2-medium \
--network=$HUB_NETWORK_NAME \
--subnet=$HUB_SUBNET_NAME \
--image-project=ubuntu-os-cloud \
--image-family=ubuntu-minimal-2004-lts \
--tags l7-web-proxy \
--metadata-from-file user-data=startup.yml \
--metadata enable-oslogin=TRUE

# Create a managed instance group using the template
gcloud compute instance-groups managed create $MIG_NAME \
--project=$PROJECT_ID \
--region=$REGION \
--base-instance-name=$MIG_NAME \
--template=$TEMPLATE_NAME \
--size=2
Managed Instance Group for Secure Web Proxy
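As noted in the Limitations section, the instance group can also be given basic autoscaling. The thresholds below are illustrative assumptions to tune for your own load:

gcloud compute instance-groups managed set-autoscaling $MIG_NAME \
--project=$PROJECT_ID \
--region=$REGION \
--min-num-replicas=2 \
--max-num-replicas=5 \
--target-cpu-utilization=0.7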
  • Step 5: Create an internal TCP/UDP load balancer in front of the managed instance group created in the previous step.
# Create Health check
gcloud compute health-checks create tcp $LOAD_BALANCER_NAME \
--project=$PROJECT_ID \
--region=$REGION \
--port 3128
# Create load balancer
gcloud compute backend-services create $LOAD_BALANCER_NAME \
--project=$PROJECT_ID \
--protocol=TCP \
--region=$REGION \
--load-balancing-scheme=INTERNAL \
--health-checks-region=$REGION \
--health-checks=$LOAD_BALANCER_NAME
gcloud compute backend-services add-backend $LOAD_BALANCER_NAME \
--project=$PROJECT_ID \
--region=$REGION \
--instance-group=$MIG_NAME

gcloud compute forwarding-rules create $LOAD_BALANCER_NAME \
--project=$PROJECT_ID \
--region=$REGION \
--load-balancing-scheme=internal \
--subnet=$HUB_SUBNET_NAME \
--backend-service=$LOAD_BALANCER_NAME \
--ip-protocol=TCP \
--ports=3128
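Once the firewall rules from the next step are in place, you can check that the backends report healthy:

gcloud compute backend-services get-health $LOAD_BALANCER_NAME \
--project=$PROJECT_ID \
--region=$REGION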
  • Step 6: Create firewall rules to allow traffic from the private VPC ranges and from Google health check CIDR blocks.
# Create health check firewall rule
gcloud compute firewall-rules create fw-allow-health-checks \
--project=$PROJECT_ID \
--network=$HUB_NETWORK_NAME \
--action=ALLOW \
--direction=INGRESS \
--source-ranges=35.191.0.0/16,130.211.0.0/22 \
--target-tags=l7-web-proxy \
--rules=tcp:3128
# Create private firewall rule
gcloud compute firewall-rules create fw-allow-private \
--project=$PROJECT_ID \
--network=$HUB_NETWORK_NAME \
--action=ALLOW \
--direction=INGRESS \
--source-ranges=192.168.0.0/24,10.0.1.0/24 \
--target-tags=l7-web-proxy \
--rules=tcp:3128,tcp:3126,tcp:3127
Healthy TCP load balancer

How to test and validate this design?

Testing and validating a network design are critical steps. To accomplish this, a virtual machine (VM) can be deployed within the spoke network, acting as a representative client for the network’s endpoints. Using the curl command, it is possible to simulate internet access and evaluate the network’s ability to establish connections, route traffic, and resolve DNS queries.

  • Step 1: Create a client virtual machine within the spoke network
gcloud compute instances create client-vm \
--project=$PROJECT_ID \
--zone=${REGION}-a \
--machine-type=e2-medium \
--network=$SPOKE_NETWORK_NAME \
--subnet=$SPOKE_SUBNET_NAME \
--tags client-vm --metadata enable-oslogin=TRUE
  • Step 2: At this stage, establish an SSH connection to the virtual machine you just created; this can be done by using the “SSH” button in the console. To verify internet connectivity through the web proxy, first retrieve the IP address of the load balancer, then use it to execute the following command.
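The load balancer’s IP can be fetched from its forwarding rule; setting the $LOAD_BALANCER_IP variable here is a small convenience step assumed for the test below:

export LOAD_BALANCER_IP=$(gcloud compute forwarding-rules describe $LOAD_BALANCER_NAME \
--project=$PROJECT_ID \
--region=$REGION \
--format="value(IPAddress)")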
curl --proxy http://$LOAD_BALANCER_IP:3128 https://www.google.com
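To also validate the ClamAV integration, you can try downloading the standard EICAR antivirus test file through the proxy; if the eCAP adapter is working, Squid should block the download with an error response rather than return the file:

# EICAR is a harmless, industry-standard antivirus test string
curl --proxy http://$LOAD_BALANCER_IP:3128 https://secure.eicar.org/eicar.com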

Conclusion

In conclusion, we explored the process of building a secure web proxy using Squid, ClamAV, and Google Cloud virtual machines. A fully working Terraform example that creates all the required components, including a transit VPC, spoke VPCs, firewall rules, and virtual machines, is available at https://github.com/belgacem-io/gcp-secure-web-proxy.

Originally published at https://hassene.belgacem.io.
