Recently, an organization I have been contracted with faced an issue with its existing CDN setup, which was hosted on DigitalOcean Spaces CDN. I was assigned to tackle the issue. During my early investigation, I found several problems that had caused a complete loss of connectivity to their CDN from Pakistan.
What I figured out is that a traditional CDN setup, where data is cached at different edge locations, has some drawbacks:
1. With a single origin server, there is no data redundancy, so reliability suffers. If for some reason a CDN node cannot reach the origin, any content not already cached in that region cannot be served, which results in a 404 or 403 error depending on how your server is set up.
2. When a user requests a resource for the first time and it is not yet cached at their edge location, the request has to go to the origin server, which adds some delay because the data is fetched from a different location.
3. If a country's link to the CDN goes down, there is no way to fall back to the origin server, even if the origin itself is still reachable. We faced exactly this with DigitalOcean Spaces CDN recently: our country's link to the CDN was down, and requests were not routed to the origin server even though we could access it directly. In the end, I had to manually point traffic to the origin server with a CNAME record.
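To make points 2 and 3 concrete, here is a toy Python model, not taken from the original setup. `EdgeNode` and `fetch_with_fallback` are hypothetical names, and the origin is stubbed with a plain callable. The fallback shown is a client-side alternative to the manual CNAME change described above; it is not something DigitalOcean Spaces CDN does for you.

```python
from typing import Callable, Dict

Fetch = Callable[[str], bytes]


class EdgeNode:
    """Toy CDN edge: serves from a local cache and pulls from the origin on a miss."""

    def __init__(self, fetch_from_origin: Fetch) -> None:
        self.cache: Dict[str, bytes] = {}
        self.fetch_from_origin = fetch_from_origin
        self.origin_fetches = 0  # how many requests had to travel to the origin

    def get(self, path: str) -> bytes:
        if path in self.cache:
            return self.cache[path]  # cache hit: served locally
        # Cache miss (point 2): the first request pays the round trip to the origin.
        # If the origin is unreachable (point 1), the exception propagates and the
        # user gets an error for anything not already cached.
        self.origin_fetches += 1
        data = self.fetch_from_origin(path)
        self.cache[path] = data
        return data


def fetch_with_fallback(path: str, edge_get: Fetch, origin_get: Fetch) -> bytes:
    """Try the CDN edge first; if the edge cannot be reached (point 3), go to the origin directly."""
    try:
        return edge_get(path)
    except OSError:  # ConnectionError and urllib's URLError are both OSError subclasses
        return origin_get(path)
```

With a real client, `edge_get` and `origin_get` would wrap HTTP requests to the CDN hostname and the origin hostname respectively.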
To overcome these obstacles, an alternative solution was proposed based on a scalable Distributed File System (DFS) with geo-replication. This approach also protects against data loss and gives better data reliability and redundancy. In this architecture, each file is copied to multiple nodes in the cluster, which can be located in different regions, and this replication happens automatically. If any node becomes unavailable, the data is still accessible from another node; even if the origin server is unavailable, the most recently replicated data can still be served from the edge locations. Another benefit is that we can easily add more edge locations to cover additional countries.
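The failover idea in the paragraph above can be sketched as a toy in-memory model. `ReplicatedStore` and the node names are hypothetical. The model is synchronous and simplified: in GlusterFS, a replicated volume writes to all its replicas and uses self-heal to resync a brick that was down, while geo-replication to remote mirrors is asynchronous, and neither behaviour is modelled here.

```python
from typing import Dict, List, Set, Tuple


class ReplicatedStore:
    """Toy replicated volume: writes go to every live node, reads use the first live node holding the file."""

    def __init__(self, nodes: List[str]) -> None:
        self.files: Dict[str, Dict[str, bytes]] = {node: {} for node in nodes}
        self.down: Set[str] = set()

    def write(self, path: str, content: bytes) -> None:
        for node, store in self.files.items():
            if node not in self.down:
                # A node that is down misses this write; real systems heal it later.
                store[path] = content

    def read(self, path: str) -> Tuple[str, bytes]:
        # Any live replica holding the file can answer, so one failed node is not an outage.
        for node, store in self.files.items():
            if node not in self.down and path in store:
                return node, store[path]
        raise FileNotFoundError(path)
```

For example, after writing a file and marking `node-a` as down, `read` transparently returns the copy from `node-b`.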
First, I figured out where the majority of our users reside. Google Analytics helped with this, and I picked the top 3 locations: the US, the UK and India. Next, I set up 2 primary servers in the US as a DFS with 2 replicas. Then 2 additional servers were set up in the UK and India as mirror sites, which are automatically replicated from our DFS in the US. I built this entire setup with GlusterFS. Here is a rough sketch of my final setup (excuse my lame sketch, as I am not a big fan of network diagram tools :))
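A topology like the one described above maps onto a handful of GlusterFS CLI commands. The Python helper below only assembles them as strings; it is a sketch, not the exact procedure used here. The volume names, hostnames (`us-1`, `us-2`, `uk-1`, `in-1`) and brick path are hypothetical. It assumes the mirror volumes already exist on the UK and India servers and that passwordless SSH from the primary to each mirror is configured, which geo-replication requires. Note that `replica 2` asks for confirmation because two-way replicas are prone to split-brain; GlusterFS recommends replica 3 or an arbiter brick.

```python
from typing import List, Tuple


def gluster_setup_commands(
    volume: str,
    primaries: List[str],
    brick_path: str,
    mirrors: List[Tuple[str, str]],
) -> List[str]:
    """Return (without running) GlusterFS CLI commands for a replicated primary
    volume plus one geo-replication session per mirror site."""
    cmds: List[str] = []
    # Add the other primaries to the trusted storage pool (run from the first primary).
    for host in primaries[1:]:
        cmds.append(f"gluster peer probe {host}")
    # One brick per primary; "replica N" keeps a full copy of every file on each brick.
    bricks = " ".join(f"{host}:{brick_path}" for host in primaries)
    cmds.append(f"gluster volume create {volume} replica {len(primaries)} {bricks}")
    cmds.append(f"gluster volume start {volume}")
    # Generate the SSH keys that geo-replication sessions use.
    cmds.append("gluster system:: execute gsec_create")
    # Each mirror host must already have its own (hypothetically named) volume.
    for mirror_host, mirror_volume in mirrors:
        session = f"{volume} {mirror_host}::{mirror_volume}"
        cmds.append(f"gluster volume geo-replication {session} create push-pem")
        cmds.append(f"gluster volume geo-replication {session} start")
    return cmds


if __name__ == "__main__":
    for cmd in gluster_setup_commands(
        volume="cdn",
        primaries=["us-1", "us-2"],
        brick_path="/data/brick1/cdn",
        mirrors=[("uk-1", "cdn-uk"), ("in-1", "cdn-in")],
    ):
        print(cmd)
```

Generating the commands rather than running them keeps the sketch safe to try; each line can be reviewed and pasted into a shell on the first US primary.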
After setting up GlusterFS and all the servers, we also needed a Load Balancer (LB) so that traffic could be distributed across the servers and speed optimised by routing each user to the nearest server based on their location. For this, I used Cloudflare both as the LB and as a reverse proxy to protect our servers from DDoS attacks and similar threats. Cloudflare has edge locations spread across the globe, and it also has data centers in our country, so luckily it was an easy choice. :)
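The "nearest healthy server" idea behind this kind of load balancing can be illustrated with great-circle distances. Cloudflare Load Balancing implements this with its own steering policies and health checks configured in its dashboard; the sketch below is only an illustration of the concept and is not Cloudflare's algorithm. `pick_server`, the server names and the coordinates are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def pick_server(user: LatLon, servers: Dict[str, LatLon], healthy: Iterable[str]) -> str:
    """Return the nearest server that is currently passing health checks."""
    live = set(healthy)
    candidates = [name for name in servers if name in live]
    if not candidates:
        raise RuntimeError("no healthy server available")
    return min(candidates, key=lambda name: haversine_km(user, servers[name]))
```

With servers roughly at New York, London and Mumbai, a user in Karachi is routed to Mumbai, and to London if the Mumbai server fails its health check.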
Oh, one more thing. After completing this entire setup, we also needed centralized monitoring, so that I could track the resources used by all 4 servers and upgrade them when needed. For this purpose, a small server with 2 vCPUs and 2 GB of RAM was set up with Prometheus and Grafana.
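Grafana dashboards read from Prometheus, but the same data can be pulled from Prometheus's HTTP API for quick checks. The sketch below assumes each of the 4 servers runs node_exporter (which provides `node_cpu_seconds_total`) and is scraped by Prometheus; the address `http://monitor:9090` and the 80% upgrade threshold are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

# Busy CPU percentage per instance, derived from node_exporter's idle-time counter.
CPU_BUSY_PERCENT = (
    '100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))'
)


def build_query_url(prometheus_base: str, promql: str) -> str:
    """URL for Prometheus's instant-query endpoint, /api/v1/query."""
    return f"{prometheus_base.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_instant_vector(body: str) -> Dict[str, float]:
    """Map each series' instance label to its sample value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    return {
        series["metric"].get("instance", "unknown"): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def cpu_busy_by_instance(prometheus_base: str, timeout: float = 5.0) -> Dict[str, float]:
    url = build_query_url(prometheus_base, CPU_BUSY_PERCENT)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_vector(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for instance, busy in sorted(cpu_busy_by_instance("http://monitor:9090").items()):
        flag = "  <- consider upgrading" if busy > 80 else ""
        print(f"{instance}: {busy:.1f}% CPU{flag}")
```

Keeping URL building and response parsing separate from the network call makes both easy to check without a running Prometheus.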
It has been about 1.5 months now, and so far the solution has been very reliable, with high availability, redundancy and 99.9% uptime.
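For scale, a 99.9% uptime figure leaves only a small downtime budget. Taking 1.5 months as roughly 45 days (an approximation), the allowed downtime over that period works out to about an hour:

```latex
\text{allowed downtime} = (1 - 0.999) \times 45 \times 24 \times 60\ \text{min}
                        = 0.001 \times 64\,800\ \text{min}
                        = 64.8\ \text{min}
```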