How Cdiscount is shifting gears by overhauling its core services
Revising our infrastructure layer for even more performance
At Cdiscount, we are convinced that performance comes from speed. The average time between two production changes on our platform is currently around seven minutes, and we continue to accelerate: our ability to industrialise changes is crucial to us. The faster we can deliver infrastructure, the more value we provide to our clients.
Our technology ecosystem comprises more than 5,000 servers. We decided to completely revise the infrastructure layer of our DDI core services, which plays an essential part in delivering infrastructure, improving performance and making operations simpler and more secure.
As a reminder, DDI stands for three critical network infrastructure services:
· DNS = Domain Name System
Translates a domain name into an IP address
· DHCP = Dynamic Host Configuration Protocol
Allows a new device to join a computer network by assigning it a dynamic IP address
· IPAM = IP Address Management
Plans, assigns and manages the use of IP addresses while coordinating DHCP and DNS operations
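To make the relationship between the three services concrete, here is a minimal sketch in Python of an IPAM acting as the single source from which DNS and DHCP data are derived. The class name `TinyIpam`, the hostnames and the addresses are all hypothetical and chosen for illustration; this is not how any real DDI product is implemented.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    hostname: str
    ip: str
    mac: str


class TinyIpam:
    """Hypothetical in-memory IPAM: one reservation feeds both DNS and DHCP."""

    def __init__(self, subnet_cidr):
        self.network = ipaddress.ip_network(subnet_cidr)
        self.reservations = {}  # ip (str) -> Reservation

    def reserve(self, hostname, mac):
        # Assign the first host address of the subnet that is not yet reserved.
        for host in self.network.hosts():
            ip = str(host)
            if ip not in self.reservations:
                self.reservations[ip] = Reservation(hostname, ip, mac)
                return self.reservations[ip]
        raise RuntimeError(f"no free address in {self.network}")

    def dns_records(self):
        # DNS view: hostname -> IP address
        return {r.hostname: r.ip for r in self.reservations.values()}

    def dhcp_leases(self):
        # DHCP view: MAC address -> IP address
        return {r.mac: r.ip for r in self.reservations.values()}


ipam = TinyIpam("192.0.2.0/29")
ipam.reserve("web01.example.test", "52:54:00:aa:bb:01")
print(ipam.dns_records())   # {'web01.example.test': '192.0.2.1'}
print(ipam.dhcp_leases())   # {'52:54:00:aa:bb:01': '192.0.2.1'}
```

Because both views are computed from the same reservation, they cannot disagree with each other, which is the property an IPAM is meant to provide.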
Why change our legacy system?
The legacy DDI infrastructure that has been in place in the company for years has now reached its limits:
· Mixed technologies (Windows/Linux) cause compatibility problems
· There is no IPAM <-> DNS/DHCP synchronisation, so every change has to be configured twice, which invites human error
· Propagation times are long, causing desynchronisation
· There are few or no reporting and alerting tools, which makes production monitoring unreliable
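The cost of maintaining IPAM and DNS separately can be illustrated with a short sketch in Python that compares two independently maintained hostname-to-IP mappings and reports where they have drifted apart. The function name `find_drift` and the sample data are hypothetical; this only illustrates the kind of check that becomes necessary when the two sources are not synchronised.

```python
def find_drift(ipam_records, dns_records):
    """Compare two hostname -> IP mappings and report their differences."""
    ipam_hosts = set(ipam_records)
    dns_hosts = set(dns_records)
    return {
        # Declared in the IPAM but never published in DNS
        "missing_in_dns": sorted(ipam_hosts - dns_hosts),
        # Published in DNS but unknown to the IPAM
        "missing_in_ipam": sorted(dns_hosts - ipam_hosts),
        # Present in both, with different addresses
        "mismatched": sorted(
            h for h in ipam_hosts & dns_hosts if ipam_records[h] != dns_records[h]
        ),
    }


# Hypothetical sample data
ipam = {"web01": "192.0.2.10", "web02": "192.0.2.11", "db01": "192.0.2.20"}
dns = {"web01": "192.0.2.10", "web02": "192.0.2.99", "old01": "192.0.2.30"}

print(find_drift(ipam, dns))
# {'missing_in_dns': ['db01'], 'missing_in_ipam': ['old01'], 'mismatched': ['web02']}
```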
All these points reinforced our determination to choose a new solution that could address these problems.
What have we chosen, and how are we making the change?
We have chosen the EfficientIP solution with its SOLIDserver range of appliances, which are dedicated to managing the IP address lifecycle: provisioning addresses and organising their deployment and monitoring.
The main benefits we expect from this solution are:
· High availability and security of DDI services
· Better productivity through automation
· Simpler management and minimised operating costs
· A global view of all the data
A redundant and resilient architecture
The architecture that we have set up consists of virtual appliances hosted on Microsoft’s Azure Cloud as well as physical appliances.
Our entire architecture is designed for high availability and, wherever possible, runs in multi-master mode between our two datacentres. The equipment is split equally between Bordeaux and Paris so that the infrastructure is redundant both within and across datacentres, which protects us against a regional disaster.
Two machines act as managers for all the other appliances. They are dedicated to management and the IPAM module and are not intended for DNS resolution. In addition, the connections to the DNS/DHCP servers are reserved for management and data transfer to maximise availability.
This architecture provides:
· A robust and secure architecture by design
· Extended, unified management that gives us end-to-end control over our DNS and DHCP infrastructure
In this configuration, all the data is propagated to all the servers, which avoids depending on a single source server and reduces the risk of failure.
Automating infrastructure releases with Ansible
One of the features that most attracted us to the EfficientIP solution is the ability to automate releases. Servers no longer need to be configured manually one by one to build the DNS/DHCP architecture, since the entire process is now performed automatically.
The second significant issue in this overhaul was the interoperability of these core services with our continuous infrastructure delivery workflows.
The base solution ships with a complete set of APIs and automation connectors for tools such as Terraform, Ansible, Chef and Puppet, which allowed us to interface it with our release tools.
Previously, multiple actions had to be performed across the various components (IPAM, DNS, DHCP); some of them were manual and required human verification. Today we have an automated reference base that is clean and error-free. Our dev teams are served more quickly, and we can streamline the existing infrastructure with fewer risks of manual errors.
Using the Ansible module to automate reservations
#################################################
# Date: 24/11/2021
#################################################
### - pip install requests
### - pip install jinja2
### - pip install pyyaml
### git clone https://github.com/acoston/Ansible-EfficientIP into /etc/ansible/modules
### then vim /etc/ansible/ansible.cfg
### library = /etc/ansible/modules
### module_utils = /etc/ansible/modules/Ansible-EfficientIP

## find subnet_id
- name: list subnet of a space
  no_log: true
  eip:
    ipm_server: "{{ ipm_server }}"
    ipm_username: "{{ ipm_username }}"
    ipm_password: "{{ ipm_password }}"
    ipm_action: ip_subnet_list
    ipm_space: "{{DC|upper}}"
  register: subnet

- set_fact:
    subnet: "{{ subnet.result.output | selectattr('ipm_subnet', 'contains', vlan|string)| list }}"

- fail:
    msg: "The subnet (vlan) is not part of the target, see the list above."
  when: subnet|length < 1

- set_fact:
    subnet_id: "{{ subnet.0.ipm_subnet_id }}"

- debug:
    msg:
      - "{{DC|upper}}"
      - "{{ vlan }}"
      - "{{ subnet_id }}"

- name: find one free IP address on a subnet
  eip:
    ipm_server: "{{ ipm_server }}"
    ipm_username: "{{ ipm_username }}"
    ipm_password: "{{ ipm_password }}"
    ipm_action: ip_address_find_free
    ipm_subnet_id: "{{subnet_id}}"
  register: freeip

- set_fact:
    ip: "{{freeip.result.output}}"

- fail:
    msg: "ip not available on this vlan"
  when: ip == 'no data'

- debug:
    msg: "ip used: {{ip}}"

- set_fact:
    mac0: "{{ '52:54:00' | community.general.random_mac(seed=inventory_hostname) }}"
  when: formulaire == 'azure' or formulaire == 'nutanix' or formulaire == 'vmware'

- fail:
    msg: "No MAC addresses found"
  when: mac0 is not defined

- name: add IP on space
  eip:
    ipm_server: "{{ ipm_server }}"
    ipm_username: "{{ ipm_username }}"
    ipm_password: "{{ ipm_password }}"
    ipm_action: ip_address_add
    ipm_space: "{{DC|upper}}"
    ipm_hostname: "{{inventory_hostname.split('.')[0]|upper}}.{{domain}}"
    ipm_macaddr: "{{mac0|trim}}"
    ipm_subnet_id: "{{subnet_id}}"
    ipm_classparam: "ticket={{issue_demand}}&hostname={{inventory_hostname.split('.')[0]|upper}}&domain={{domain}}"
    ipm_hostaddr: "{{ip}}"
  register: add
  #ignore_errors: true

- debug:
    var: add
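One detail of the playbook worth illustrating is the MAC address step: the `52:54:00` prefix is completed using the inventory hostname as a seed, so the same host gets the same MAC address on every run. The sketch below shows that idea in Python. The function `seeded_mac` is hypothetical and does not reproduce the algorithm of the `community.general.random_mac` filter, so its output will differ from what Ansible generates.

```python
import random


def seeded_mac(prefix, seed):
    """Complete a MAC prefix with bytes from a random generator seeded by `seed`.

    Hypothetical helper: it illustrates seeded, repeatable generation only and
    will not return the same values as Ansible's random_mac filter.
    """
    parts = prefix.lower().split(":")
    if not 1 <= len(parts) <= 5:
        raise ValueError("prefix must contain between 1 and 5 octets")
    rng = random.Random(seed)  # same seed -> same sequence of bytes
    while len(parts) < 6:
        parts.append(f"{rng.randrange(256):02x}")
    return ":".join(parts)


# Hypothetical hostname used as the seed
first = seeded_mac("52:54:00", "srv-demo-01.example.test")
second = seeded_mac("52:54:00", "srv-demo-01.example.test")
print(first)
print(first == second)  # True: the reservation can be replayed safely
```

Seeding on the hostname keeps the reservation idempotent: replaying the playbook for an existing host does not register a second, different MAC address.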
Security is improved
The DNS module also includes a DNS Firewall feature.
The block policy relies on creating Response Policy Zones (RPZ), which can block or redirect DNS clients trying to reach certain websites.
When a client queries a domain, a subdomain or an IP address that appears in one of the RPZ zones, the server uses the configured response policy to respond.
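The matching logic described above can be sketched in a few lines of Python. This is a simplified illustration, not the DNS server's implementation: the function `rpz_lookup`, the action strings and the sample domains are hypothetical, and the sketch assumes that a listed domain also covers its subdomains, whereas real RPZ zones express this with explicit wildcard records.

```python
def rpz_lookup(qname, policy):
    """Return the policy action for a queried name, or None if no rule matches.

    `policy` maps a domain to an action. Simplifying assumption: an entry for a
    domain also applies to all of its subdomains.
    """
    labels = qname.lower().rstrip(".").split(".")
    # Try the full name first, then each parent domain in turn.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in policy:
            return policy[candidate]
    return None


# Hypothetical policy: block one domain, redirect another
policy = {
    "malware.example": "NXDOMAIN",
    "phish.example": "REDIRECT:192.0.2.10",
}

print(rpz_lookup("cdn.malware.example.", policy))  # NXDOMAIN
print(rpz_lookup("phish.example", policy))         # REDIRECT:192.0.2.10
print(rpz_lookup("shop.example", policy))          # None
```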
To date, this feature has allowed us to block thousands of malicious domain names, providing a first layer of security to our users.
Conclusion
The complete overhaul of the DDI infrastructure building blocks is a colossal and complex undertaking, since all the IT systems depend on them.
After a few months of use, the time saved in releasing and delivering our infrastructure is considerable, which is invaluable support for the rollout of our Octopia program.
The next steps will allow us to shift the DHCP roles of our remote sites, such as warehouses and offices, to the datacentre DDI infrastructures and then provide an approval workflow and API in order to make our tech teams more self-sufficient. All with the goal of ever greater speed!