How we built our tech stack at AntVoice
In this article we’ll talk a bit about AntVoice as a tech startup, and then about the tech stack we’re using (and why).
A quick tour of the house
In a nutshell:
AntVoice is an AdTech company that offers a predictive-targeting solution and in-banner personalized recommendations, all backed by machine-learning algorithms.
From a purely technical perspective, this basically means three things:
- 💹 Real-time bidding
- 📊 (Big) Data Centricity
- ⚡ High performance and Scalability (up & down)
In more detail:
As of today, our platform handles on average some 15,000 requests per second, with a response-time threshold of 50 milliseconds.
Yes, it is (very) fast, and we’re proud of it!
We’re proud because, within those 50 ms, we run real-time prediction and pricing computations based on all the data we have gathered about a specific user and the ad placement. This involves multiple model evaluations, which is far more involved than answering requests with a basic algorithm.
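To make the idea concrete, here is a minimal sketch (in Python, with made-up model functions and field names; the real engine differs) of evaluating several models under a fixed latency budget:

```python
import time

DEADLINE_MS = 50  # the response-time threshold mentioned above

def evaluate_models(features, models, deadline_ms=DEADLINE_MS):
    """Run model evaluations until the latency budget is spent,
    then answer with whatever scores were computed in time."""
    start = time.monotonic()
    scores = []
    for model in models:
        elapsed_ms = (time.monotonic() - start) * 1000
        if elapsed_ms >= deadline_ms:
            break  # budget exhausted: respond with what we have
        scores.append(model(features))
    return scores

# Toy stand-ins for the real prediction and pricing models.
ctr_model = lambda f: f["clicks"] * 2
price_model = lambda f: f["value"] + 1

scores = evaluate_models({"clicks": 3, "value": 4}, [ctr_model, price_model])
print(scores)  # [6, 5]
```

The point is simply that every model evaluation has to fit inside the same hard deadline as the network round-trip itself.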
Behind the scenes:
A bidding decision has to be made automatically and accurately in order to buy an advertising space at the best price for the right audience. We then track the result of each bid, won or lost, to feed back into our recommendation and pricing [AI] algorithms.
That generates a lot of data that is stored, processed, and used everywhere.
This also implies that we need an elastic infrastructure able to adapt to the traffic peaks and troughs of every minute of the day, so that we don’t miss an opportunity while minimizing technical costs.
👀 Well, that’s a lot of things! But rest assured, we’ll come back with other articles in the future to cover all of this in depth.
(Do the) Evolution:
AntVoice’s stack has evolved over the years with every change in its business model. What started as a product-recommendation service is now a full DSP (Demand-Side Platform) solution offering clients a way to acquire new audiences thanks to predictive-targeting ads.
Many smart people (former and current developers; my wonderful colleagues 💖) contributed to building the ecosystem. Everyone brought real value to the project.
Although the tech team has always been about the size of a single two-pizza team, AntVoice’s tech stack has always been rich, thoughtful, and above all alive! It continues to change.
Our tech stack
AntVoice’s tech stack has evolved a lot over the last five years. We love to challenge it and to test new technologies that could improve our platform. Here is a current snapshot of our stack, but rest assured it will keep changing in the next few months.
We won’t go into deep explanations of all the choices we made, as these will be developed in future articles.
NOTE: every mention of “on-premises” means a service hosted or installed on one of our virtual machines (not a local one), as opposed to any cloud service in “SaaS” mode.
Languages:
Programming languages. Some we “speak” fluently, and some we use less.
- C#
We happily use the latest version of it. It’s a very powerful general-purpose, multi-paradigm, and, in my opinion, enjoyable language. It represents roughly 70% of our backend code base and has been there since the beginning.
- F#
Introduced to improve modeling and to take advantage of functional programming.
We like it so much that our newest web APIs have been created with F#.
We can mix both .NET languages in many parts of our code without having to rewrite or translate anything, and it has worked out pretty well. F# represents some 25% of our backend code base.
- Rust
The latest adopted language, needed for high performance.
As we reached some of the limits of vertical scaling with C# and F#, we looked for a more performance-oriented language. After some POCs, Rust won our confidence and interest, though not without many struggles to understand its philosophy and make things work. It represents about 5% of our backend code base.
- TypeScript
The strongly typed, refactoring-friendly web language.
We used it to create our tag library, and we use it for every new frontend development.
- JavaScript
We’re talking here about plain old vanilla JS. It tends to disappear in favor of TS; we can consider it a legacy language in our stack.
- Python
The data scientist’s best friend, also used for some tooling tasks by developers.
- Scala/Spark
Used by data science when running Spark clusters.
- Go
Used in some prototypes, but Rust was better suited to our needs. In maintenance only.
- Java
Marginally, we have some lines of Java used by necessity in a Dataflow job.
Local development environment:
Tools we use to get work done.
- Ubuntu ~ Focal Fossa
We wanted an environment as close as possible to production, so we are 100% on Linux. Ubuntu is a user-friendly distribution with lots of support; the choice was easy to make.
- Visual Studio Code
The most beloved editor ever (a totally unbiased description, haha).
- Rider
Super powerful, full of great features, and works great on Linux.
- IntelliJ
We use it especially for Rust development, but also for Scala/Spark.
- Jupyter Notebook
Used by our data scientists for model prototyping and statistical analysis.
- Robo 3T
“Who needs a graphical client?” said my teammate; well, for lazy bastards like me, here is a simple MongoDB graphical client.
- Postman
The free version satisfies each developer’s needs.
- Terminal
Linux’s best friend.
Dev Stack
What developers use to build our sweet platform.
Frameworks
- .NET 5
We try to always stay up to date with the latest improvements to the .NET (Core) frameworks, as we’re using two of its languages.
Also, because we’re fully on Linux (dev & production), we always look for the most stable framework, and in this matter .NET 5 works like a charm.
- Angular
We have used Angular since its first versions. It works well, and we stick with it because it meets our (few) UI needs.
- ASP.NET Core
Our framework for C# web APIs. Does it need an intro?
- Giraffe
Our framework for F# web APIs, knowing that we tend to favor F# for our new services.
- REST
Even if this is neither an architecture nor a framework, I included it here as a ‘bag’ for all the RESTful frameworks and libraries we’re using (alongside gRPC).
- gRPC
When it comes to performance, this Google framework for remote procedure calls over HTTP/2 is a champion. We use it for our low-latency bidding services.
Logging
- ElasticSearch / Kibana
A standard in the logging world. We do not use Logstash, though. We’re happy with the on-premises service.
- FluentD
Rather than manually building a specific logging layer for each of our heterogeneous projects, we rely on Fluentd to collect and record logs in a unified fashion.
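As an illustration, here is a minimal Python sketch of the kind of unified, structured log line such a pipeline collects (service name and field choice are hypothetical):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line: a uniform shape that a collector
    like Fluentd can tail and forward without per-project parsing rules."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

logger = logging.getLogger("bidding-service")  # hypothetical service name
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.warning("bid rejected")  # emits a single JSON line
```

Whatever the language of the emitting service, as long as every process writes the same JSON shape, the collection side stays uniform.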
Data
- Mongo
We only use NoSQL databases, and MongoDB is the document-oriented database we chose. Mongo is well documented and accessible, but we tend to challenge it from time to time. We have a primary server and a secondary one to handle failures, and data replication happens many times a day.
- Redis
We needed a simple and effective key-value/cache database, and Redis answered those needs. We also use it for some pub/sub mechanisms, for instance when dealing with configuration refresh. But the more we look for performance, the more we move to Aerospike.
- Aerospike
For everything related to caching and key-value storage, Aerospike is a performance monster. We use it in-memory for super-fast caching (for our bidding engine, for instance), and with persistence for recommendation pre-calculations.
- BigQuery
A “serverless, highly scalable, and cost-effective multicloud data warehouse designed for business agility” (as described here).
The most important thing for us with BigQuery is being able to stream terabytes of data without having to manage the infrastructure. Access times are pretty good, and it’s super easy to query thanks to an (almost) standard SQL syntax.
All these advantages come at a (monetary) cost: one for storage, and one for usage. Thus, we have to be careful with our queries and partition keys.
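To illustrate the pub/sub-based configuration refresh mentioned for Redis, here is a tiny in-process stand-in in Python (in production a Redis channel carries the message; the channel and key names here are made up):

```python
from collections import defaultdict

class Bus:
    """Tiny in-process stand-in for a Redis pub/sub channel."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)

# Each service keeps a local config cache and refreshes it on notification,
# instead of polling the database for changes.
config = {"max_bid": 1.0}

def refresh(message):
    config.update(message)

bus = Bus()
bus.subscribe("config-refresh", refresh)       # hypothetical channel name
bus.publish("config-refresh", {"max_bid": 2.5})
print(config["max_bid"])  # 2.5
```

The win is that every instance picks up new configuration as soon as it is published, with no restart and no polling loop.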
Infrastructure
What we use as our global infrastructure.
We use a microservices architecture with services that communicate synchronously using REST and gRPC, and asynchronously through messaging queues (pub/sub). We also use cron jobs and background tasks. We decided to go full cloud for all the benefits cloud computing can bring.
GCloud
Everything is deployed on Google’s public cloud. The choice was based on pricing compared to AWS and Azure, but also on the Big Data and AI services offered by GCloud. This is a non-exhaustive list of the services we use the most:
- BigQuery
- Compute engine
- DataProc
- Kubernetes engine
- Pub/Sub
- DataFlow
- CDN
- Storage
Terraform
Terraform is our infrastructure-as-code language. With it, we can create any Google Cloud resource we need and keep track of every change in Git history.
Istio
It’s the service mesh we use to handle our microservices’ communications and traffic.
Ansible
Ansible is the tool we use to set up our managed VMs (Aerospike/Mongo servers, for example). It creates disks, users, services, etc.
DevOps stack
What we use, DevOps-wise, to build, deploy, and monitor.
Code
- Azure DevOps
The good thing is that our plan is free thanks to our team size, and we like it.
We’ve been constantly customizing it to meet our needs.
- Git
The obvious choice for every code base nowadays.
Our repositories are in Azure DevOps, except for one open-source library we host on GitHub.
- Mono-repository
This is neither a technology nor an architecture; it’s the choice to put all the code in one place, i.e. to use a single code repository for all the services rather than separate ones. This choice has advantages and drawbacks:
- The con is that we had to build a custom build-and-deploy process that detects what has changed (and we did), and we have to be careful about the wider impact of changes to shared libraries.
- The pro is avoiding NuGet hell and similar issues, and making refactoring and development way faster.
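As a sketch of what such a change-detection process can look like (paths and service names are hypothetical; the real implementation differs), the core logic is a mapping from changed file paths to the services that must be rebuilt:

```python
def impacted_services(changed_paths, service_dirs, shared_dirs):
    """Given changed file paths, return the set of services to rebuild.

    A change under a service's directory rebuilds that service; a change
    under a shared library rebuilds everything (the 'bigger impact' case).
    """
    impacted = set()
    for path in changed_paths:
        if any(path.startswith(d) for d in shared_dirs):
            return set(service_dirs)  # shared code changed: rebuild all
        for service, directory in service_dirs.items():
            if path.startswith(directory):
                impacted.add(service)
    return impacted

# Hypothetical monorepo layout.
services = {"bidder": "src/bidder/", "billing": "src/billing/"}
shared = ["src/common/"]
print(impacted_services(["src/bidder/engine.fs"], services, shared))  # {'bidder'}
```

A touched shared library triggers a full rebuild, which is exactly the trade-off described above: fast targeted deploys most of the time, at the price of vigilance around shared code.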
Build
- Makefile
Since the days of C programming, this tool has never ceased to be used. We use it for all kinds of things, from build and publish scripts to simple automation and ‘shortcut’ tools.
- FAKE
It’s Make, made functional with F#. We have a custom library, built with FAKE, that detects changes in our monorepo and deploys the adequate jobs and services.
Hopefully, we’ll detail this part soon in a future article.
Deployment
- TeamCity
We’ve just migrated to the 2020 version, on-premises.
- Docker
The superstar container platform.
- Kubernetes
As we have a real need to scale horizontally, up and down, we adopted Kubernetes for all our deployments.
- Helm
We use Helm as a higher-level Kubernetes tool. Thanks to its charts, we created a few unified deployment templates into which we inject per-environment variables to quickly create or update our services.
- Apache Airflow
This tool is used especially by our data scientists to schedule and monitor the many workflows and computations we need to run every week for our pricing and recommendation engines.
Monitoring
- Kibana
We use a few dashboards and follow how the logs evolve with every deployment.
- Prometheus
Prometheus collects metrics by scraping HTTP endpoints in our applications. Some of these metrics are essential, even vital, to our business: for instance, we have to monitor spending trends for our bidding service, or banner click rates for our billing service.
- Grafana
An awesome analytics-visualization service that we run on-premises.
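For a flavor of what Prometheus actually scrapes, here is a minimal Python sketch that renders one counter in the text exposition format returned by a /metrics endpoint (the metric name and labels are made up):

```python
def render_counter(name, help_text, value, labels=None):
    """Render one counter in the Prometheus text exposition format,
    i.e. what a /metrics endpoint returns when Prometheus scrapes it."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (f"# HELP {name} {help_text}\n"
            f"# TYPE {name} counter\n"
            f"{name}{label_str} {value}\n")

# Hypothetical business metric: total spend per campaign.
print(render_counter("bids_spend_total", "Total spend in euros", 1234.5,
                     {"campaign": "42"}))
```

In practice one would use an official Prometheus client library rather than formatting lines by hand, but this is all a scrape returns: plain labeled samples over HTTP, which Grafana then turns into dashboards.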
QA Stack
What we use on a daily basis to test and write automation tests
Unit testing
xUnit, Moq, Foq, AutoFixture, and many other unit-testing libraries.
Functional testing
Our browser tests run inside containers. We use Selenium to orchestrate our whole test suite, along with docker-compose.
DataScience Stack
What our Data Science magicians use
As mentioned earlier, Python, Jupyter, and Airflow are the main tools used by our data chefs. One day, soon enough, we’ll dig deeper into this part.
Project management
The tools that keep everyone busy and productive.
Methodology
Like many before us, we used to do Scrum. We then switched to Scrumban to lighten our ceremonies and stay task-focused: we work on many projects at the same time. We have an Azure DevOps kanban board, but we keep our daily stand-ups and retrospectives.
Communication
- Teams
Teams is the communication platform we use. We work remotely 80% of the time, and Teams works pretty well for video and screen sharing, although there are many limitations on Linux (like remote control).
- Google Workspace
Full-cloud office tools that work well for us.