I started my journey with Azure API Management (APIM) in spring 2019. My client had an API strategy in place and needed someone to take it forward, i.e. plan the architecture, start implementing the infrastructure (as code, naturally) and publish the first APIs through it. The setup needed to be as secure as possible while still serving the needs of different teams and stakeholders, enabling hybrid networking scenarios and integrating with a 3rd party IAM (SaaS) product. I really had no idea what I was getting myself into.
One of the most valuable lessons I learned during this exercise was figuring out how to apply a CI/CD process to APIs published through APIM. It took several iterations and learning the hard way, so I figured I'd share the story.
APIM offers several different pricing tiers to choose from, but if you're after true network isolation (= Virtual Network support), the only tiers supporting this are Developer and Premium. This leads, in most cases, to a centralized model where all the teams publish their APIs through a shared production APIM environment. The Premium tier costs over 2k€ / instance / month, and the minimum sensible setup is two instances, although they reside in the same region by default. True HA for your APIM setup therefore requires some extra planning and work.
APIM can be run in three modes:
- Public: open to the internet and Azure controls the DNS
- External: open to the internet and Azure controls the DNS + VNET integration
- Internal: no connection to the internet, you manage the DNS + VNET integration
Azure API Management is basically a managed, high-level API gateway: a facade for your APIs. Managing and running it involves substantial cost and extra work. If VNET support is not a must, you can get by with much cheaper running costs, and I've heard most companies go with the Standard tier in production. That just means they use some means of authentication to protect the endpoints in APIM, but the endpoints are all open to the public network.
Then why would I want to publish my APIs through APIM rather than implement some number of custom API gateways to achieve the same? The main advantage is the ability to easily share APIs through this facade with partners and 3rd parties as well as internal developers, in a centralized and controlled manner.
For instance, you can enforce policies and group APIs under "products" to which consumers can then subscribe. This is usually relevant to larger enterprises and service providers building digital platform capabilities and an API economy.
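As an illustration of what such a policy can look like, here is a simplified rate-limiting policy in APIM's policy XML. The limit values are placeholders, not recommendations:

```xml
<policies>
    <inbound>
        <base />
        <!-- illustrative values: throttle each subscription to 100 calls per 60 seconds -->
        <rate-limit calls="100" renewal-period="60" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```

Policies like this can be scoped globally, per product, per API or per operation, which is a large part of what makes the "products" abstraction useful.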
APIs published through APIM can be hosted virtually anywhere. In our case, however, we were running APIM in internal mode, so it was desirable to keep the traffic inside the Azure backbone when forwarding it between APIM and the APIs. For routing traffic from the public network to an internal-mode APIM you need some service in front of it; in Azure's managed offering this is usually Application Gateway (AppGw).
When planning the needed subnets and IP ranges you need to take into account at least the following:
- How many centralized (not owned by projects and/or teams) APIM environments you intend to have
- Max instance count in AppGw needed to support the highest user loads
These two are the biggest factors in defining the CIDR needed for your centralized APIM environments. Note that they also account for only one region.
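As a rough sanity check for an address plan, you can count the addresses the planned subnets would consume against the address space you have. This is only a sketch: the assumed subnet sizes (/27 per APIM environment, /24 per AppGw to leave autoscaling headroom) are my assumptions, and the check counts addresses only, ignoring subnet alignment; verify current minimums against Azure documentation before committing to a plan.

```python
import ipaddress

# Assumed subnet sizes (check Azure docs for current minimums):
APIM_SUBNET_PREFIX = 27   # assumption: /27 per centralized APIM environment
APPGW_SUBNET_PREFIX = 24  # assumption: /24 per AppGw for autoscaling headroom

def subnets_fit(cidr: str, apim_envs: int, appgw_count: int) -> bool:
    """Rough check: does the address space hold the planned subnets?

    Counts total addresses only; it is not an allocator and ignores
    subnet boundary alignment and Azure-reserved addresses.
    """
    space = ipaddress.ip_network(cidr)
    needed = (apim_envs * 2 ** (32 - APIM_SUBNET_PREFIX)
              + appgw_count * 2 ** (32 - APPGW_SUBNET_PREFIX))
    return needed <= space.num_addresses

print(subnets_fit("10.0.0.0/23", apim_envs=2, appgw_count=1))  # True: 64 + 256 <= 512
print(subnets_fit("10.0.0.0/25", apim_envs=2, appgw_count=1))  # False: 320 > 128
```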
Another challenge related to this topic is deciding which Azure network to place these resources in. I spent a fair amount of time weighing different options and came to the conclusion that it's easiest and most efficient to provision them directly under the Hub's own VNET.
In the Hub and Spoke network topology, resources under the Hub's network are automatically connected to the Spokes' resources. I didn't want to start replicating the VNET peerings to some other VNET just to complicate things, not to mention the added round trips and firewall openings that would require.
Infrastructure As Code
My philosophy when it comes to IaC is quite simple: use the native tooling that the platform provides. In Azure this translates to ARM. I did take a serious look into Terraform when investigating the options, but decided to stick with ARM, mainly because I was under time pressure, had more experience with it, and all the available examples were using it.
As usual, you also need some scripting language to cover the last mile. I used both Azure CLI and PowerShell Core, whichever was convenient. With some of the newest services, PowerShell is really the only thing that works from day one.
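To give an idea of the shape of the ARM involved, here is a heavily trimmed sketch of the core APIM resource, not a complete working template. The parameter names, API version, SKU and capacity are placeholders:

```json
{
  "type": "Microsoft.ApiManagement/service",
  "apiVersion": "2021-08-01",
  "name": "[parameters('apimServiceName')]",
  "location": "[resourceGroup().location]",
  "sku": { "name": "Premium", "capacity": 2 },
  "properties": {
    "publisherEmail": "[parameters('publisherEmail')]",
    "publisherName": "[parameters('publisherName')]",
    "virtualNetworkType": "Internal"
  }
}
```

A real template would additionally carry the VNET subnet reference, custom domain and certificate configuration, and the policy, product and API child resources.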
I had a rough start, however, because I found to my disappointment that the ARM template you export from an existing APIM environment cannot be used as is; even if you strip all the extra stuff away, it still doesn't work. Luckily there are some vanilla examples in the Azure Api Management DevOps Resource Kit to get started with.
After I'd automated the creation and update of APIM environments I started plugging in the first APIs. I used the Extractor tool to generate ARM templates from the existing APIM (dev) environments where teams had initially published their APIs. The tool had some shortcomings which had to be fixed manually afterwards, but things were progressing, as long as I worked the ARM myself. It was soon becoming clear, however, that teams would need a lot of support with this approach. Most of the devs didn't have previous experience with ARM, and they even struggled to figure out how to work with APIM. It was time for some changes.
You build it, you run it
I had heard about this company called VIPPS in Norway who had been in a very similar situation. After watching their presentation on it, it hit me: there's no way this is going to work if I try to do the work for the teams/projects when they should in fact own their APIs' life cycle, including the APIM part of it. I was only making myself a bottleneck and hindering others from taking ownership of their APIs. In practice this means the API's schema and APIM policy need to live in the same code repository where the API is developed, and teams handle publishing (syncing) API schema changes to APIM as part of their regular CI/CD process.
I started working towards this new goal and had discussions with tech leads from different projects. We came to a mutual understanding of how the CI/CD flow for deploying APIs to Azure API Management should work:
- A developer commits new code to the repository through a pull request, reviewed and approved by another developer. This triggers a new build in the CI pipeline via a webhook.
- All the stages in the CI/CD pipeline are run inside a container. The container image can vary between stages and is fetched from a private image repository.
- If/when the test stage passes, it's time to validate the API schema. The schema should be generated dynamically during the build stage and passed to later stages as a build artifact. If/when validation passes, the API schema is stored in version control.
- The deploy stage takes care of deploying the latest code (the output of the build stage) to the actual environments where the API is hosted. If blue-green deployment is used, the new code is deployed to staging slots here.
- The publish stage updates the latest API schema to the APIM environments. Before this happens, though, we check the code repository to see whether the API schema's checksum has changed. If the schema has not changed since the previous build(s), there's no need to update the API in the APIM environments.
- After the API CI/CD pipeline has run through successfully, it's time for the developers to take charge and ensure everything is running as expected. In order to monitor the centralized APIM environments' metrics they need read permissions (RBAC) to them, but only the credentials used in CI/CD have enough privileges to update the APIs.
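The checksum gate in the publish stage above can be sketched in a few lines. This is a hedged illustration, not our actual pipeline code: the file names are hypothetical, and I'm using SHA-256 as the checksum, with the stored checksum committed to version control after each successful publish.

```python
import hashlib
from pathlib import Path

def schema_changed(schema_path: str, checksum_path: str) -> bool:
    """Return True when the API schema differs from the last published one.

    Compares the SHA-256 of the freshly generated schema against the
    checksum stored from the previous successful publish; records the
    new checksum when a change is detected.
    """
    new_sum = hashlib.sha256(Path(schema_path).read_bytes()).hexdigest()
    stored = Path(checksum_path)
    if stored.exists() and stored.read_text().strip() == new_sum:
        return False  # schema unchanged: skip the APIM update
    stored.write_text(new_sum + "\n")  # remember for the next build
    return True
```

In the pipeline, a `False` result simply short-circuits the publish stage, so unchanged APIs never touch the shared APIM environments.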
Working with Azure API Management is teamwork. It involves coordination with third parties as well as internal IT and development teams, and teaching and coaching the teams plays a crucial role. Ideally a DevOps team owns the centralized APIM environments, provides tooling and guidance for teams/projects on implementing the publish stage to APIM as part of their existing CI/CD pipelines in a common way, and establishes operating models to meet SLAs and enable end-to-end monitoring.