Serverless simply works?

Nemanja Jovic (Neo)
Published in DevJam
14 min read · Sep 24, 2021

Hello folks, this is a follow-up blog post about the Serverless Simply Works talk, presented by Neo at Sytac’s DevJam 2021 Conference.

We will answer a few questions and demystify why companies should, or should not, choose serverless computing, what it looks like to run serverless in an enterprise environment, and go over the pros and cons, required skills, considerations, and many bullet points that are maybe not so foreseeable.

This comes from practical experiences, many hours spent in discussions, improvements, decisions, problem-facing moments, and most importantly — learning.

Gartner estimates that between 2020 and 2025, 20% of organizations globally will migrate to serverless computing. On the other hand, modernization nowadays is not such an easy step for corporations.

Companies should definitely start asking questions about how they develop their products, taking into consideration monoliths vs. microservices.

Concept of Serverless in one word — abstraction

Serverless is a technology concept that is used by developers and consumed by other modular systems and/or by end users. It enables people to think more about the application and worry less about the infrastructure, and it always serves a business purpose.

It is powered by infrastructure that can be either a small or a hugely spanned system; behind the scenes it is backed by servers, storage, software-defined networks, and sets of data centers replicated over regions and geographies.

In a nutshell, serverless, or Function as a Service (FaaS), aims to take away everything except the code that you write, so that teams can focus primarily on application code development. In reality, though, there is much more going on behind the scenes to make this statement enterprise-grade.

First of all, building serverless means not only implementing a new architecture but also embracing and adopting a new mindset and paradigm around DevOps, agile, distributed systems, etc. In short: a big, modern shift.

Serverless is a true cloud-native deployment model. The system is orchestrated by the platform itself, which can automatically provision the right amount of resources, such as CPU or memory, without a lengthy delay.

Companies usually opt for serverless in situations such as:

  • Decoupling systems from a monolith into microservices
  • Building highly modular solutions and incorporating external services
  • Expecting unpredictable fluctuating workloads, in either high or low usage
  • Processing largely distributed data sources

Many cloud providers offer free runs up to a certain number; beyond that, you as a customer have to pay for the additional usage.

The free runs go up to 1 million executions. Different providers have different policies in terms of billing: some of them charge per run cycle, while others take into consideration runs, duration, and resources allocated during the run. Either way, it is easy to experiment for a long period while keeping your bill at 0.

Depending on how mission-critical the system is, you can opt for consumption or premium plans; the purpose of serverless, in the end, is to minimize the TCO (total cost of ownership).
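To make that billing model concrete, here is a minimal sketch of a consumption-plan cost estimate. The free grants and per-unit rates below are purely illustrative placeholders, not any provider's actual prices; real plans differ in both numbers and structure.

```python
# Hypothetical consumption-plan pricing (illustrative numbers only,
# not any cloud provider's actual rates).
FREE_EXECUTIONS = 1_000_000           # free executions per month
FREE_GB_SECONDS = 400_000             # free compute grant per month
PRICE_PER_MILLION_EXECUTIONS = 0.20   # USD per million billable executions
PRICE_PER_GB_SECOND = 0.0000166667    # USD per billable GB-second

def monthly_cost(executions: int, avg_duration_s: float, memory_gb: float) -> float:
    """Estimate a monthly FaaS bill under the hypothetical rates above."""
    billable_execs = max(0, executions - FREE_EXECUTIONS)
    gb_seconds = executions * avg_duration_s * memory_gb
    billable_gb_s = max(0, gb_seconds - FREE_GB_SECONDS)
    cost = (billable_execs / 1_000_000) * PRICE_PER_MILLION_EXECUTIONS
    cost += billable_gb_s * PRICE_PER_GB_SECOND
    return round(cost, 2)

# Staying inside both free grants keeps the bill at 0:
print(monthly_cost(800_000, 0.2, 0.125))
```

Running the same function with a heavier workload (say, 5 million executions at 0.5 s and 0.25 GB) shows how execution count and GB-seconds both contribute to the bill.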

Cloud platform of choice

All cloud platforms differ in functionality, performance, pricing, technology, and many other aspects. The choice that you make will significantly impact the roadmap of your software development lifecycle, and the company in general.

Taking into consideration the long-term commitment to a cloud provider's services, you should evaluate all potential aspects; the biggest concerns can be, for example, data and vendor lock-in.

Edge computing

Edge computing is all about reduction: it is a networking philosophy that brings computing close to the data source to minimize latency and bandwidth, and it also saves on energy and power.

Edge computing moves computation closer to the end user, enabling a faster response while still maintaining the nature of cloud computing.

Edge computing brings security implications with it: since every single instance runs at the edge, it should be a small, cheap, and sandboxed computer.

In terms of serverless relation, we can use edge computing for security checks, document transformations, location routing, etc.

Once the code has been deployed, it will be distributed among many locations all over the world.

Cloudflare, for example, claims its edge network sits just milliseconds away from virtually all internet-facing users and advertises near-zero cold starts.

As time progresses, we see more and more applicable use cases for serverless computing running on the edge, making this hybrid one big success.

Security & compliance

For many organizations, compliance is one of the critical aspects. As a company, you want to make sure that the cloud provider of your choice has already ensured that its data flows and other security controls will help you comply with regulations.

Many cloud providers offer compliance certifications, some of them specific to global regions, others specific to industries such as health, finance, government, education, etc.

Three general aspects of cloud security are:

  • Physical — protecting physical assets in geo-locations, controlling physical entrance, etc.
  • Infrastructure security — ensuring security patches are regularly applied, ports are scanned for abnormal behavior, etc.
  • Data and access security — data encryption, controlling user privileges, etc.

Cloud providers have little control over the third aspect; most security breaches happen because the third party (the user) is not very well secured.

Despite all the providers' efforts, some companies are still hesitant to trust the cloud as a concept. As a company, you want to make sure that the cloud platform you choose will serve its purpose best and help you secure your workloads.

Engage product teams

As a customer, you can bring unique architectural designs that might still be unknown to the cloud provider, despite its maturity.

All cloud providers have in-house engineers who are ready to help out the customer. This service almost always brings additional costs, but I can assure you it is worth the price.

Before making a heavy-lifting decision that can impact your business, it is wise to engage the cloud provider's product team; it can help you plan your journey and discover benefits or bottlenecks of the services you are planning to use.

Target language supported

Nowadays most of the top-used languages are already supported by serverless services (FaaS), including Java, PowerShell, Golang, Node.js, C#, F#, Python, etc., but it is always wise to double-check whether your desired language is supported.

Cloud providers seem to be competing in terms of supported languages; when a newly popular language appears on the market, the cloud provider will swiftly try to release it as a supported language.

Besides the language support, there are performance implications that you should consider. I can bet that certain languages will work faster or slower on different cloud platforms, so make sure to do a POC (proof of concept).

Targeted event sources supported

Event sources or triggers are custom events/data that invoke a serverless function.

Some of the event sources are storage, message queuing system, HTTP protocol, events from other cloud services, and so on.

As serverless architectures are mostly event-driven, you as a customer want to make sure that the event source you are aiming to use is supported.
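As an illustration of the trigger model, here is a sketch of a provider-agnostic function handler. The event shape and its "source" field are assumptions for the example, not any specific platform's SDK; real platforms attach their own trigger metadata.

```python
def handler(event: dict) -> dict:
    """Illustrative function body. The 'source' field stands in for
    whatever trigger metadata your platform attaches (HTTP, queue,
    storage, timer, ...)."""
    source = event.get("source")
    if source == "http":
        # an HTTP trigger carries a request body and expects a response
        body = event.get("body", {})
        return {"status": 200, "body": {"echo": body}}
    if source == "queue":
        # a queue trigger typically hands you one message at a time
        return {"status": "processed", "message_id": event["message_id"]}
    return {"status": "ignored", "source": source}

print(handler({"source": "http", "body": {"hello": "world"}}))
```

The same function body can then be bound to different event sources by configuration, which is why checking that your intended source is supported matters before committing to a platform.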

Cold start time v.s. Always-on

Serverless architectures usually do not preserve the state, each function is completely independent of the others.

Usually, the start and the end of an execution make up the complete life cycle of the service: once the function has executed, data will not be preserved in the function container; rather, it will be deleted.

For some mission-critical systems, customers do not want to face the cold-start time, which is the actual time it takes for the application to start if it hasn't been used for a while.

Each cloud provider has recommendations per language that can help you to optimize the cold-start duration.

Your developers should write lightweight code: if a function has heavy dependencies, it will take longer for the function to start serving requests.

You can also opt for dedicated premium plans, which make sure there are pre-warmed workers ready to serve requests, but this somewhat breaks the consumption concept of serverless.
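One common mitigation is to initialize expensive dependencies lazily and reuse them across invocations of a warm container. A minimal sketch, where the `time.sleep` stands in for a slow SDK client or connection-pool setup:

```python
import time

_client = None  # lives at module scope: created once per warm container

def get_client():
    """Lazily create an expensive dependency (DB pool, SDK client, ...)
    so its cost is paid once per cold start, not once per request."""
    global _client
    if _client is None:
        time.sleep(0.05)  # stand-in for slow initialization work
        _client = object()
    return _client

def handler(event):
    client = get_client()  # cheap on every warm invocation
    return {"warm": client is get_client()}

# first call pays the init cost; subsequent calls reuse the client
first = get_client()
```

This pattern shortens cold starts (nothing heavy runs at import time) and makes warm invocations fast, at the price of slightly more careful state management.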

Concurrency & scalability

Concurrency refers to the number of executions happening in parallel at any given time.

You can estimate the concurrent execution count, but this count will differ per type of event source that you use.

Every FaaS service supports dynamic scalability as a response to increased traffic bursts.

Different cloud providers have different concurrency limits, and there are also differences in scaling intervals and modes, so it is wise to perform a benchmark and make sure you are satisfied with the performance outcome.
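A rough back-of-the-envelope estimate follows Little's law: in-flight executions are approximately the arrival rate multiplied by the average duration. A small helper to compare that estimate against your provider's per-function concurrency limit:

```python
import math

def estimated_concurrency(requests_per_second: float, avg_duration_s: float) -> int:
    """Little's law: in-flight executions ~= arrival rate x average duration.
    Compare the result against the platform's concurrency limit."""
    return math.ceil(requests_per_second * avg_duration_s)

# 200 req/s with a 0.5 s average duration keeps about 100 executions in flight
print(estimated_concurrency(200, 0.5))
```

The estimate is only a starting point; actual scaling behavior depends on the event source, scaling intervals, and burst handling, which is why the benchmark recommended above still matters.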

Required skills

Until now we have been talking about velocity and how serverless offloads management tasks from your shoulders, but in reality there is still a long list of skills required to make serverless solutions enterprise-grade.

Infrastructure & network

Since every single microservice has to be constructed, deployed, and secured, the DevOps teams still have to have a solid understanding of infrastructure components and networks.

Employing professionals who are very familiar with the cloud, but not so familiar with the traditional onboarding of the services might bring some challenges, especially in hybrid scenarios when the whole solution is spanning more than one cloud provider, or also incorporates private cloud environments.

There are numerous tools on the market that can help your team achieve infrastructure and network excellence. Every single tool has pros and cons and takes time to get familiar with, and the choice of tooling can make a substantial difference when it comes to velocity, source code arrangement, etc.

Operating systems

Despite the abstraction layer that stands between the operating system and the code, it is still necessary to understand the operating system in order to troubleshoot issues that might occur, such as network connectivity problems, application issues, or checking logs on the filesystem.

Web Services

APIs are a common example of HTTP-based serverless solutions; to either develop or operationally support such an entity, you have to have a solid understanding of web services.

CI/CD

From the delivery perspective, CI/CD is a crucial workflow that dictates part of your velocity and quality. Developing a proper CI/CD solution that works on its own takes time, and there is always room for improvement.

CI/CD has many aspects and many best practices, and it usually consumes time for the majority of people. Nowadays it is becoming easier to understand the concept since it has become mainstream; a few years back CI/CD was a myth, and for some people it still is.

Current research says that it takes approximately 3–5 years for a highly skilled company to mature its CI/CD workflow.

In reality, a lot of companies struggle because their requirements and project velocity do not slow down, and their staff can't keep up quality and pace at the same time.

Application Testing

While adapting to DevOps in the modern era, many companies are incorporating test-driven software development.

As a good example, test-driven development should feed those tests into automated CI/CD; teams can sometimes struggle to incorporate this practice due to complexity.

As a best practice, tests should be incorporated into a framework; it can be a simple pipeline with a list of tasks that execute automated tests against different targets.
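Such a task list can be sketched as a tiny runner that executes named checks against a target and collects pass/fail results. The checks and the target's shape here are invented for the example; a real pipeline would run them against deployed endpoints.

```python
def run_suite(tasks, target):
    """Run each named check against a target and collect pass/fail results."""
    results = {}
    for name, check in tasks:
        try:
            check(target)
            results[name] = "pass"
        except AssertionError as exc:
            results[name] = f"fail: {exc}"
    return results

def check_status(response):
    assert response["status"] == 200, "unexpected status"

def check_body(response):
    assert response["body"].get("ok") is True, "body not ok"

tasks = [("status code", check_status), ("body contract", check_body)]
print(run_suite(tasks, {"status": 200, "body": {"ok": True}}))
```

The same runner shape works for any target (a staging URL, a queue, a database), which is what makes it easy to drop into a pipeline as an automated stage.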

Security management

When talking about security, the list is pretty long, so I will try to summarise a few bullet points that touch the DevOps lifecycle of a serverless solution.

Teams should have a solid understanding of security best practices to develop, secure, and maintain the serverless solution.

As a best practice companies should

  • Segregate environments and scope permissions among resources, administrators, and consumers
  • Make sure that approval workflows are in place and that the principle of least privilege is applied across the system
  • Ensure no confidential data is stored inside repositories
  • Use secure stores/vaults whenever storing and consuming confidential data
  • Use software that can enhance secure coding practice at the developer workstation, and software that can perform SAST and IAST (SonarQube, Checkmarx, Contrast, etc.)
  • Perform automated vulnerability testing against the application
  • Back up important deployment artifacts away from the CI/CD system to improve resilience against disaster
  • Aggregate logging into a SIEM system for improved landscape visibility, heuristic knowledge, and incident response

Application Development

And of course, you should have a great team of developers who can carry the company vision forward. Developers should be able to translate customer needs into the developed solution: successful software should mirror what customers imagined as a product of high value.

Company teams should basically be present from the beginning till the end; they should work together to deliver and achieve the best quality, to foresee blind spots, and to make smart decisions that will fit the scale, because behind every door there could be a new challenge waiting.

Architecture planning

Before you even start building any service, it is very important to have a proper landing zone in the cloud.

The landing zone includes many components, such as:

A scoped permission hierarchy: resources should be deployed into separate environments, where one environment does not have access to another, and vice versa.

The whole landing zone should be governed by policies in which you state the company requirements in terms of security and operational excellence: for example, in which regions your workloads can be deployed, or the minimal security requirements for certain resources, such as mandatory ACLs.

A naming convention: you should have an automated process in place that generates names for the resources to be deployed, rather than hardcoding anything. The naming convention should be solved at the company level, because it can very swiftly introduce a lot of problems and overhead.
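A minimal sketch of such a name generator; the segment order and abbreviations are assumptions for the example, not a prescribed standard:

```python
def resource_name(company: str, project: str, env: str,
                  resource_type: str, region: str) -> str:
    """Derive a deterministic resource name from the (hypothetical)
    convention <company>-<project>-<env>-<type>-<region>. Generating
    names instead of hardcoding them keeps environments consistent."""
    parts = [company, project, env, resource_type, region]
    if not all(p.strip() for p in parts):
        raise ValueError("every naming segment is required")
    return "-".join(p.strip().lower() for p in parts)

print(resource_name("Sytac", "DevJam", "prd", "func", "weu"))
```

Because the generator is deterministic, infrastructure pipelines in every environment produce the same name for the same logical resource, which is exactly the overhead the manual approach creates.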

A tagging strategy for billing purposes: you should tag your resources per company requirements (environment, project, year of deployment, and so on) so you can easily organize your bill at the end of the month and see where the biggest costs are coming from.

A network topology that will isolate workloads and make connectivity easy to other required services

An example of secure topology would be hub-and-spoke.

Data should be masked and classified, and no unauthorized personnel should access it; customer data should not be accessible even to an administrator.

Connectivity to and from services should be secured, meaning that your backends should never be accessed directly, but rather through multiple layers of security boundaries, such as a WAF, a firewall, and API gateways.

You want to make sure that logs are visible and aggregated for ease of audit and investigation: logs about your services and infrastructure, audit and activity logs, and access and performance logs.

In terms of authentication, unauthorized access should be prohibited. If you are using API gateways, keys for access to a backend should not be shared among customers, and the keys should not expose more than expected. All users and administrators should follow the least-privilege principle, and access to your resources should be restricted to and from certain IP ranges.

All confidential and configuration data should be stored securely, access should be passwordless by leveraging identities rather than passwords, and access to the secure vaults where the data is stored should be fine-grained.

Your application should never contain any confidential data in its configuration or application files; rather, it should reference secure data stored in the vault and fetch the secrets at runtime.
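A sketch of the fetch-at-runtime idea; here an environment variable stands in for a real vault client (Azure Key Vault, AWS Secrets Manager, HashiCorp Vault, and so on), and the secret name and value are invented for the example:

```python
import os

def get_secret(name: str) -> str:
    """Fetch confidential data at runtime instead of baking it into
    configuration files. An environment variable stands in here for a
    real vault lookup; the code keeps only a reference (the name)."""
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"secret {name!r} not available at runtime")
    return value

# In real life the platform or a vault reference injects this value;
# setting it here just makes the sketch runnable.
os.environ["DB_CONNECTION"] = "Server=db.internal;Database=app"
connection = get_secret("DB_CONNECTION")
```

The application code and repository then contain only the secret's name, never its value, which is the property the landing-zone requirement above asks for.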

Your application should be well monitored, including technical and functional monitoring. Technical monitoring makes sure that all vital infrastructure components are under optimal resource consumption, while functional monitoring makes sure that your application serves its purpose; in other words, you can test the application by sending some payload and receiving a proper response.
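The functional side can be sketched as a probe that sends a known payload and compares the response. The payload shape and the client callable are assumptions for the example; in practice the callable would hit your deployed endpoint.

```python
def functional_probe(call_endpoint, expected):
    """Send a known payload and verify the response, proving the app
    still serves its purpose (not just that the process is up).
    `call_endpoint` is whatever client invokes your endpoint."""
    try:
        response = call_endpoint({"ping": "health"})
    except Exception as exc:
        return {"healthy": False, "reason": str(exc)}
    if response == expected:
        return {"healthy": True, "reason": None}
    return {"healthy": False, "reason": "unexpected response"}

# simulate the endpoint with a local function for the sketch
result = functional_probe(lambda payload: {"pong": "health"}, {"pong": "health"})
print(result)
```

Run on a schedule and wired into alerting, a probe like this catches the case where the service answers but no longer does what it should.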

Last but not least, application change tracking should be enabled to track any suspicious or unexpected change. This gives you the ability to go back in time and see exactly who changed a particular part of your application or infrastructure settings, where, and when.

Deploying your workloads

Dependencies among pipelines when segregating infrastructure and application deployments might introduce confusion: many pull requests all over the place, and not all changes getting promoted to production. This complexity grows when the pace is high and people are spread over multiple projects.

OAT

Once you consider the application fully deployed, with all changes on master promoted up to production, you should conduct an operational acceptance test. OAT serves the purpose of verifying that operational excellence, or so-called pre-release readiness, is achieved; in other words, it is non-functional testing of the product.

All resources should be deployed to the proper region, have the proper size, naming, and expected amount of resources, and reside in the correct virtual network; you should also check for vulnerabilities and perform disaster recovery scenarios.

PAT

A production acceptance test demonstrates that the product fulfills the business requirements; the test team should have a list of test cases that can be executed to determine product readiness.

A performance test should be conducted to determine if the application is handling the number of requests as expected.

PAT also ensures that the application handles faulty scenarios as expected: failed authentication should be terminated with a proper error code, for example, and not let the user log in to the system; closed firewall ports should block connectivity; etc.

In case of alerts, your system should report the problem to the appropriate personnel, perhaps the engineer on duty. It is a common practice to make adjustments in the system and force it to fail, to verify that monitoring is implemented properly.

GO-LIVE!

After this stage, you want to make sure that every new release goes smoothly and with high velocity, and that the system keeps running smoothly.

Running your workloads

Blind spots

No matter how well the job is done, sometimes there are blind spots that we do not foresee at the very beginning of the project.

Companies that segregate their environments sometimes do not pay much attention to the lower environments. There might be times when resources in lower environments produce a lot of errors due to their serverless nature. Imagine a time-based function that is triggered every 5 minutes: forget about it for one day, and the accumulated amount of data will be big.

In case of platform maintenance or service health issues, it is always smart to have your serverless app replicated over multiple physical instances.

Some event sources are known to behave abnormally; I have seen Event Grid, for example, deliver multiple events for one single dead-lettered message.

Key rotation

Most applications have authentication keys or connection strings that are used to authenticate and authorize the request from a consumer.

To prevent data exfiltration and achieve a higher level of security, it is advisable to rotate secure entities on a regular basis.

This is not a trivial task, because connection strings and keys are often used by other parts of the system; if secure entities are stored in multiple places rather than centrally, this can create a management hell.
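A common way to make rotation safe is the two-slot pattern that many cloud services use for access keys: consumers may present either the primary or the secondary key, so each slot can be rolled independently with no downtime. A minimal sketch:

```python
class KeyRing:
    """Two-slot key rotation: consumers may present either key, so one
    slot can be rotated at a time while the other keeps working."""

    def __init__(self, primary: str, secondary: str):
        self.primary = primary
        self.secondary = secondary

    def is_valid(self, presented: str) -> bool:
        return presented in (self.primary, self.secondary)

    def rotate_primary(self, new_key: str) -> None:
        # the old primary moves to the secondary slot, giving in-flight
        # consumers a grace window to pick up the new key
        self.secondary, self.primary = self.primary, new_key

ring = KeyRing("k1", "k2")
ring.rotate_primary("k3")   # "k1" still accepted, "k2" is retired
```

Rotating the slots alternately on a schedule, while consumers fetch the current key from a central vault, keeps rotation regular without a coordinated cut-over.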

Endpoint monitoring

All application endpoints should be monitored, both technically and functionally. A running application is not necessarily a healthy application. It is your responsibility to send monitoring requests on a regular basis and make sure the application is running while serving all required functionality.

End to end health check

Most serverless solutions are interconnected microservices that comprise one bigger whole; it is not sufficient to test only one endpoint when the whole flow might not be functional.

Most cloud providers have recommendations for end-to-end health checks. The health check is usually implemented as another function endpoint; furthermore, the cloud platforms can automatically test this endpoint and provide end-to-end results.
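Such an endpoint can be sketched as an aggregator over per-dependency checks; the dependency names and the HTTP-style status codes are assumptions for the example:

```python
def health_endpoint(dependency_checks: dict) -> dict:
    """Aggregate per-dependency checks (database, queue, downstream
    API, ...) into one end-to-end health response that a platform
    monitor can poll."""
    results = {}
    for name, check in dependency_checks.items():
        try:
            results[name] = "up" if check() else "down"
        except Exception:
            results[name] = "down"
    status = 200 if all(v == "up" for v in results.values()) else 503
    return {"status": status, "dependencies": results}

# each lambda stands in for a real connectivity probe
checks = {"database": lambda: True, "queue": lambda: True}
print(health_endpoint(checks))
```

Returning a non-200 status when any dependency is down lets the platform's automatic probing treat the whole flow, not just the one function, as unhealthy.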

Bill at the end of the month

Did the solution fulfill your requirements? Can you optimize your costs or performance? What will the new release bring?

Serverless is a great technology that has made life a lot easier. Like everything else, it takes time to grasp as a concept; once your team is ready across the board to use and manage this technology, your company will reap a lot of benefits from it.

With love by Neo | Sytac

Nemanja Jovic (Neo), Chapter Lead, Microsoft MCT, DevOps Domain Expert