Deploying ASP.NET Core Applications to Kubernetes with 0 Downtime

Hugo Woodiwiss
Just Eat Takeaway-tech
4 min read · Sep 14, 2023

📖 Some Background

We recently migrated one of our applications from AWS EC2 to EKS, and for the most part it’s been wonderful.

Release times are down from 40+ minutes per environment to 2–3 minutes, tightening our Pull Request (PR) testing feedback loop. It has been great for productivity, developer experience and elasticity, but… (When is there not a but when things seem to be going so well?!)

Figure: 503 Service Unavailable responses during a deployment. Also reflecting my blood pressure when I first saw this graph.

During the migration process, we noticed that for each deployment, a handful of requests were being dropped as 503 Service Unavailable at our Kubernetes ingress.

😈 The Culprit

To shut down a pod, Kubernetes sends a SIGTERM to the application container and, at the same time, the pod is removed from the set of endpoints that the ingress controller (in our case, Istio ingress) load balances across. The two steps are not coordinated with each other, so the ingress controller can take a moment to catch up.

Upon receiving SIGTERM, ASP.NET Core by default stops accepting new connections immediately. This leaves a short window in which the app container refuses new connections while the ingress controller is still trying to send traffic to it. The ingress controller then responds to consumers with 503 Service Unavailable, and in the case of the application in question, this meant 503s directly in the browser 😱.

💻 The Fix

To avoid this, an application can be configured with a custom lifetime that delays stopping the application when it handles a POSIX termination signal: SIGINT, SIGTERM or SIGQUIT. This gives the ingress controller time to update its routes, and should prevent requests from hitting an application that will refuse them.

An example of this is available, although not well advertised, in the dotnet-docker repository, which we modified to form our custom IHostLifetime:

using System;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;

/// <summary>
/// A host lifetime implementation which delays shutdown for a configured period after a shutdown signal is received.
/// This is to allow ASP to still handle new requests after it receives SIGTERM from K8S, to give the
/// K8S ingress controller time to remove the pod from use.
///
/// Adapted from <see cref="Microsoft.Extensions.Hosting.Internal.ConsoleLifetime"/>.
/// </summary>
public sealed partial class DelayedShutdownConsoleHostLifetime : IHostLifetime, IDisposable
{
    private readonly IHostApplicationLifetime _applicationLifetime;
    private readonly ConsoleLifetimeOptions _options;
    private readonly IHostEnvironment _environment;
    private readonly HostOptions _hostOptions;
    private readonly ILogger _logger;

    private CancellationTokenRegistration _applicationStartedRegistration;
    private CancellationTokenRegistration _applicationStoppingRegistration;
    private IDisposable[]? _stopSignalRegistrations;
    private IDisposable? _shutdownDelayTimer;

    public DelayedShutdownConsoleHostLifetime(
        IOptions<ConsoleLifetimeOptions> options,
        IHostEnvironment environment,
        IHostApplicationLifetime applicationLifetime,
        IOptions<HostOptions> hostOptions,
        ILoggerFactory loggerFactory)
    {
        ArgumentNullException.ThrowIfNull(options?.Value, nameof(options));
        ArgumentNullException.ThrowIfNull(applicationLifetime);
        ArgumentNullException.ThrowIfNull(environment);
        ArgumentNullException.ThrowIfNull(hostOptions?.Value, nameof(hostOptions));
        ArgumentNullException.ThrowIfNull(loggerFactory);

        _options = options.Value;
        _environment = environment;
        _applicationLifetime = applicationLifetime;
        _hostOptions = hostOptions.Value;
        _logger = loggerFactory.CreateLogger("Microsoft.Hosting.Lifetime");
    }

    // Nothing to do on stop: the delay has already elapsed by the time the host stops.
    public Task StopAsync(CancellationToken cancellationToken)
        => Task.CompletedTask;

    public Task WaitForStartAsync(CancellationToken cancellationToken)
    {
        if (!_options.SuppressStatusMessages)
        {
            _applicationStartedRegistration = _applicationLifetime.ApplicationStarted.Register(
                static state =>
                {
                    ((DelayedShutdownConsoleHostLifetime)state!).OnApplicationStarted();
                },
                this);
            _applicationStoppingRegistration = _applicationLifetime.ApplicationStopping.Register(
                static state =>
                {
                    ((DelayedShutdownConsoleHostLifetime)state!).OnApplicationStopping();
                },
                this);
        }

        // Take over handling of the signals that would otherwise stop the host immediately.
        _stopSignalRegistrations = new IDisposable[]
        {
            PosixSignalRegistration.Create(PosixSignal.SIGINT, HandleSignal),
            PosixSignalRegistration.Create(PosixSignal.SIGTERM, HandleSignal),
            PosixSignalRegistration.Create(PosixSignal.SIGQUIT, HandleSignal),
        };
        return Task.CompletedTask;
    }

    public void Dispose()
    {
        _applicationStartedRegistration.Dispose();
        _applicationStoppingRegistration.Dispose();

        foreach (var disposable in _stopSignalRegistrations ?? Array.Empty<IDisposable>())
        {
            disposable.Dispose();
        }

        _shutdownDelayTimer?.Dispose();
    }

    private void HandleSignal(PosixSignalContext ctx)
    {
        Log.ShutdownSignalReceived(_logger, ctx.Signal, _hostOptions.ShutdownTimeout);

        // Cancel the default signal handling so the process keeps serving requests,
        // then stop the application once the delay has elapsed.
        ctx.Cancel = true;
        _shutdownDelayTimer = new Timer(
            static state =>
            {
                DelayedShutdownConsoleHostLifetime lifetime = (DelayedShutdownConsoleHostLifetime)state!;
                lifetime._applicationLifetime.StopApplication();
            },
            this,
            _hostOptions.ShutdownTimeout,
            Timeout.InfiniteTimeSpan);
    }

    private void OnApplicationStarted()
    {
        Log.ApplicationStarted(_logger);
        Log.ApplicationStartedHostingEnvironment(_logger, _environment.EnvironmentName);
        Log.ApplicationStartedContentRoot(_logger, _environment.ContentRootPath);
    }

    private void OnApplicationStopping()
    {
        Log.ApplicationShuttingDown(_logger);
    }

    internal static partial class Log
    {
        [LoggerMessage(1, LogLevel.Information, "Application started. Press Ctrl+C to shut down.", EventName = "ApplicationStarted")]
        public static partial void ApplicationStarted(ILogger logger);

        [LoggerMessage(2, LogLevel.Information, "Hosting environment: {envName}", EventName = "ApplicationStartedHostingEnvironment")]
        public static partial void ApplicationStartedHostingEnvironment(ILogger logger, string envName);

        [LoggerMessage(3, LogLevel.Information, "Content root path: {contentRoot}", EventName = "ApplicationStartedContentRoot")]
        public static partial void ApplicationStartedContentRoot(ILogger logger, string contentRoot);

        [LoggerMessage(4, LogLevel.Information, "Shutdown signal received: {signal} handling new requests for {delayDuration}", EventName = "ShutdownSignalReceived")]
        public static partial void ShutdownSignalReceived(ILogger logger, PosixSignal signal, TimeSpan delayDuration);

        [LoggerMessage(5, LogLevel.Information, "Application is shutting down...", EventName = "ApplicationShuttingDown")]
        public static partial void ApplicationShuttingDown(ILogger logger);
    }
}
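
The lifetime only takes effect once it is registered with the host. The post does not show that step, so the following is a minimal sketch that assumes a minimal-hosting Program.cs and the DelayedShutdownConsoleHostLifetime class above. The host resolves a single IHostLifetime from the container, so a registration added after the built-in one should be the one that gets used.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = WebApplication.CreateBuilder(args);

// Register the custom lifetime. It is added after the default console
// lifetime, so it is the IHostLifetime the host resolves at startup.
builder.Services.AddSingleton<IHostLifetime, DelayedShutdownConsoleHostLifetime>();

var app = builder.Build();

app.Run();
```

A quick way to check it is wired up is to send SIGTERM to the running process and look for the "Shutdown signal received" log line from the class above before the application stops.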

A note on this implementation: we’ve reused the HostOptions.ShutdownTimeout configuration option for our delay. Its default value depends on your ASP.NET Core version and on whether your application is hosted using the older custom WebHost or the IHostedService-derived GenericWebHostService. There is a handy breakdown of which version and hosting method has which default ShutdownTimeout in the table below:

Default Shutdown Timeout (seconds) for .NET Version vs Hosting method
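
Since the default differs between versions and hosting methods, one option is to set the value explicitly rather than rely on it. This is a sketch only: the 10 second value is an arbitrary assumption for illustration, not a figure from our setup, and the right value depends on how quickly your ingress controller updates its routes.

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = WebApplication.CreateBuilder(args);

// Pin the timeout so the shutdown delay no longer depends on framework defaults.
// 10 seconds is a placeholder value (assumption); tune it for your ingress.
builder.Services.Configure<HostOptions>(options =>
{
    options.ShutdownTimeout = TimeSpan.FromSeconds(10);
});

var app = builder.Build();

app.Run();
```

Bear in mind that the host also uses ShutdownTimeout as the limit for its own graceful stop, so with the lifetime above the total time from SIGTERM to exit can approach twice this value. That total needs to fit inside the pod's terminationGracePeriodSeconds (30 seconds by default in Kubernetes), otherwise the container is killed before it finishes shutting down.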

🔊 The Call to Action

Because what blog post is complete without being asked to do something by someone you don’t know?!

If this issue has caused you pain like it has us, you may be interested to know that there is an open issue on the dotnet/aspnetcore repository related to this, here:
Make aspnetcore shutdown work by default on Kubernetes · Issue #30387 · dotnet/aspnetcore (github.com)

This has missed the boat for .NET 8, but with some 👍 and some polite comments, hopefully we could see something “Out of the Box” for this in .NET 9. 🤞
