How we brought Windows, IIS & .NET Apps to Cloud at Walmart

Rakesh Kumar
Walmart Global Tech Blog
7 min read · Sep 13, 2018

Building cloud native applications comes with its own set of challenges, and those challenges grow when you are building a cloud platform meant to serve the traditional Windows & .NET applications that power the business of one of the largest organizations on the planet.

Objectives: The team's goal was to help the Windows/.NET applications that power Walmart's business move to the cloud. Period.

Challenges: Thousands of applications built on the Microsoft tech stack power Walmart's business, and they are owned by many teams scattered across continents. These applications are mainly .NET/C# web applications or Windows services with a messaging layer and a DB layer. They were deployed on different Windows farms and clusters and were managed manually. Many teams also managed their own infrastructure.

Any deployment or update was a manual, multi-step operation. Application owners had limited access and typically had to go through many service request tickets.

The existing Walmart cloud infrastructure could not provision Windows-based virtual machines or the IIS web server, and so could not run .NET or Windows-based applications.

The good news was that the cloud platform already had most of the services that address cloud native, cross-cutting concerns for Java/Linux applications. They just were not accessible to Windows and .NET applications.

Our approach to building .NET/Windows capabilities focused on two main areas:

First: As a .NET cloud platform, make Windows OS computes and the IIS web server available on Walmart cloud infrastructure.

Second: Provide .NET middleware APIs that applications can consume to address cross-cutting concerns in the cloud, helping teams build cloud native applications.

First: Windows OS, IIS Web Server & .NET Runtime:

We made Windows Server images with IIS as the web server available in the Walmart cloud by building OneOps (the infrastructure that powers Walmart cloud) packs, with the ability to choose, configure & deploy different VM sizes and .NET frameworks & runtimes. We also gave users the option to add application-specific runtimes, dependencies and drivers as Chocolatey packages distributed via a Nexus server.

Second: Addressing Cloud Native Applications Cross Cutting Concerns:

Our second goal was to abstract away the challenges of cloud development by providing cloud native middleware APIs and services to application owners, so that building cloud native applications and adopting the cloud becomes easier.

During this process we looked into many cloud services available internally and studied their patterns and practices. Walmart already has a cloud culture, with experts building platforms and services for the cloud, mainly on Java and open source technologies. We adopted a few of their practices (there was no need to reinvent the wheel) and enhanced them to provide solutions unique to the .NET & Windows environment. These choices were frequently questioned by experts from non-Windows backgrounds. What helped us through this was our understanding of our Windows customers, their applications and their use cases.

Remote Application Configuration:

Storing configuration alongside the application components adds overhead whenever configuration changes, especially in the cloud, where the environment is dynamic and every change has to be propagated to all application components.

Traditionally, .NET applications store their configuration settings in local files such as app.config or web.config. In the cloud these settings need to be stored and managed remotely. We opened up the existing Remote Cloud Configuration Management Service to .NET applications by developing .NET middleware APIs. We also integrated it with the local configuration file, so that local values apply when application owners choose to override a remote setting or when the remote configuration service is unavailable. Any configuration change on the remote server is raised to the application as a .NET event so that it can update itself. Moving configuration from local files to a remote server is a paradigm shift, and when we first took this capability to app owners they could not see its value. Once we showed them examples with use cases, they adopted it.
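The layering described above (remote values, local overrides, local fallback and change notifications) can be sketched roughly as follows. This is an illustrative Python sketch, not the actual .NET middleware: the `LayeredConfig` class, its method names and the precedence order (override, then remote, then local file) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

# Listener receives (key, old_value, new_value).
Listener = Callable[[str, Optional[str], Optional[str]], None]


class LayeredConfig:
    """Hypothetical layered configuration: override > remote > local file."""

    def __init__(self, local: Dict[str, str],
                 overrides: Optional[Dict[str, str]] = None) -> None:
        self._local = dict(local)                # stands in for app.config / web.config
        self._overrides = dict(overrides or {})  # keys the app owner pins locally
        self._remote: Dict[str, str] = {}        # last snapshot from the remote service
        self._listeners: List[Listener] = []

    def on_change(self, listener: Listener) -> None:
        """Register a callback, playing the role of a change event handler."""
        self._listeners.append(listener)

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        for layer in (self._overrides, self._remote, self._local):
            if key in layer:
                return layer[key]
        return default

    def apply_remote_snapshot(self, snapshot: Dict[str, str]) -> None:
        """Accept a new remote snapshot and notify only on effective changes."""
        before = {k: self.get(k) for k in set(self._remote) | set(snapshot)}
        self._remote = dict(snapshot)
        for key, old in before.items():
            new = self.get(key)
            if new != old:
                for listener in self._listeners:
                    listener(key, old, new)


cfg = LayeredConfig({"log.level": "Info", "timeout": "30"},
                    overrides={"timeout": "5"})
cfg.on_change(lambda key, old, new: print(f"{key}: {old} -> {new}"))
cfg.apply_remote_snapshot({"log.level": "Debug", "timeout": "60"})
```

If the remote service never answers, no snapshot is applied and `get` keeps returning the local file values, which is the fallback behaviour described above. A pinned key such as `timeout` does not raise a change notification, because its effective value stays the same.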

Application Logs Management:

Application logs are the stream of aggregated, time-ordered events collected from the output streams of all running processes. Each running process writes its event stream to stdout.

Local file based logging does not work in the cloud; logs need to be managed remotely. Walmart already had a Kafka-based logging and monitoring platform. We wrote a new logging client with the standard Debug, Info and Error levels by developing a Kafka transport layer for .NET. We realized that many teams were already using the popular Log4Net API and that it would be futile to ask them to move to a new logging API and libraries, so we made the switch seamless by writing a Log4Net appender. We also gave teams full control over log management through a Graylog cluster pack, which applications can deploy and configure as needed, and extended our logging API with a new C# Graylog transport. This lets applications switch between logging backends dynamically using remote configuration.

We also used Event Tracing for Windows (ETW) to offer a more robust and powerful out-of-process logging solution for high-volume log events.
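To make the idea of switchable transports concrete, here is a minimal Python sketch. It is not the real logging client: the `Logger` class, the `use_transport` method and the in-memory lists standing in for the Kafka and Graylog transports are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Transport = Callable[[Record], None]


class Logger:
    """Hypothetical logging client with pluggable, switchable transports."""

    LEVELS = {"Debug": 10, "Info": 20, "Error": 40}

    def __init__(self, transports: Dict[str, Transport], active: str,
                 min_level: str = "Info") -> None:
        self._transports = transports
        self._min_level = min_level
        self.use_transport(active)

    def use_transport(self, name: str) -> None:
        """Switch the backend; a configuration change handler could call this."""
        if name not in self._transports:
            raise KeyError(f"unknown transport: {name}")
        self._active = name

    def log(self, level: str, message: str) -> None:
        if self.LEVELS[level] < self.LEVELS[self._min_level]:
            return  # below the configured threshold, drop the event
        self._transports[self._active]({"level": level, "message": message})

    def debug(self, message: str) -> None:
        self.log("Debug", message)

    def info(self, message: str) -> None:
        self.log("Info", message)

    def error(self, message: str) -> None:
        self.log("Error", message)


# In-memory lists stand in for the real Kafka and Graylog transports.
kafka_sink: List[Record] = []
graylog_sink: List[Record] = []
log = Logger({"kafka": kafka_sink.append, "graylog": graylog_sink.append},
             active="kafka")
log.info("service started")
log.use_transport("graylog")  # e.g. triggered by a remote configuration change
log.error("payment failed")
```

Because application code only ever calls `info`, `error` and so on, swapping the backend does not touch call sites, which is the same reason an appender behind an existing logging API keeps migration cheap.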

Cloud Messaging infrastructure:

The components of a distributed application are hosted on multiple sites and have to exchange information with each other. Messaging infrastructure supports asynchronous operations, enabling you to decouple a process that consumes a service from the process that implements the service.

At Walmart, applications use various messaging platforms according to their needs and expertise, e.g. IBM MQ, ActiveMQ, TIBCO etc. Providing a cloud-ready unified API was not an easy task in .NET. Java has the standard JMS specification, but nothing in .NET provides a common API across multiple messaging providers. We looked into standards like AMQP, but it did not offer full capabilities across the popular messaging vendors. We wrote a JMS-style API for .NET, customized for our needs and suited to cloud scenarios such as sending messages across a cluster of brokers, load balancing, remote configuration etc.
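As a rough illustration of a provider-neutral producer that spreads messages across a cluster of brokers, consider the Python sketch below. The names (`BrokerConnection`, `InMemoryBroker`, `ClusterProducer`) and the round-robin-with-failover policy are assumptions for the example; the article does not say which load-balancing strategy the real API used.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class BrokerConnection(ABC):
    """Provider-neutral connection; one subclass per messaging vendor."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def send(self, destination: str, body: str) -> None:
        """Deliver one message or raise ConnectionError."""


class InMemoryBroker(BrokerConnection):
    """Test double standing in for a real vendor client."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        super().__init__(name)
        self.healthy = healthy
        self.delivered: List[Tuple[str, str]] = []

    def send(self, destination: str, body: str) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unreachable")
        self.delivered.append((destination, body))


class ClusterProducer:
    """Round-robin over brokers, skipping any that fail to accept the message."""

    def __init__(self, brokers: Sequence[BrokerConnection]) -> None:
        if not brokers:
            raise ValueError("at least one broker is required")
        self._brokers = list(brokers)
        self._next = 0

    def send(self, destination: str, body: str) -> str:
        count = len(self._brokers)
        for attempt in range(count):
            broker = self._brokers[(self._next + attempt) % count]
            try:
                broker.send(destination, body)
            except ConnectionError:
                continue  # try the next broker in the cluster
            self._next = (self._next + attempt + 1) % count
            return broker.name  # report which broker took the message
        raise ConnectionError("no broker in the cluster accepted the message")


producer = ClusterProducer([InMemoryBroker("a"),
                            InMemoryBroker("b", healthy=False),
                            InMemoryBroker("c")])
used = [producer.send("orders", f"order-{i}") for i in range(3)]
```

Application code depends only on `ClusterProducer.send`, so the vendor behind each connection can change without touching callers.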

Identity and Access Management:

IAM aims to improve the user experience of the application by delegating authentication to an external identity provider. It also minimizes the requirement for user administration.

At Walmart, applications use various solutions to meet their AuthN/AuthZ needs. Traditionally, identities are stored in various Active Directory deployments. We also have our own SSO & identity services. Many applications are accessible internally as well as externally to Walmart customers. This was one of the areas that held applications back from moving to the cloud, because the existing AuthN/AuthZ solutions were legacy and not cloud ready.

We developed OWIN middleware that plugs into the ASP.NET pipeline, as well as an ASP.NET Identity provider that abstracts the SSO (Single Sign-On) and identity services and provides asynchronous APIs for all AuthZ needs. We made sure the identity services route to Active Directory for identities stored there.

Application Caching needs:

Caching aims to improve the performance and scalability of a system by temporarily copying frequently accessed data to fast storage located close to the application.

We already had distributed caching services deployed across Walmart data centers, built on an Mcrouter-based Memcached solution. We made the distributed cache available to .NET applications by extending the core .NET API, which meant we avoided handing application owners a new cache API. We wanted the transition from an in-memory cache to a distributed, cloud-specific cache to be seamless for application owners.
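The value of keeping one cache API can be shown with a small Python sketch. Everything here is hypothetical (the `Cache` interface, both implementations and the `get_or_load` helper); the dictionary-backed `DistributedCache` only imitates a Memcached-style client by serialising values into a store shared between instances.

```python
import json
import time
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Optional, Tuple


class Cache(ABC):
    """The single interface application code depends on."""

    @abstractmethod
    def get(self, key: str) -> Optional[Any]: ...

    @abstractmethod
    def set(self, key: str, value: Any, ttl_seconds: float) -> None: ...


class InMemoryCache(Cache):
    """Per-process cache: fast, but not shared between application instances."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None or self._clock() >= entry[0]:
            self._items.pop(key, None)  # drop expired entries lazily
            return None
        return entry[1]

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)


class DistributedCache(Cache):
    """Stand-in for a remote cache client: values are serialised and shared."""

    def __init__(self, shared_store: Dict[str, Tuple[float, str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._store = shared_store
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None or self._clock() >= entry[0]:
            self._store.pop(key, None)
            return None
        return json.loads(entry[1])

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._store[key] = (self._clock() + ttl_seconds, json.dumps(value))


def get_or_load(cache: Cache, key: str, loader: Callable[[], Any],
                ttl_seconds: float = 60.0) -> Any:
    """Cache-aside lookup; identical for either backend."""
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value, ttl_seconds)
    return value
```

Calling code such as `get_or_load(cache, "sku:1", load_from_db)` stays the same whether `cache` is an `InMemoryCache` or a `DistributedCache`; only the object wired in at startup changes.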

Application Audits & Performance Monitoring:

A telemetry solution collects and highlights operational events and reduces application management costs, while giving useful real-time insight into application behavior.

Walmart already had a telemetry platform for monitoring application performance metrics, but it was only consumed by Java applications. We wrote C# middleware APIs that integrate Windows and .NET/ASP.NET specific performance counters and allow teams to visualize and monitor application performance on Grafana-based dashboards in real time. They also allow applications to send notifications based on predefined performance thresholds.

Cloud Adoption & Utilization Tracking: Tracking utilization of the cloud platform and of our middleware APIs was important for understanding how the platform is being adopted. The platform and middleware can collect runtime and cloud context information and send pings to our utilization service, which is built on ELK with a Kibana dashboard.

.NET Cloud Platform

Design, Quality & Continuous Integration: Before we developed any middleware or services for .NET, we had our CI/CD pipeline in place. We made sure to write enough unit tests, system tests and example applications for each API we offered to customers, besides doing multiple rounds of code review in each sprint. Our CI pipeline includes these steps, and every code change runs all of these test cases before the middleware is published to the Nexus server. We paid special attention to performance test cases and the reports generated for each middleware API. We took extra effort to analyze them in detail and made sure the APIs didn't have any unwanted impact on system resources (memory, threads, IO etc.) or on the consuming application's performance. We made most APIs, and calls to any remote services, asynchronous, backed by appropriate cloud native patterns, e.g. circuit breakers, remote configuration, service registry etc.

We spoke to our customers (application owners, leads, architects etc.) as often as possible. We understood their use cases, pain points and expectations of us.

We provided application teams with enough API documentation, sample code and demos to help them redesign their applications to become cloud ready.
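For readers unfamiliar with the circuit breaker pattern mentioned above, here is a minimal Python sketch of the general technique. It is a synchronous simplification with hypothetical names and thresholds, not the asynchronous .NET implementation the team built.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without contacting the remote service."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; skipping remote call")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping a remote call as `breaker.call(lambda: fetch_remote_settings())` (where `fetch_remote_settings` is a placeholder for any remote operation) means that once the service has failed repeatedly, callers fail fast instead of tying up threads while they wait on timeouts.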
