How redBus moved its desktop web to .NET Core: Insights

redBus — a brief introduction. We are the world’s largest bus booking platform and have recently completed 100 million bus rides in India. We have customers accessing our platform via Desktop, m-web (PWA) and Apps.

Background

We get substantial traffic on our desktop site (www.redbus.in), especially on weekdays, with high conversion rates. Our desktop web platform caters to several geographies (India, Singapore, Malaysia, Peru, Colombia and Indonesia as of now). We are hosted on AWS.

Network Diagram

Technology Stack at redBus

We have various applications at redBus that run on different technology stacks: Node.js, Go, Java, Scala, .NET and, more recently, Erlang. Traditionally, www.redbus.in was built on .NET Framework with MVC 4, running on a typical Windows stack with IIS as its web server. We got a lot of mileage out of this, and the team adopted C# quite beautifully. We feel C# is one of the best languages out there for solid web (MVC framework) development. As the other teams progressed, evolved and adopted open source more and more, the web team felt the need to do the same. We did not want to lose the power and knowledge the team had in C# as a language. .NET Core made it easy for us to jump on the open source wagon while retaining our C# expertise. We first moved our international markets (outside India) to .NET Core and saw a good reduction in latency during peak time. Encouraged by this, we went ahead with the India roll-out, which accounts for the majority of our current desktop traffic.

So, what is ASP.NET Core?

Details here. In a nutshell, it is a cross-platform development framework. This link beautifully captures when to use .NET Core vs. .NET Framework. We follow a microservice architecture at redBus, and many of our services are auto-scaled depending on traffic, time of day and so on. Each microservice talks to a back-end store (NoSQL, SQL or some cache). In this context, .NET Core really thrives and gives maximum output.

Why?

There were a few compelling reasons, listed below.

  • EC2 (on AWS): we use anywhere between 2 and 6 EC2 instances of the C4 family, depending on traffic [$0.192/hr on Windows]. Moving to Linux automatically brought the EC2 cost down [$0.1/hr on Linux]. This amounts to a saving of about 45%. Power to open source!
  • During our peak times, we were constantly hitting the ceiling on performance (the number of requests we could handle). This obviously translated into cost, as we used to scale horizontally. We did some fine tuning, but the nature of the application was such that it had to make many I/O calls (to other microservices) before rendering to the browser. Based on our research (Raygun and others), we saw the promise .NET Core had to offer.
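
The cost argument in the first bullet is simple arithmetic, and a minimal sketch makes it easy to re-run with other prices. The two hourly rates are the ones quoted above; the 730 hours per month and the helper names are assumptions made for illustration only. With these list prices the per-instance saving works out to roughly 48%, in the same range as the figure of about 45% reported in this post.

```python
# Hourly prices quoted in the post (USD per instance-hour).
WINDOWS_RATE = 0.192
LINUX_RATE = 0.10

# Assumption for illustration: average number of hours in a month.
HOURS_PER_MONTH = 730


def saving_fraction(old_rate: float, new_rate: float) -> float:
    """Fraction of the old hourly cost saved by switching to the new rate."""
    return (old_rate - new_rate) / old_rate


def monthly_cost(rate: float, instances: int, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of running `instances` machines continuously for `hours` hours."""
    return rate * instances * hours


if __name__ == "__main__":
    print(f"Per-instance saving: {saving_fraction(WINDOWS_RATE, LINUX_RATE):.1%}")
    for n in (2, 6):  # fleet size range mentioned in the post
        win = monthly_cost(WINDOWS_RATE, n)
        lin = monthly_cost(LINUX_RATE, n)
        print(f"{n} instances: Windows ${win:,.2f}/month, Linux ${lin:,.2f}/month")
```

Because the saving is a ratio of hourly rates, it does not depend on fleet size; the absolute monthly difference does, which is why horizontal scaling at peak made the Windows premium more visible.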

Some challenges along the way

There were a few, but nothing critical. They had more to do with how quickly developers could debug and analyse the performance metrics.

For our performance testing, we used our Search Result Page. We usually get 20–30 such requests per second on our desktop ELB, in addition to other requests. Our ELB logs show around 3–4K requests per minute.
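
The two traffic figures above use different units, so a small conversion helps when comparing them. The sketch below assumes that the 3–4K per minute figure is the total across all request types on the ELB; that reading, and the function names, are assumptions rather than something stated in the logs.

```python
def per_minute(rate_per_second: float) -> float:
    """Convert a requests-per-second rate to requests per minute."""
    return rate_per_second * 60


def per_second(rate_per_minute: float) -> float:
    """Convert a requests-per-minute rate to requests per second."""
    return rate_per_minute / 60


# Figures quoted in the post.
SEARCH_RPS = (20, 30)          # Search Result Page requests per second
ELB_TOTAL_RPM = (3000, 4000)   # assumed to be total ELB requests per minute

if __name__ == "__main__":
    low, high = (per_minute(r) for r in SEARCH_RPS)
    print(f"Search page: {low:.0f} to {high:.0f} requests/minute")
    low, high = (per_second(r) for r in ELB_TOTAL_RPM)
    print(f"ELB overall: {low:.1f} to {high:.1f} requests/second")
```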

We were running the Kestrel 1.1.0 web server, and during load testing we ran into issues even at moderate load [10 req/s]. Our servers went out of service as soon as they were exposed to high traffic and were eventually taken out of the ELB because they could not serve the requests. We observed that memory and CPU were absolutely fine on the servers, but there were a lot of connections in the CLOSE_WAIT state. We were running a default Kestrel configuration. All we had to do was fine tune some knobs to hit the road.
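
One way to spot this symptom on a Linux host is to tally TCP connections by state. The sketch below is not the tooling we used; it is a minimal illustration that parses the text output of `ss -tan` (where the state is the first column and CLOSE_WAIT is printed as `CLOSE-WAIT`). The sample text, addresses and function name are made up for the example.

```python
from collections import Counter

# Made-up sample in the shape of `ss -tan` output, for illustration only.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN     0      128    0.0.0.0:5000       0.0.0.0:*
ESTAB      0      0      10.0.0.5:5000      10.0.0.9:41234
CLOSE-WAIT 1      0      10.0.0.5:5000      10.0.0.9:41236
CLOSE-WAIT 1      0      10.0.0.5:5000      10.0.0.9:41240
"""


def count_tcp_states(ss_output: str) -> Counter:
    """Tally the State column of `ss -tan` output, one count per connection."""
    counts = Counter()
    for line in ss_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            # ss prints CLOSE-WAIT; normalise to the CLOSE_WAIT spelling.
            counts[fields[0].replace("-", "_")] += 1
    return counts


if __name__ == "__main__":
    states = count_tcp_states(SAMPLE)
    print("CLOSE_WAIT connections:", states["CLOSE_WAIT"])
    print(states.most_common())
```

On a live server the same function can be fed the real output of `ss -tan` instead of the sample. A CLOSE_WAIT count that keeps growing while CPU and memory stay flat suggests the application is not closing sockets the peer has already closed.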

We fine tuned the following runtime configurations. In Kestrel 2.0, these are true by default [System.GC.Server and System.GC.Concurrent].
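
For reference, the two garbage collector settings named above live under `configProperties` in a .NET Core app's `runtimeconfig.json`, where the concurrent GC knob is spelled `System.GC.Concurrent`. The sketch below only prints that JSON shape with both settings enabled; it is a hypothetical helper, not our deployment script, and it does not cover any other knobs we may have changed.

```python
import json

# The two settings named in the post, both enabled.
GC_SETTINGS = {
    "System.GC.Server": True,
    "System.GC.Concurrent": True,
}


def build_runtime_config(config_properties: dict) -> dict:
    """Wrap settings in the runtimeOptions/configProperties structure."""
    return {"runtimeOptions": {"configProperties": dict(config_properties)}}


if __name__ == "__main__":
    # Print the fragment as it would appear inside <app>.runtimeconfig.json.
    print(json.dumps(build_runtime_config(GC_SETTINGS), indent=2))
```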

Porting the code from .NET Framework MVC to .NET Core: things we learned

  • Not every library (DLL) we depended on was available for .NET Core on NuGet, so we had to look for alternatives that did the same job.
  • Response caching, response compression and error handling are all handled through middleware.
  • Memory caching and configuration handling work differently from .NET Framework.

Monitoring

  • When we were on .NET Framework, we used New Relic. We rely on its service map feature to navigate quickly to any issue or anomaly. New Relic does not offer monitoring for .NET Core, so we used PM2 as our service process manager and linked it with Keymetrics for monitoring. We are also doing some work to push other metrics to Kibana and link them to New Relic for our other monitoring workflows.
  • We also looked at Dynatrace as an alternative.

Results and Impact

  • Average latency improved from 350 ms to less than 250 ms.
  • CPU utilisation during peak time dropped from [40–60%] to [15–25%].
  • AWS cost reduced [by 45%] by replacing Windows servers with Linux servers.

A Peek into Performance Testing

We tested our response times using ApacheBench, JMeter and BlazeMeter and found a significant performance improvement.

a) ApacheBench [89 vs. 57 req/s]

.NET Core: 89.68 requests/second

.NET Framework on Windows: 57.21 requests/second

b) JMeter [throughput: 142.9 vs. 60.5 per second]

.NET Core solution: sent 1000 requests to http://XXXX.redbus.pe/; throughput was 142.9/sec. See the screenshot below.

.NET Framework on Windows: sent 1000 requests to http://XXXX.redbus.pe/; throughput was 60.5/sec. See the screenshot below.

c) BlazeMeter [90th percentile response time: 417 ms vs. 1.09 s]

.NET Core solution: 90th percentile response time was 417 ms

.NET Framework on Windows: 90th percentile response time was 1.09 s
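
To compare the three tools on one scale, the results above can be expressed as improvement factors. This is a small sketch using only the numbers reported in this section; the dictionary layout and function name are illustrative choices. Throughput counts as better when higher and response time as better when lower.

```python
# (framework_value, core_value, higher_is_better), taken from the results above.
RESULTS = {
    "Apache benchmark throughput (req/s)": (57.21, 89.68, True),
    "JMeter throughput (req/s)": (60.5, 142.9, True),
    "BlazeMeter 90% response time (ms)": (1090.0, 417.0, False),
}


def improvement_factor(framework: float, core: float, higher_is_better: bool) -> float:
    """How many times better .NET Core did than .NET Framework on one metric."""
    return core / framework if higher_is_better else framework / core


if __name__ == "__main__":
    for name, (framework, core, higher_is_better) in RESULTS.items():
        factor = improvement_factor(framework, core, higher_is_better)
        print(f"{name}: {factor:.2f}x better on .NET Core")
```

The three tools were run with different load profiles, so the factors are not expected to agree with each other; they only show that every measurement moved in the same direction.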