Leetcode code execution system design (With working code).

Yash Budukh
4 min read · May 24, 2020

Everyone looking for a software engineering role has used an interview preparation website such as GeeksforGeeks or LeetCode. Ever wondered how these sites run user code? They let users run code on their servers and return the results almost instantly, using a dedicated service that handles code execution. I tried building such a code execution service, and I am sharing my approach because I could not find a single concrete resource that covers it. There are many resources, of course, but they are scattered. This is one way that works fairly well; I know 3 more ways that don’t work 😅. Suggestions and edits are welcome. Let’s jump in.

Business requirements

  • Low latency.
  • Resilient to malicious programs.
  • High availability.
  • Asynchronous.

High level design

Why are the workers shaped like a container 🤔 ??

Client-Server Behind the scenes

  • The client makes a POST request containing the source code, the choice of programming language, the input (if required) and the time limit. The server responds with a unique job id.
  • The server then pushes the request onto a task queue, where it will be processed by one of the many worker nodes.
  • The worker node runs the program in a secure environment so that it cannot harm the system. Once execution completes, the worker node updates the status in the cache (with a certain expiry) and sends an acknowledgement to the task queue.
  • On a positive acknowledgement the task queue deletes the task; on a negative acknowledgement the task is not deleted.
  • The client can make a GET request with the appropriate job id to get the results or an appropriate status (added to queue, running).
courtesy — https://docs.sphere-engine.com/compilers/api/overview-version-4
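
To make the flow above concrete, here is a minimal in-memory sketch in Python. It is not the code from my implementation: the queue and the cache are plain Python objects standing in for a real broker and cache, and every name in it (submit_job, get_job, work_once, RESULT_TTL_SECONDS) is a hypothetical helper made up for illustration.

```python
import queue
import time
import uuid

# Hypothetical in-memory stand-ins for the task queue and the cache.
task_queue = queue.Queue()
result_cache = {}            # job_id -> (expires_at, payload)
RESULT_TTL_SECONDS = 300     # assumed expiry for cached statuses


def cache_set(job_id, payload, ttl=RESULT_TTL_SECONDS):
    """Store a payload together with its expiry time."""
    result_cache[job_id] = (time.monotonic() + ttl, payload)


def cache_get(job_id):
    """Return the payload, or None if it is missing or expired."""
    entry = result_cache.get(job_id)
    if entry is None:
        return None
    expires_at, payload = entry
    if time.monotonic() >= expires_at:
        del result_cache[job_id]
        return None
    return payload


def submit_job(source_code, language, stdin="", time_limit=2):
    """What the POST handler does: enqueue the job and return its id."""
    job_id = str(uuid.uuid4())
    cache_set(job_id, {"status": "queued"})
    task_queue.put({
        "job_id": job_id,
        "source_code": source_code,
        "language": language,
        "stdin": stdin,
        "time_limit": time_limit,
    })
    return job_id


def get_job(job_id):
    """What the GET handler does: report the result or current status."""
    return cache_get(job_id) or {"status": "unknown or expired"}


def work_once(execute):
    """One worker step: take a task, run it, then ack or nack."""
    task = task_queue.get()
    cache_set(task["job_id"], {"status": "running"})
    try:
        output = execute(task)
    except Exception:
        # Negative acknowledgement: the task stays in the queue.
        cache_set(task["job_id"], {"status": "queued"})
        task_queue.put(task)
        return False
    # Positive acknowledgement: the task is gone, the result is cached.
    cache_set(task["job_id"], {"status": "done", "output": output})
    return True
```

The execute argument is where the sandboxed run would plug in. With a real broker, the ack and nack would be calls on the broker's client rather than a put back onto a local queue.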

Letting users run code on your servers can be very dangerous, as a user can run malicious code that takes down the entire system. Let’s look at the ways a user can cause damage.

  • :(){ :|: & };: (this will make sense, hang on)
  • An infinite loop.
  • Recursion without a base condition.
  • Running a CPU and memory intensive job.
  • Creating files, deleting files, killing running processes, etc.

Let’s tackle these problems

If not handled correctly, never-ending programs such as an infinite loop, recursion without a base condition, or a goto loop can starve other processes. To tackle this, we need to make sure a program runs for at most a fixed amount of time and is killed if it exceeds that limit. This can be achieved with the Linux timeout command.
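
As a rough sketch of the same idea from inside a Python worker, the timeout argument of subprocess.run kills the child once the limit passes. This is the Python standard library equivalent, not the timeout command itself, and run_with_time_limit is a hypothetical helper name.

```python
import subprocess
import sys


def run_with_time_limit(argv, stdin_text="", time_limit=2.0):
    """Run a command and kill it if it exceeds time_limit seconds."""
    try:
        completed = subprocess.run(
            argv,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=time_limit,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return {"status": "time limit exceeded", "stdout": "", "stderr": ""}
    return {
        "status": "finished",
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```

For example, run_with_time_limit([sys.executable, "-c", "while True: pass"], time_limit=0.5) should come back with the "time limit exceeded" status. Note that only the direct child is killed, so a program that spawns its own children needs the process limit discussed further down as well.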

If a user runs a CPU- or memory-intensive job, it can starve other processes, which is unfair to other users because their programs run slowly. Hence it is very important to cap the CPU time and memory each process can use, so that every process gets an adequate share.
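
One way to enforce such caps on Linux is with per-process resource limits. The sketch below uses Python's resource module and assumes a Linux worker; the helper names and the default limits (2 CPU seconds, 512 MB of address space) are made up for illustration.

```python
import resource
import subprocess
import sys


def _cap(limit, value):
    """Lower both the soft and hard limit, never above the current hard limit."""
    _, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    # Setting the hard limit too stops the program from raising it again.
    resource.setrlimit(limit, (value, value))


def make_limiter(cpu_seconds, memory_bytes):
    """Build a function that applies the limits inside the child process."""
    def apply_limits():
        _cap(resource.RLIMIT_CPU, cpu_seconds)   # CPU time in seconds
        _cap(resource.RLIMIT_AS, memory_bytes)   # address space in bytes
    return apply_limits


def run_limited(argv, cpu_seconds=2, memory_bytes=512 * 1024 * 1024):
    """Run a command with CPU time and memory caps applied before exec."""
    return subprocess.run(
        argv,
        capture_output=True,
        text=True,
        preexec_fn=make_limiter(cpu_seconds, memory_bytes),
    )
```

With these defaults, a submission that tries to allocate a couple of gigabytes fails inside its own process (in Python, with a MemoryError) instead of eating the machine's memory.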

To stop users from altering the file system, we can use a chroot jail or SELinux.

:(){ :|: & };: This little statement can freeze your system in a second. This is the fork bomb.

Fork Bomb is a denial-of-service attack wherein a process continually replicates itself to deplete available system resources, slowing down or crashing the system due to resource starvation.

Seems pretty scary, right? But handling it is quite easy, as we can limit the number of processes a user can run. When a user runs a fork bomb, they will get a message similar to “BlockingIOError Resource temporarily unavailable”.
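
A minimal sketch of that process cap, again assuming a Linux worker and using Python's resource module. The helper names and the limit of 64 are hypothetical. One thing to keep in mind: this limit counts every process owned by the same user, so submissions should run under a dedicated non-root user (root is typically not bound by it).

```python
import resource
import subprocess
import sys


def limit_processes(max_processes):
    """Build a function that caps the process count inside the child."""
    def apply_limit():
        _, hard = resource.getrlimit(resource.RLIMIT_NPROC)
        value = max_processes
        if hard != resource.RLIM_INFINITY:
            value = min(value, hard)
        # Once the cap is hit, further fork() calls fail with
        # "Resource temporarily unavailable".
        resource.setrlimit(resource.RLIMIT_NPROC, (value, value))
    return apply_limit


def run_with_process_cap(argv, max_processes=64):
    """Run a command whose user may own at most max_processes processes."""
    return subprocess.run(
        argv,
        capture_output=True,
        text=True,
        preexec_fn=limit_processes(max_processes),
    )
```

With the cap in place, a fork bomb stops multiplying once it reaches the limit, and the rest of the machine stays responsive.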

It’s okay if you don’t get the meme.

Containerization

Running the worker node inside a container with limited memory and CPU addresses the problems above and adds an extra layer of security, and spinning up a container is very fast. If malicious code attempts to destroy the system, its effects stay inside the container it is running in, and we can restart that container very quickly. I feel containerizing the worker node also helps with scaling.
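
As an illustration, the snippet below builds a docker run command that starts a worker container with memory, CPU and process limits. The image name code-worker:latest, the helper name and the default values are assumptions for this sketch, not taken from my implementation.

```python
def worker_container_command(image, memory="256m", cpus="0.5", pids_limit=64):
    """Build the docker command line for one resource-limited worker."""
    return [
        "docker", "run",
        "--rm",                           # remove the container when it exits
        "--detach",                       # run the worker in the background
        "--memory", memory,               # hard memory cap
        "--cpus", cpus,                   # fraction of CPU cores available
        "--pids-limit", str(pids_limit),  # cap on processes (stops fork bombs)
        image,
    ]


# Hypothetical image name, for illustration only.
command = worker_container_command("code-worker:latest")
print(" ".join(command))
```

The list can be passed to subprocess.run as is. Scaling out then mostly means starting more containers from the same image.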

Scaling it

This design is obviously not production ready. We need multiple server instances behind a load balancer. Similarly, we need better infrastructure for the queue and for caching. For data persistence, we can add a database.

Use cases

  • Hiring Platforms.
  • Coding competitions.
  • Aid for programming classes.

Implementation

I used RabbitMQ for the task queue, Redis for caching the results, and Express with Node.js for the server. Source code can be found here.

Improvement

  • Security can be enhanced using tools like AppArmor and SELinux.
  • Improve the design to make the system partition tolerant.
