FastSpark: A New Fast Native Implementation of Spark from Scratch

TLDR: Here is the code to explore.

It all started during my hobby research on various distributed schedulers and distributed computing frameworks. Naturally, Spark was among them. I was already somewhat familiar with Spark internals, having used it for over 3 years. It struck me then that one of the primary reasons for its huge success is not just its speed and efficiency but also its very intuitive APIs. This is the same reason why Pandas is also extremely popular. If not…