Spark 3.0 is out, and there are a ton of improvements! But there is one nice improvement that is not yet highlighted in the announcement post: filter pushdown for CSV files.

Prior to Spark 3.0, when you loaded a CSV file, the file was read into memory and only then was the filter applied, which wastes CPU cycles and bandwidth. Now, the data can be filtered as the files are read. This is similar to filter pushdown in Parquet, but now for CSV files.
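
To make the difference concrete, here is a minimal sketch in plain Python. It is not Spark code and not how Spark implements the feature; it only contrasts "read everything, then filter" with "filter while reading". The sample rows and both function names are made up for illustration; only the ORIGIN_AIRPORT column name comes from the flights dataset.

```python
import csv
import io

# Hypothetical stand-in for the flights CSV; the rows are invented.
SAMPLE = """ORIGIN_AIRPORT,DESTINATION_AIRPORT
SEA,LAX
JFK,SFO
SEA,ORD
"""


def load_then_filter(text, column, value):
    """Materialize every row first, then filter (the pre-3.0 behavior)."""
    rows = list(csv.DictReader(io.StringIO(text)))  # all rows held in memory
    matches = [row for row in rows if row[column] == value]
    return matches, len(rows)  # second value: rows held at peak


def filter_while_reading(text, column, value):
    """Test each row as it is parsed and keep only the matches."""
    matches = []
    for row in csv.DictReader(io.StringIO(text)):
        if row[column] == value:
            matches.append(row)
    return matches, len(matches)  # non-matching rows are never stored


if __name__ == "__main__":
    print(load_then_filter(SAMPLE, "ORIGIN_AIRPORT", "SEA"))
    print(filter_while_reading(SAMPLE, "ORIGIN_AIRPORT", "SEA"))
```

Both functions return the same matching rows; the second one simply never holds the rows that fail the filter.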

Here is a quick example: I load a CSV file (flights dataset from Kaggle), then filter by ORIGIN_AIRPORT, then print out the execution plan. …

Docker lets you limit the memory, CPU, and, more recently, GPU available to your container. It is not as simple as it sounds.

Let’s go ahead and try this:

docker run --rm --memory 50mb busybox free -m

The above command creates a container with 50mb of memory and runs free to report the available memory. If you run this on a Mac, you should see output similar to the screenshot below.

Why doesn’t it show 50mb, as set by the memory parameter? Why does it show 2gb of memory, and where does the 2gb come from?

This is the first catch of container memory limits. The --memory parameter limits the container's memory usage, and Docker will kill the container if it tries to use more than that limit. But inside the container, you still see the whole system's available memory: free reports the available memory, not the allowed memory. It is the same for os.totalmem (Node.js) or psutil.virtual_memory (Python). …
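
One way to see the allowed memory from inside the container is to read the limit from the cgroup filesystem instead of asking the OS for total memory. The following is a minimal sketch, assuming a Linux container where the cgroup files are mounted at their usual locations; the function name container_memory_limit is made up for illustration.

```python
from pathlib import Path

# Usual cgroup locations inside a Linux container (v2 first, then v1).
# Assumption: the container mounts cgroups at these default paths.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def container_memory_limit(paths=CGROUP_LIMIT_FILES):
    """Return the cgroup memory limit in bytes, or None if none is found."""
    for path in paths:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file absent: other cgroup version, or not Linux
        if raw == "max":
            return None  # cgroup v2 writes "max" when there is no limit
        # Note: cgroup v1 reports a very large number when unlimited.
        return int(raw)
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    if limit is None:
        print("no cgroup memory limit found")
    else:
        print(f"memory limit: {limit / 2**20:.0f} MiB")
```

The paths are passed as a parameter so the function can be pointed at other locations if your setup mounts cgroups elsewhere.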

This is my collection of notes and opinions on Software Architecture. It helps guide me through software architecture and design. I am publishing it in the hope that it will be helpful to others, and also to receive feedback 🙂

Architecture is about identifying the components necessary to support the business requirements, their characteristics, their roles, and how they interact with each other.

Software design is the realization of the architecture. There may be multiple designs that support the same architecture. One can consider the architecture to be the most abstract design of the system.

Architecture is about things that are not likely to change throughout the lifecycle of the system. It’s like building a house: the architecture tells you how many stories there are, where the doors are, and where the rooms are. These elements are fixed, at least for a very, very long time. The furniture may change, the paint may change, the people in the room may change, but it’s not likely that you will change the location of the door. …


