Projects such as log data mining or machine learning handle large amounts of data. Developing scripts or libraries for this kind of data is tedious and time-consuming, because a single run can take minutes or hours. These long turnaround times discourage good practices such as refactoring and unit testing, and in the end we cannot keep the project clean.
This article shows how we keep data-handling projects clean with Hideout caching.
Data-handling projects commonly have more than one step (or method call) that takes a long time. Some…