Are you finding it hard to synchronize processes that read and write data on a data lake?
Are your solutions overly complex?
If you answered yes to these questions, there is a simple pattern we use at dataxu that can help. It gets the work done with minimal overhead.
The Data Science Engineering team at dataxu spends most of its day designing, developing, and maintaining a wide range of data pipelines that support our AI-based bidding system.
A pipeline is a set of stages (or sub-processes), each of which processes input data to produce an output. The output of each stage serves as the input to the next, forming a chain of stages that transforms a given input into the desired output. In particular, a data pipeline is the process, or pipeline of processes, that moves, joins, formats, and processes data in a system. A good data pipeline does so in an automated and reliable way, with minimal, or ideally no, human intervention. …
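The stage-chaining idea above maps naturally onto function composition in Scala. This is a minimal illustrative sketch, not dataxu's actual pipeline code; the stage names and the toy CSV-summing logic are hypothetical.

```scala
// Illustrative sketch: a pipeline as a chain of stages, where each
// stage is a function and the output of one feeds the next.
object PipelineSketch {
  // Hypothetical stages for a toy "sum a CSV line" pipeline.
  val extract: String => List[String] =
    raw => raw.split(",").toList       // split a raw CSV line into fields
  val transform: List[String] => List[Int] =
    fields => fields.map(_.trim.toInt) // parse each field as an integer
  val load: List[Int] => Int =
    values => values.sum               // aggregate into a final result

  // Chain the stages: extract, then transform, then load.
  val pipeline: String => Int = extract andThen transform andThen load

  def main(args: Array[String]): Unit =
    println(pipeline("1, 2, 3")) // prints 6
}
```

Because each stage is just a function, stages can be tested in isolation and recombined, which is one reason this shape is convenient for pipeline code.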
At dataxu, we have been using Scala for a while now, and along the way we have refined our best practices. Among them is our implementation of the Singleton design pattern in Scala. Implementing this pattern well takes real attention to detail if you want it to stay easy to use, especially when testing.
The Singleton design pattern, according to Wikipedia, is a software design pattern that restricts the instantiation of a class to one object. This means that this pattern promises to be handy when dealing with a class that we want to instantiate once, and only once, across a system (or, more generally, a subsystem), and have that same instance shared by every potential user of the class. …
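Scala expresses this pattern directly in the language: an `object` declaration defines a class with exactly one lazily initialized instance, shared by every caller. Here is a minimal sketch; the `ConfigRegistry` name and its contents are hypothetical, not part of dataxu's codebase.

```scala
// A singleton via Scala's `object` keyword: the runtime guarantees
// exactly one instance, created on first use and shared everywhere.
object ConfigRegistry {
  // Mutable state held by the single shared instance (for illustration).
  private var settings: Map[String, String] = Map.empty

  def put(key: String, value: String): Unit =
    settings = settings.updated(key, value)

  def get(key: String): Option[String] =
    settings.get(key)
}
```

Every reference to `ConfigRegistry` anywhere in the program resolves to the same instance, so a value stored through one reference is visible through all others. That global sharing is exactly what makes singletons convenient, and also what makes them tricky to test, which is the concern the rest of this article addresses.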