Spark Streaming part 3: tools and tests for Spark applications

Adaltas
Jun 19, 2019 · 14 min read

When services become unavailable, businesses can suffer large financial losses. Spark Streaming applications can break, like any other software application. A streaming application operates on data from the real world, so uncertainty is intrinsic to its input. Testing is essential to discover as many software defects and as much flawed logic as possible before the application crashes in production.

This is the third part of a four-part article series. After developing a stream processing pipeline with Spark Structured Streaming in the first part and deploying it on a Hadoop cluster in the second, this third part covers unit testing of Spark applications. In the fourth and final part of the series, a Machine Learning algorithm will be incorporated into the data pipeline.

In this article, unit tests are incorporated to reduce the risk of our Spark application malfunctioning or failing. Automating tests within the build process should be considered a necessary practice to catch software bugs and mishandled edge cases early. With automated testing, the Spark application is verified against its test suites each time it is compiled.
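One common way to make a Spark job unit-testable is to extract its record-level logic into pure functions that can be verified without starting a Spark session. As a minimal sketch of the idea (the object, function, and record format below are hypothetical, not from the article's actual code):

```scala
// Hypothetical example: the parsing logic of a streaming job is kept as a
// pure function, so it can be unit-tested without a cluster or a SparkSession.
object RideParser {
  // Parse one CSV line of a (hypothetical) ride record "id,date,fare"
  // into a fare amount. Malformed input yields None instead of crashing
  // the stream when real-world data is dirty.
  def parseFare(line: String): Option[Double] = {
    val fields = line.split(",")
    if (fields.length < 3) None
    else scala.util.Try(fields(2).trim.toDouble).toOption
  }
}

// Minimal checks in the spirit of a unit test suite:
assert(RideParser.parseFare("id1,2019-06-19,12.5") == Some(12.5))
assert(RideParser.parseFare("garbage") == None)
assert(RideParser.parseFare("id2,2019-06-19,not-a-number") == None)
```

In a real project these assertions would live in a test framework such as ScalaTest and run on every compile, which is the automation the paragraph above describes.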

The Python source code developed previously will now be rewritten in Scala. Scala is a…

