Handling Exceptions In Apache Spark

Mohamed Camara
2 min read · Jun 5, 2020

When you run a program, you often cannot predict every error that might occur, so you may want to catch all possible exceptions. The end goal may be to save the error messages to a log file for debugging and to send out email notifications. This article shows one way to implement that in Spark.

Source:
https://datafloq.com/read/understand-the-fundamentals-of-delta-lake-concept/7610

Scala offers several classes for functional error handling, including but not limited to Try/Success/Failure, Option/Some/None, and Either/Left/Right. Which trio you choose depends on the outcomes you expect from your code.

For example, an Option[A] is either a Some[A] holding a value of type A, or None, meaning no value at all. It is useful when a value may be null or may not exist. Option replaces explicit null checks with methods that handle the missing case for you, such as contains, map, and flatMap.
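As a minimal sketch of these Option methods, consider the snippet below. The OptionSketch object and its ports lookup table are hypothetical and exist only for illustration.

```scala
object OptionSketch {
  // Hypothetical lookup table, used only for illustration.
  val ports: Map[String, String] = Map("http" -> "80", "https" -> "443")

  // Map#get returns Option[String]: Some(value) if the key exists, None otherwise.
  def portText(service: String): Option[String] = ports.get(service)

  // map transforms the value only when one is present.
  def portNumber(service: String): Option[Int] = portText(service).map(_.toInt)

  // flatMap chains a second step that may itself produce no value.
  def privilegedPort(service: String): Option[Int] =
    portNumber(service).flatMap(p => if (p < 1024) Some(p) else None)

  def main(args: Array[String]): Unit = {
    println(portNumber("http"))                 // Some(80)
    println(portNumber("ftp"))                  // None
    println(portText("https").contains("443"))  // true
    println(portNumber("ftp").getOrElse(-1))    // -1
  }
}
```

None of these calls needs an explicit null check; the missing key simply flows through as None.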

Instances of Try, on the other hand, are either a scala.util.Success holding the computed value or a scala.util.Failure holding the exception that was thrown. Try fits scenarios where the code either completes normally or throws an exception.
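A small sketch of Try in isolation may help here. The TrySketch object and its parse and describe helpers are illustrative names, not part of any library.

```scala
import scala.util.{Failure, Success, Try}

object TrySketch {
  // Wrapping the expression in Try turns a thrown exception into a Failure value.
  def parse(text: String): Try[Int] = Try(text.trim.toInt)

  // Pattern matching handles both outcomes explicitly.
  def describe(text: String): String = parse(text) match {
    case Success(n)  => s"parsed $n"
    case Failure(ex) => s"failed: ${ex.getClass.getSimpleName}"
  }

  def main(args: Array[String]): Unit = {
    println(describe("42"))   // parsed 42
    println(describe("4x2"))  // failed: NumberFormatException
  }
}
```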

We will be using the {Try,Success,Failure} trio for our exception handling.

Try catches only non-fatal exceptions. Examples of throwables that are not caught are VirtualMachineError (including its subclasses OutOfMemoryError and StackOverflowError), ThreadDeath, LinkageError, InterruptedException, and ControlThrowable. The Throwable type in Scala is java.lang.Throwable.
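The difference can be observed directly. In the sketch below (FatalSketch is a hypothetical name), an IllegalStateException ends up inside a Failure, while an InterruptedException passes straight through Try and has to be caught by an ordinary try/catch.

```scala
import scala.util.{Failure, Try}

object FatalSketch {
  // A non-fatal exception is captured and returned as a Failure.
  def nonFatalIsCaptured: Boolean =
    Try[Int](throw new IllegalStateException("boom")) match {
      case Failure(_: IllegalStateException) => true
      case _                                 => false
    }

  // InterruptedException is treated as fatal, so Try rethrows it
  // and an ordinary try/catch is needed to observe it.
  def fatalEscapes: Boolean =
    try {
      Try[Int](throw new InterruptedException("stop"))
      false // not reached: the exception propagates out of Try
    } catch {
      case _: InterruptedException => true
    }

  def main(args: Array[String]): Unit = {
    println(nonFatalIsCaptured) // true
    println(fatalEscapes)       // true
  }
}
```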

For the purpose of this example, we will try to create a DataFrame, since many things can go wrong when creating one.
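The original code for this step is not reproduced here, so what follows is only a sketch of the general pattern, written in plain Scala so that it runs without a cluster. The SafeRun object, the attempt helper, and the report callback are all hypothetical names; report stands in for whatever logging or email notification you use.

```scala
import scala.util.{Failure, Success, Try}

object SafeRun {
  // Runs `body`, passes a description of any non-fatal exception to `report`
  // (assumed here to be a logger or notifier), and returns None on failure
  // so the caller can decide how to continue.
  def attempt[A](
      label: String,
      report: String => Unit = msg => Console.err.println(msg)
  )(body: => A): Option[A] =
    Try(body) match {
      case Success(value) => Some(value)
      case Failure(ex) =>
        report(s"[$label] ${ex.getClass.getName}: ${ex.getMessage}")
        None
    }
}
```

Assuming an existing SparkSession named spark and a path variable, a DataFrame could then be created with SafeRun.attempt("load events") { spark.read.json(path) }, which yields Some(dataframe) on success and None after reporting the error. Keep in mind that Spark evaluates transformations lazily, so some errors only surface when an action such as count or write runs; wrap the action as well if you want to catch those.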

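Alternatively, you may use scala.util.control.NonFatal directly in a plain try/catch. It is the same extractor that Try relies on, so it matches ordinary exceptions and leaves fatal ones, including ControlThrowable, to propagate. Note that in Scala 2.10 NonFatal still matched StackOverflowError; from Scala 2.11 onward StackOverflowError is treated as fatal, like any other VirtualMachineError.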
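As a brief sketch of NonFatal in a plain try/catch (NonFatalSketch and safeDivide are illustrative names), the pattern looks like this:

```scala
import scala.util.control.NonFatal

object NonFatalSketch {
  // Returns Right with the result, or Left with a description of the
  // non-fatal exception; fatal throwables are not caught here.
  def safeDivide(a: Int, b: Int): Either[String, Int] =
    try Right(a / b)
    catch {
      case NonFatal(e) => Left(s"${e.getClass.getSimpleName}: ${e.getMessage}")
    }

  // NonFatal can also be applied directly to classify a throwable.
  def isRecoverable(t: Throwable): Boolean = NonFatal(t)

  def main(args: Array[String]): Unit = {
    println(safeDivide(6, 3))                             // Right(2)
    println(safeDivide(1, 0).isLeft)                      // true
    println(isRecoverable(new RuntimeException("x")))     // true
    println(isRecoverable(new InterruptedException("x"))) // false
  }
}
```

Returning an Either here keeps the error description available to the caller, which is convenient when the message needs to be written to a log afterwards.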
