Chaining Custom DataFrame Transformations in Spark

Matthew Powers
Jan 28, 2017 · 3 min read

Dataset Transform Method

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

def withGreeting(df: DataFrame): DataFrame = {
  df.withColumn("greeting", lit("hello world"))
}

def withFarewell(df: DataFrame): DataFrame = {
  df.withColumn("farewell", lit("goodbye"))
}

// toDF needs spark.implicits._ (already imported in spark-shell)
val df = Seq(
  "funny",
  "person"
).toDF("something")

val weirdDf = df
  .transform(withGreeting)
  .transform(withFarewell)

weirdDf.show()

+---------+-----------+--------+
|something|   greeting|farewell|
+---------+-----------+--------+
|    funny|hello world| goodbye|
|   person|hello world| goodbye|
+---------+-----------+--------+

df
  .select("something")
  .transform(withGreeting)
  .transform(withFarewell)

withFarewell(withGreeting(df))

// even worse
withFarewell(withGreeting(df)).select("something")
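
The snippets in this section assume a spark-shell style session where `spark` and its implicits are already in scope. As a rough, self-contained sketch of the same chain outside the shell, something like the following should work. The object name `TransformExample`, the `run` helper, the local master, and the app name are assumptions made for illustration, not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object TransformExample {

  def withGreeting(df: DataFrame): DataFrame =
    df.withColumn("greeting", lit("hello world"))

  def withFarewell(df: DataFrame): DataFrame =
    df.withColumn("farewell", lit("goodbye"))

  // Hypothetical helper: builds the sample DataFrame and applies the chain.
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables .toDF on a Seq

    val df = Seq("funny", "person").toDF("something")

    val chained = df.transform(withGreeting).transform(withFarewell)
    val nested = withFarewell(withGreeting(df))

    // The chained and nested forms add the same columns in the same order.
    assert(chained.columns.sameElements(nested.columns))
    chained
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("transform-example")
      .getOrCreate()

    run(spark).show()
    spark.stop()
  }
}
```

The chained form reads top to bottom in the order the transformations are applied, while the nested form has to be read inside out.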

Transform Method with Arguments

def withGreeting(df: DataFrame): DataFrame = {
  df.withColumn("greeting", lit("hello world"))
}

def withCat(name: String)(df: DataFrame): DataFrame = {
  df.withColumn("cats", lit(s"$name meow"))
}

val df = Seq(
  "funny",
  "person"
).toDF("something")

val niceDf = df
  .transform(withGreeting)
  .transform(withCat("puffy"))

niceDf.show()

+---------+-----------+----------+
|something|   greeting|      cats|
+---------+-----------+----------+
|    funny|hello world|puffy meow|
|   person|hello world|puffy meow|
+---------+-----------+----------+
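
Why `withCat("puffy")` can be handed to `transform` may be easier to see without Spark in the way. The sketch below uses a hypothetical stand-in type, `Table`, that only tracks column names and has a `transform` method of the same shape as the Dataset one (it passes itself to the function). `Table`, `CurryingSketch`, `puffy`, and the `cats:` column label are illustrative assumptions, not Spark API.

```scala
object CurryingSketch {

  // Hypothetical stand-in for a DataFrame: it only tracks column names.
  final case class Table(columns: Vector[String]) {
    // Same shape as Dataset.transform: hand this value to the function.
    def transform(f: Table => Table): Table = f(this)
  }

  def withGreeting(t: Table): Table =
    Table(t.columns :+ "greeting")

  // Two parameter lists: the first takes the argument, the second the table.
  def withCat(name: String)(t: Table): Table =
    Table(t.columns :+ s"cats:$name")

  // Supplying only the first list leaves a Table => Table function,
  // which is exactly what transform expects.
  val puffy: Table => Table = withCat("puffy")

  val result: Table = Table(Vector("something"))
    .transform(withGreeting)
    .transform(puffy)
}
```

Putting the extra arguments in the first parameter list and the DataFrame in the last one is what lets a parameterized transformation slot into the chain like any other.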

Monkey Patching with Implicit Classes

object BadImplicit {

  implicit class DataFrameTransforms(df: DataFrame) {

    def withGreeting(): DataFrame = {
      df.withColumn("greeting", lit("hello world"))
    }

    def withFarewell(): DataFrame = {
      df.withColumn("farewell", lit("goodbye"))
    }

  }

}

import BadImplicit._

val df = Seq(
  "funny",
  "person"
).toDF("something")

val hiDf = df.withGreeting().withFarewell()

Avoiding Implicit Classes
