Spark’s Treatment of Empty Strings and Blank Values in CSV Files

Matthew Powers
Jan 19, 2017

CSV Spec

name,color,is_pretty
rose,red,true
sunflower,,true
lilac,"",true

Spark 2.0.0

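// Look up the home directory; sys.env("HOME") throws if HOME is unset,
// rather than silently building a path that starts with "None".
val homePath = sys.env("HOME")
val flowersPath = s"$homePath/Desktop/flowers.csv"
// Read the CSV, treating the first line as the header row.
val flowersDf = spark.read
  .format("csv")
  .option("header", "true")
  .option("charset", "UTF8")
  .load(flowersPath)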
flowersDf.show()
+---------+-----+---------+
|     name|color|is_pretty|
+---------+-----+---------+
|     rose|  red|     true|
|sunflower|     |     true|
|    lilac| null|     true|
+---------+-----+---------+

Spark 2.0.1

+---------+-----+---------+
|     name|color|is_pretty|
+---------+-----+---------+
|     rose|  red|     true|
|sunflower| null|     true|
|    lilac| null|     true|
+---------+-----+---------+
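
The two outputs above differ only in the sunflower row: Spark 2.0.0 shows an empty string where Spark 2.0.1 shows null. One way to make downstream code behave the same on both versions is to convert empty strings to null right after loading. The following is a minimal sketch, not something from the Spark documentation: the object name CsvBlanks and the helper emptyStringsToNull are hypothetical names chosen for illustration, and the sketch assumes column names contain no dots or other characters that col() would interpret specially.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, when}
import org.apache.spark.sql.types.StringType

// Hypothetical helper object, named here only for illustration.
object CsvBlanks {
  // Returns a DataFrame in which every empty string in a string-typed
  // column is replaced with null. Non-string columns are left untouched.
  def emptyStringsToNull(df: DataFrame): DataFrame = {
    val stringColumns =
      df.schema.fields.filter(_.dataType == StringType).map(_.name)

    stringColumns.foldLeft(df) { (acc, name) =>
      // withColumn with an existing name replaces that column in place,
      // so the column order is preserved.
      acc.withColumn(
        name,
        when(col(name) === "", lit(null).cast(StringType)).otherwise(col(name))
      )
    }
  }
}
```

Assuming flowersDf was loaded as shown earlier, CsvBlanks.emptyStringsToNull(flowersDf).show() should then display null for both the sunflower and lilac colors, whichever of the two Spark versions read the file.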

Semantic Versioning

Onwards
