Feb 23, 2017
Nice post, Matt. In the case of using a UDF, I think your solution makes sense and handles it elegantly. While reading your post, I kept waiting for you to mention fillna, but you didn’t. I generally like to keep my UDF logic simple and not make it handle cases that relate to Spark or DataFrames; I let Spark’s own functions handle those. Since we do a lot of numerical computation, we actually have a “last step” that handles each column’s null values appropriately, using a map of columns to imputed values — something like the sketch below.
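
For anyone curious, that “last step” could look something like this in PySpark (a minimal sketch; the data, column names, and fill values here are made up for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical example data; columns and values are illustrative only.
df = spark.createDataFrame(
    [(1, None, 2.5), (2, 3.0, None)],
    ["id", "price", "weight"],
)

# The "last step": a map of columns to imputed values, applied once
# with fillna rather than handled inside each UDF.
fill_values = {"price": 0.0, "weight": 1.0}
df_clean = df.fillna(fill_values)
df_clean.show()
```

Keeping the imputation in one fillna call at the end means the UDFs themselves never have to reason about nulls.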
