Karan Talati
Feb 23, 2017 · 1 min read

Nice post, Matt. In the case of using a UDF, I think your solution makes sense and handles it elegantly. While reading your post, I was waiting for you to mention fillna, but you didn't. I generally like to keep my UDFs' logic simple and avoid handling Spark- or DataFrame-specific cases inside them, letting Spark's built-in functions handle those instead. Since we do a lot of numerical computation, we actually have a "last step" that handles each column's null values appropriately with a map of columns to imputed values.

