PySpark Convert String To Date Format

BigData-ETL
Sep 1, 2022

https://bigdata-etl.com/pyspark-convert-string-to-date-format/

In this post I will show you how to convert a string to a date in PySpark. Since Spark 2.2 this is very easy: you can use the built-in functions to_date or to_timestamp, because both support a format argument.

PySpark Convert String To Date Format

Very often when we work with Spark we need to convert data from one type to another. Converting from String to Date or from String to Timestamp is (I think) the hard case, because a Date or Timestamp can be written in various formats, such as yyyyMM, yyyy-MM-dd, yyyy-MM-dd HH:mm:ss, etc.

How To Use to_timestamp Function?

To convert a String to a Timestamp in PySpark, use the to_timestamp function:

pyspark.sql.functions.to_timestamp(col, format=None)

Converts a Column of pyspark.sql.types.StringType or pyspark.sql.types.TimestampType into pyspark.sql.types.TimestampType using the optionally specified format. The default format is 'yyyy-MM-dd HH:mm:ss'. Specify formats according to Java's SimpleDateFormat (from Spark 3.0 on, according to Spark's own datetime patterns).

https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.functions.to_timestamp

Example Of: to_timestamp Method

from pyspark.sql.functions import to_timestamp

df = spark.createDataFrame([('2022-02-14 16:15:00',)], ['t'])
df.select(to_timestamp(df.t).alias('dt')).collect()

[Row(dt=datetime.datetime(2022, 2, 14, 16, 15))]

How To Use to_utc_timestamp Function?

Given a timestamp that represents a wall-clock time in the given time zone, to_utc_timestamp returns the timestamp of that same moment expressed in UTC.

https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.functions.to_utc_timestamp

Example Of: to_utc_timestamp Method

from pyspark.sql.functions import to_utc_timestamp

df = spark.createDataFrame([('2022-02-14 16:15:00',)], ['t'])
df.select(to_utc_timestamp(df.t, "PST").alias('t')).collect()

[Row(t=datetime.datetime(2022, 2, 15, 0, 15))]

Check My Blog!

Check other posts on my blog!
