AWS Glue Job Fails with “CSV data source does not support map data type” Error

Anand Prakash
Published in Analytics Vidhya
3 min read · Mar 28, 2021

Image by GraphicMama-team from Pixabay

AWS Glue is a serverless ETL service for processing large volumes of data from various sources for analytics and data processing. Recently I came across the “CSV data source does not support map data type” error in a newly created Glue job. In a nutshell, the job was performing the following steps:

  1. Read the data from S3 using create_dynamic_frame_from_options
  2. Perform some required transformations
  3. Write the transformed data to Amazon Redshift using write_dynamic_frame_from_jdbc_conf

It was during this write step that the Glue job was failing. Let's look at it in a little more detail:

1. The data was read from S3 as below:

datasource0 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={
        "paths": [S3_location]
    },
    format="parquet"
)

2. The schema for the data was as below:

datasource0.printSchema()

root
|-- id: string
|-- version: int
|-- description: string
|-- type: string
|-- status: string
|-- rel_metadata: map
|    |-- keyType: string
|    |-- valueType: string
|-- mod_metadata: map
|    |-- keyType: string
|    |-- valueType: string
|-- event_type: string
|-- created_at: long
|-- last_updated: long
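The schema above has two map columns, rel_metadata and mod_metadata. CSV is a flat format with no way to represent a key/value map in a single field, and the error message suggests the write path serializes the data as CSV at some point. One common workaround, which is not necessarily the fix used for this job, is to turn each map column into a JSON string before writing. The sketch below shows the idea in plain Python so it can be run anywhere; the sample rows and the helper names (map_columns, flatten_maps, to_csv) are made up for illustration.

```python
import csv
import io
import json

# Hypothetical sample rows loosely mirroring the schema above (values are made up).
rows = [
    {
        "id": "a1",
        "version": 1,
        "rel_metadata": {"owner": "team-x"},
        "mod_metadata": {"editor": "alice"},
    },
]


def map_columns(records):
    """Return the sorted names of columns that hold dict (map-like) values."""
    return sorted({k for r in records for k, v in r.items() if isinstance(v, dict)})


def flatten_maps(records):
    """Return copies of the records with every dict value replaced by a JSON string."""
    return [
        {k: json.dumps(v, sort_keys=True) if isinstance(v, dict) else v
         for k, v in r.items()}
        for r in records
    ]


def to_csv(records):
    """Serialize flat records to CSV text, using the first record's keys as header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


flat = flatten_maps(rows)
print("map columns:", map_columns(rows))
print(to_csv(flat))
```

In a Glue job the analogous step would probably be done on a Spark DataFrame, for example by converting the DynamicFrame with toDF() and applying pyspark.sql.functions.to_json to each map column, so that the target table receives a plain string column. Treat that as an assumption to verify against your own job rather than a confirmed fix.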

Anand Prakash
Analytics Vidhya

Avid learner of technology solutions around Machine Learning, Big-Data, Databases. 5x AWS Certified | 5x Oracle Certified.