I downloaded the data files from the TLC website and (very painfully) used Python, Dask, and Spark to produce a cleaned dataset in Parquet format, which I make available to AWS users at the end of this post.