Software Developer @Zomato. #fullstackdev #php #golang #iOS. I love distributed systems, DevOps & cloud. My hobbies are photography & sketching.
… move the file from S3 into HDFS and unzip it (if the big file you are referring to is in S3). If it is already in HDFS, you could unzip it before you load it into Spark.
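As a rough sketch of the unzip step, assuming the archive is a plain `.zip` that has already been copied to a local staging disk: the helper names `unzip_for_spark` and `hdfs_put_command`, and every path shown, are made up for illustration. The second helper only builds an `hdfs dfs -put` command line and does not run it.

```python
import zipfile
from pathlib import Path


def unzip_for_spark(archive_path, staging_dir):
    """Extract every file in a .zip archive into staging_dir and return their paths.

    Hypothetical helper: assumes the archive fits on the local staging disk.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        # Directory entries end with "/" and carry no data, so leave them out of the result.
        members = [m for m in archive.namelist() if not m.endswith("/")]
        archive.extractall(staging)
    return [staging / m for m in members]


def hdfs_put_command(local_paths, hdfs_dir):
    """Build (but do not run) an `hdfs dfs -put -f` command for the extracted files."""
    return ["hdfs", "dfs", "-put", "-f", *map(str, local_paths), hdfs_dir]
```

Once the extracted files are in HDFS, Spark can read the target directory directly, for example with `spark.read.csv(...)` if the contents are CSV.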
Since we process data in multiple places, we need to make sure that our systems are always aware of the latest schema, so we rely on the Hive Metastore as the ground truth for our data and its s…