Aiming for a big salary? Take a look at these basic PySpark questions!
With each passing day, Python gains traction and has become one of the most sought-after programming languages thanks to its easy-to-understand syntax, clean structure, and ever-growing community support. Riding this momentum, Spark has integrated its API with Python, commonly known as PySpark.
Spark can process huge volumes of data in parallel, and PySpark makes that parallel processing accessible to the Python community. Given the surging demand for PySpark developers, here are a few questions that should come in handy on the first go:
- Spark Architecture? Cluster types, modes, and spot instances? Mounting storage? Job vs Stage vs Task?
- Actions vs Transformations? Directed Acyclic Graphs? Lazy Evaluation?
- RDD vs Dataframe vs Dataset? Parquet file vs Avro file?
- StructType vs StructField? Delta lake? Time travel?
- Syntax errors vs Exceptions?
- startswith() vs endswith()? withColumn vs select vs withColumnRenamed? map vs flatMap? Why do we use ‘literals’?
- .collect()? show vs display? How to display the full values of a column?
- Create RDD from a list? Create RDD from a text file? current_date vs current_timestamp?
- Reading and writing a file? Create an empty dataframe?
- Convert dataframe to rdd and rdd to dataframe?
- Broadcast variable, explode, coalesce, and repartition?
- Merge or union two dataframes with a different number of columns?
- Iterate through each row of a dataframe in PySpark?
- How to handle NULL values?
I hope the questions listed above help with your interview preparation! Happy Learning!!
Cheers!
rohitveryani@gmail.com