Solving 5 Mysterious Spark Errors

yhoztak
Sep 7, 2018 · 6 min read
https://raw.githubusercontent.com/jupyter-incubator/sparkmagic/master/screenshots/diagram.png
from: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals

Mysterious (or weird) Spark Errors:

Problems & Solutions


Problem 1: Resolved attribute(s) your_field#xx missing from …

Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved attribute(s) surname#20 missing from id#0,birthDate#3,name#10,surname#7 in operator !Project [id#0,birthDate#3,name#10,surname#20,UDF(birthDate#3) AS birthDate_cleaned#8];
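# This select / cache / join sequence is what raised the error above
left = left.select(specific_columns_from_left)
left.cache()
right = right.select(specific_columns_from_right)
right.cache()
left.join(right, ['column_to_join'])
# Workaround: clone left with renamed columns, then join the clone
left_cloned = left.toDF(*columns_in_order_renamed_to_avoid_confusion)
left_cloned.join(right, ['column_to_join'])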
# Or in SQL (createOrReplaceTempView is Spark 2.0+; use registerTempTable on 1.x)
left_cloned.createOrReplaceTempView("left_cloned")
sqlContext.sql("""
SELECT right.* FROM
left_cloned
JOIN right
ON right.col1 = left_cloned.col2...
""")
Reference ‘name’ is ambiguous, could be: name#8484, name#8487.
left.join(right, ['column1', 'column2'])
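The renamed column list passed to toDF above can be built mechanically. Here is a minimal sketch in plain Python, assuming you want to keep the join keys untouched and tag every other column with a suffix. The helper name rename_non_key_columns, the suffixes, and the sample column lists are all hypothetical and only stand in for left.columns and right.columns.

```python
def rename_non_key_columns(columns, join_keys, suffix):
    """Return the column names with `suffix` appended to every non-key column."""
    keys = set(join_keys)
    return [c if c in keys else f"{c}{suffix}" for c in columns]


# Hypothetical column lists standing in for left.columns and right.columns.
left_columns = ["id", "name", "surname"]
right_columns = ["id", "name", "birthDate"]

left_renamed = rename_non_key_columns(left_columns, ["id"], "_left")
right_renamed = rename_non_key_columns(right_columns, ["id"], "_right")

print(left_renamed)
print(right_renamed)

# With real DataFrames the lists would be applied roughly like this:
#   left_cloned = left.toDF(*left_renamed)
#   right_cloned = right.toDF(*right_renamed)
#   joined = left_cloned.join(right_cloned, ["id"])
```

Because only the join key shares a name after renaming, a later reference to name_left or name_right cannot be ambiguous.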

Problem 2: An error occurred while calling o64.cacheTable.

An error occurred while calling o206.showString.
: org.apache.spark.SparkException: Job 25 cancelled because SparkContext was shut down
...
...
from pyspark.sql import functions as F
(df1.groupby('col1')
    .agg(F.countDistinct('col2').alias('uniq_col2_count'))
    .orderBy(F.desc('uniq_col2_count'))
    .show())

Problem 3: “your_module not found” even after a successful import, when you import a UDF module like this

from lib.preprocess import find_keywords
df.show()
sc.addPyFile('path_to_your_module.py')
import sys
sys.path.append('/tmp/modules/lib')
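To see which directory has to go on sys.path, here is a small local sketch that needs no Spark. It builds a throwaway package in a temp directory and imports from it. The package name demo_udf_lib and the body of find_keywords are hypothetical stand-ins for lib and the real function. It only illustrates the Python import side; on a cluster the executors still need the file shipped to them, which is what sc.addPyFile is for.

```python
import importlib
import os
import sys
import tempfile

# Throwaway layout standing in for something like /tmp/modules/lib/preprocess.py.
modules_root = tempfile.mkdtemp()
package_dir = os.path.join(modules_root, "demo_udf_lib")
os.makedirs(package_dir)

# An empty __init__.py marks the directory as a regular package.
with open(os.path.join(package_dir, "__init__.py"), "w") as f:
    f.write("")

# Hypothetical stand-in for the real find_keywords.
with open(os.path.join(package_dir, "preprocess.py"), "w") as f:
    f.write(
        "KEYWORDS = {'spark', 'udf'}\n"
        "\n"
        "def find_keywords(text):\n"
        "    return [w for w in text.lower().split() if w in KEYWORDS]\n"
    )

# Put the directory that CONTAINS the package on sys.path,
# so that `from demo_udf_lib.preprocess import ...` can resolve.
sys.path.append(modules_root)
importlib.invalidate_caches()

from demo_udf_lib.preprocess import find_keywords

print(find_keywords("Spark job with a UDF failed"))
```

For an import written as package.module, the entry on sys.path is the parent of the package directory, not the package directory itself.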

Problem 4: “Cannot have map type columns in DataFrame which calls set operations”

{'US': 3, 'EU': 0, 'UK': 0}
[{"country": "US", "count": 3}, {"country": "EU", "count": 0}, {"country": "UK", "count": 0}]
Cannot have map type columns in DataFrame which calls set operations
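The two shapes above, a map and a list of records, can be converted with a few lines of plain Python. This is a minimal sketch, assuming the conversion happens per row; the helper name map_to_records and its parameters are hypothetical. In Spark, one option would be to wrap a function like this in a UDF that returns an array of structs, but that wiring is not shown here.

```python
def map_to_records(counts, key_name="country", value_name="count"):
    """Turn {'US': 3, ...} into [{'country': 'US', 'count': 3}, ...].

    Records come out in the dict's insertion order.
    """
    return [{key_name: k, value_name: v} for k, v in counts.items()]


records = map_to_records({"US": 3, "EU": 0, "UK": 0})
print(records)
```

The resulting column holds a list of records instead of a map, which is the shape shown in the second sample line above.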

Problem 5: IOPub data rate exceeded

--NotebookApp.iopub_data_rate_limit=10000000
> ls -lh ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2
-rw-r--r--@ 1 me  staff   64G Sep  7 16:00 /Users/me/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2
docker rm $(docker ps -q -f 'status=exited')
docker rmi $(docker images -q -f "dangling=true")
