Chao-Fu Yang
Jun 9, 2018 · 1 min read

Force caching Spark DataFrames

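Caching a DataFrame in Spark (df.cache() or df.persist(LEVEL)) is lazy: the call only marks the DataFrame for caching, and nothing is actually stored until you trigger an action on it. To force caching, run an action such as df.count() right after the call. Also note that Spark automatically persists some intermediate data from shuffle operations even if you never call cache() or persist(), which may cause out-of-memory errors if you are not aware of it.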
