What I Learned From Attending #SparkAISummit 2020
One of the best virtual conferences that I attended over the summer was Spark + AI Summit 2020, which delivers a one-stop shop for developers, data scientists, and tech executives seeking to apply the best data and AI tools to build innovative products. I learned a ton of practical knowledge: new developments in Apache Spark, Delta Lake, and MLflow; best practices to manage the ML lifecycle; tips for building reliable data pipelines at scale; the latest advancements in popular frameworks; and real-world use cases for AI.
In this massive blog post, I want to share useful content from the talks that I enjoyed the most. The post consists of six parts:
- Use Cases
- Data Engineering
- Feature Store
- Model Deployment and Monitoring
- Deep Learning Research
- Distributed Systems
1 — Use Cases
1.1 — Data Quality for Netflix Personalization
Personalization is a crucial pillar of Netflix, as it enables each member to experience the vast collection of content tailored to their interests. All the data fed to the machine learning models that power Netflix’s personalization system is stored in a historical fact store. This…
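To make the fact-store idea concrete, here is a minimal, hypothetical in-memory sketch (not Netflix's actual implementation): raw facts used at recommendation time are logged with a timestamp, so a training job can later replay the exact inputs a model saw online. The `FactStore` class, its method names, and the example facts are all illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone

class FactStore:
    """Hypothetical stand-in for a historical fact store: snapshots of the
    raw facts (member activity, context) used to compute features, keyed by
    member and logging timestamp."""

    def __init__(self):
        # member_id -> list of (timestamp, facts) pairs
        self._facts = defaultdict(list)

    def log(self, member_id, facts, timestamp=None):
        """Record the facts observed for a member at serving time."""
        ts = timestamp or datetime.now(timezone.utc)
        self._facts[member_id].append((ts, dict(facts)))

    def facts_as_of(self, member_id, as_of):
        """Return the most recent facts logged at or before `as_of`,
        so training can regenerate features as they were at that moment."""
        candidates = [(ts, f) for ts, f in self._facts[member_id] if ts <= as_of]
        return max(candidates, key=lambda c: c[0])[1] if candidates else None

# Usage: log facts at recommendation time, replay them later for training.
store = FactStore()
t0 = datetime(2020, 6, 1, tzinfo=timezone.utc)
store.log("member-42", {"watch_history_len": 128, "country": "US"}, timestamp=t0)
snapshot = store.facts_as_of("member-42", datetime(2020, 7, 1, tzinfo=timezone.utc))
# snapshot == {"watch_history_len": 128, "country": "US"}
```

The key design point is time-travel correctness: training must see the same feature inputs the model saw when it served the recommendation, which is why facts are stored append-only with timestamps rather than overwritten in place.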