MDS Newsletter #112

Aayush Jain
Modern Data Stack
4 min read · Nov 22, 2023

Dive into the intricate world of Apache Airflow with valuable tips to avoid common pitfalls. In this edition, explore three exciting data conferences by Rockset, Databricks, and Apache Flink, offering opportunities to learn, network, and uncover the potential of these technologies. Discover a company’s innovative approach to supporting entrepreneurs with financial aid. And that’s just the beginning — there’s a lot more to explore, so let’s dive in!

Featured tools of the week

  • CData: CData’s Universal Data Connectivity Platform simplifies data connectivity and enables real-time integration of data across an organization’s entire tech stack. With access to more than 250 enterprise data sources and counting, it eliminates data silos and removes barriers to integration and insights, enabling self-service data analytics.
  • GoodData: GoodData Cloud is an analytics platform focused on the semantic layer, reusability of metrics, business user self-service, and a multi-tenant environment. Its approach relies on a semantic layer consisting of a logical data model and metrics written in GoodData’s own analytical query language.

    GoodData has raised a total of $167.7M in funding over 13 rounds. Its latest funding was raised on Jul 27, 2021, from a Debt Financing round.

Featured stack of the week

  • HealthJoy: HealthJoy is a mobile application that maximizes the value of employers’ benefits packages, frees up HR’s time so they can focus on strategy over administrative tasks, and helps employees achieve better healthcare outcomes.

Here are the data tools of HealthJoy:

Good reads and resources

  • Streamlining Membership Data Engineering at Netflix with Psyberg: Netflix’s Membership and Finance Data Engineering team introduces Psyberg, an incremental data processing framework that addresses the challenges of late-arriving data. Psyberg automates data loads, using Iceberg metadata to improve accuracy and efficiency. It detects and manages late-arriving data without manual intervention, streamlining Netflix’s data processing pipelines. The article outlines the challenges the team faced before Psyberg and how the framework’s metadata-driven approach made pipelines more accurate and timely. Further details on Psyberg’s modes are expected in upcoming blog posts.
  • Documenting a Complex Database: Challenges and Triumphs: While documenting a scattered database for a company with 80 employees, Igor Comune faced non-standardized data and the absence of a database diagram. Using tools like Power BI, Google BigQuery, and Diagrams.net, he prioritized Fact and Dimension Tables and adopted a Star Schema approach. Despite initial difficulties, Igor documented over 400 tables in two months, then extended the process to finance tables, completing the project in three months. The resulting 50-page Google Docs document covers 95% of the database and is regularly updated to reflect changes, marking a significant milestone in Igor’s professional career.

Upcoming data events, summits and webinars

  • Analytics Wonderland: Virtual Holiday Event: Join this event by Sigma for a jolly program that will take you on a festive journey through the Analytics Wonderland Workshop!

    📅 Tuesday, December 5th
    🕥 10:00am-11:00am PST
    📍 Virtual

MDS Jobs

  • Komoot is hiring (Senior) Data Analyst
    Location: Remote
    Stack: Amazon Redshift, dbt, Airflow, Python
    Apply here
  • Education Analytics is hiring Analytics Engineer/Data Engineer
    Location: USA
    Stack: dbt, Snowflake, Airflow, AWS
    Apply here
  • Valon is hiring Senior Data Engineer
    Location: New York, US
    Stack: GCP, Fivetran, Segment, dbt, BigQuery, Looker
    Apply here

🔥 Trending on Twitter

Just for fun 😀

Do you have an insatiable appetite for the latest developments in the dynamic realm of data? You’re in luck! Simply click the “Follow” button on our LinkedIn and Twitter profiles and indulge in the freshest and most cutting-edge content on data.

That’s not all! We value your feedback and encourage you to rate us here. Don’t be hesitant to share your thoughts on how we’re delivering the data goods!

Love it | It’s great | Good | Okay-ish | Meh

We welcome your ideas, recommended articles, and job listings pertaining to data engineering. Your input is valuable to us and will be incorporated into our next edition. Please don’t hesitate to reach out and share with us.

About Moderndatastack.xyz: We are building a platform to connect members of the data community and educate them on modern data infrastructure and management. We’re proud of our platform and invite you to explore it!
