(move)Data Takeaways

Matt Weingarten
3 min read · Dec 15, 2022

Love bytes (hair metal reference, anyone?)

Introduction

Last week, Airbyte hosted its two-day (move)Data conference, which featured plenty of interesting lightning sessions (if you missed the fun, you can catch all the replays here). As someone who didn’t know much about Airbyte going in, I found it a nice learning experience too. What were my main takeaways?

Duck, Duck, Goose

There’s been a lot of hype around DuckDB recently, so I was looking forward to a talk giving more background on what exactly it is. The performance, which comes from its fantastic support for aggregations, is pretty astounding. Can you imagine a query against 1.5 billion rows taking 3 seconds? As a team that produces approximately 1 billion rows of data a day, we could only dream of achieving that.

DuckDB’s support for plugging into general-purpose programming languages is very useful for the modern state of data-driven development. The possibilities once this gets brought into the cloud will be very interesting (that 3-second query above ran on a local laptop!). I hope to see this become a reality soon.
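To make the "plugging into a programming language" point concrete, here is a minimal sketch of running an aggregation with DuckDB from inside a Python process. It is not taken from the talk: the function name `summarize_events` and the synthetic data are made up for illustration, and it assumes the third-party `duckdb` package is installed.

```python
def summarize_events(n_rows: int = 1_000_000, n_buckets: int = 10):
    """Aggregate a synthetic table in-process with DuckDB (illustrative only)."""
    # Imported lazily so the third-party `duckdb` package (pip install duckdb)
    # is only required when the function is actually called.
    import duckdb

    con = duckdb.connect()  # in-memory database: no server, no setup
    try:
        # range(n) is a DuckDB table function yielding 0..n-1; the alias
        # t(i) names its single column. The int() casts keep the values
        # interpolated into the SQL text strictly numeric.
        query = f"""
            SELECT i % {int(n_buckets)} AS bucket,
                   count(*)             AS row_count,
                   sum(i)               AS total
            FROM range({int(n_rows)}) AS t(i)
            GROUP BY bucket
            ORDER BY bucket
        """
        # fetchall() returns one tuple per bucket.
        return con.execute(query).fetchall()
    finally:
        con.close()
```

Calling `summarize_events()` returns one `(bucket, row_count, total)` tuple per bucket. The whole thing runs inside the Python process, which is what makes the laptop-scale numbers from the talk so appealing.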

Data Analysts Are Set Up To Fail

Pretty catchy title, right? This talk focused on how data analysts have improved technically within the last decade, whereas the technology they’re using has not. With the emergence of analytics engineering and the expectations that come along with it, data analysts have started to become more like data developers than business analysts.

I can’t speak for data analysts everywhere, but I would assume this is a common limitation. At the end of the day, an analyst should be able to focus on data assets and their associated insights, not all the other nonsense that can come up in the development process. How we get to that self-serve and automated state remains to be seen, but it would certainly take a load off analysts’ backs and let them focus on the real value of data.

Ingestion And Observability

If you thought a data conference in 2022 was safe from the observability buzzword, think again (it is that relevant, trust me). While we’ve spoken in great detail about the importance of data observability, we’ve never really mentioned monitoring of the underlying ingestion process itself. On top of all of the table-level and column-level metrics that should be in place, pipeline-level metrics should exist as well.

I definitely agree with the assessment made in this session. While data observability is still a work in progress for us, we already have ingestion monitoring in place. On top of having SLAs for our pipelines, we also have alert notifications in place with Slack and PagerDuty. This has allowed us to be more cognizant when working with data consumers, as we can establish firm guidelines with them on our data expectations and make sure we adhere to them.
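As a rough illustration of what a pipeline-level check can look like, here is a minimal sketch that compares an ingestion run against an SLA. This is not our actual setup: `PipelineRun`, `Sla`, and `check_run` are hypothetical names, the thresholds are invented, and delivering the resulting messages to Slack or PagerDuty is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class PipelineRun:
    """Hypothetical record of the latest ingestion run for one pipeline."""
    pipeline: str
    finished_at: datetime
    rows_ingested: int


@dataclass
class Sla:
    """Hypothetical SLA: how stale and how small a run is allowed to be."""
    max_staleness: timedelta
    min_rows: int


def check_run(run: PipelineRun, sla: Sla, now: datetime) -> list[str]:
    """Return one alert message per SLA violation (empty list if healthy)."""
    alerts = []
    staleness = now - run.finished_at
    if staleness > sla.max_staleness:
        alerts.append(
            f"{run.pipeline}: last run finished {staleness} ago, "
            f"SLA allows {sla.max_staleness}"
        )
    if run.rows_ingested < sla.min_rows:
        alerts.append(
            f"{run.pipeline}: ingested {run.rows_ingested} rows, "
            f"expected at least {sla.min_rows}"
        )
    return alerts


# Example inputs: a run that is both late and too small.
now = datetime(2022, 12, 15, 12, 0, tzinfo=timezone.utc)
late_run = PipelineRun("orders", now - timedelta(hours=3), rows_ingested=500)
orders_sla = Sla(max_staleness=timedelta(hours=2), min_rows=1000)
late_alerts = check_run(late_run, orders_sla, now)
```

Keeping the check a pure function of the run, the SLA, and the current time makes it easy to test and leaves the choice of notification channel to the caller.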

Conclusion

I assume that’s probably it for conferences in 2022 (or at least I hope so, because sleep is starting to disappear from my life). For those who’d like to check out more talks from (move)Data, I’d also recommend the sessions on modern data management and DataOps on the modern data stack. Looking forward to what’s coming in 2023.

Matt Weingarten

Currently a Data Engineer at Samsara. Previously at Disney, Meta and Nielsen. Bridge player and sports fan. Thoughts are my own.