Shaping Data’s Future: Insights from a Collaborative Taiwan Meetup
A few weeks ago we hosted a meetup when our founder, CL, and his friend Tyson, co-founder of Tobiko Data, visited Taiwan. It was a casual gathering for data professionals from around the world, and we had great sharing and Q&A sessions. Thanks to Hive Venture for providing such a nice place (we hope to host more activities like this, so stay tuned).
Highlights From The Meetup
Our speakers shared the projects they’re passionate about:
- CL Kao, founder & CEO of Recce, discussed pull requests (PRs).
- Tyson Mao, co-founder & CEO of Tobiko Data, introduced Tobiko.
- William Chang, co-founder & CTO of Canner, talked about Wren AI.
PR Challenges in New Software Like Data and AI — CL Kao
As CL discussed, “How has this been tested?” is a common section in PR templates and is considered a best practice. In traditional software, we write test cases and put them in CI. If the CI fails, we know something is wrong and can fix it. This means we know what “correct” looks like before the PR is created.
This approach doesn’t work for new types of software like code-first data pipelines or LLM-based applications. We can test the SQL compilation, the linting, or the prompt code, but we don’t know whether the data or the prompt results are correct.
Currently, the best we can do on data is spot checks, like writing a custom query, pasting it into dev or stage environments, and manually eyeballing the results. 🤷‍♀️
Or worse, we simply don’t do any testing when making changes to a prompt.
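To make the spot check described above concrete, here is a minimal sketch that compares a couple of summary numbers between a production table and a development table. It is not how Recce works; the table names (`prod_orders`, `dev_orders`), the `amount` column, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3


def profile(conn, table, column):
    """Return the row count and the sum of one numeric column for a table."""
    # Names are interpolated directly into the SQL: acceptable for a trusted
    # local sketch, not for untrusted input.
    row = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM({column}), 0) FROM {table}"
    ).fetchone()
    return {"row_count": row[0], "total": row[1]}


def spot_check(conn, base_table, dev_table, column):
    """Return only the metrics that differ, as (base, dev) pairs."""
    base = profile(conn, base_table, column)
    dev = profile(conn, dev_table, column)
    return {key: (base[key], dev[key]) for key in base if base[key] != dev[key]}


# Hypothetical data: the dev build of the model silently dropped one order.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prod_orders (id INTEGER, amount INTEGER);
    CREATE TABLE dev_orders (id INTEGER, amount INTEGER);
    INSERT INTO prod_orders VALUES (1, 100), (2, 250), (3, 75);
    INSERT INTO dev_orders VALUES (1, 100), (2, 250);
    """
)

diff = spot_check(conn, "prod_orders", "dev_orders", "amount")
print(diff)  # {'row_count': (3, 2), 'total': (425, 350)}
```

Even a check this small shows the gap: the script can report that the numbers moved, but a person still has to decide whether the change was intended.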
CL is passionate about software development workflows. He built SVK, a distributed version control system, 20 years ago, before Git existed. Recce, a data modeling validation tool, is designed to bridge the gap between the data development workflow and software CI, starting with PR review in dbt projects. It is poised to build a new industry standard of Preview-Decide-Deploy in smart systems, similar to DevOps and CI/CD in traditional software.
More Efficient, Cost-Effective Data Transformation — Tyson Mao
Tyson shared that Tobiko Data has a globally distributed team of 24 people from FAANG. They built SQLMesh as an alternative to dbt. Their focus is on creating a faster, more cost-effective, and more collaborative solution.
Through SQLGlot and SQLMesh, two open source projects they built:
1. They have a semantic understanding of SQL, so they can analyze data changes and detect whether the changes are breaking or non-breaking.
2. SQLMesh is stateful, relying on persistent state to store metadata. You can see where you left off and what others have done.
3. Nothing that has already been computed gets recomputed. Virtual Data Environments efficiently decide when (and when not) to materialize tables.
The benefits of this approach are huge! With it, teams can:
- Understand when and how models have been changed.
- Save time and money by knowing when (and when not) to rebuild models and to materialize tables. You don’t want to rebuild models when you’ve just added a space in the SQL snippet!
- Simplify debugging by understanding SQL across multiple warehouses. If you have a typo in your SQL, you want to know that before spending 45 minutes running the DAG.
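The idea of skipping a rebuild after a whitespace-only edit can be sketched in a few lines. This is a deliberately simplified illustration, not SQLMesh's implementation: it assumes a plain dictionary as the persistent state and collapses whitespace in the raw text, whereas a tool with a semantic understanding of SQL would compare parsed queries. The model name `orders_summary` and both helper functions are hypothetical.

```python
import hashlib


def fingerprint(sql):
    """Hash the SQL text after collapsing every run of whitespace."""
    # Simplification: this would also collapse whitespace inside string
    # literals, which a real SQL parser would leave untouched.
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_rebuild(state, model_name, sql):
    """Return True when the SQL differs from the fingerprint stored in state."""
    return state.get(model_name) != fingerprint(sql)


# The dict stands in for persistent state that records what was last built.
state = {"orders_summary": fingerprint("SELECT id, amount FROM orders")}

whitespace_only = "SELECT id,  amount\nFROM orders"
real_change = "SELECT id, amount * 2 AS amount FROM orders"

print(needs_rebuild(state, "orders_summary", whitespace_only))  # False
print(needs_rebuild(state, "orders_summary", real_change))      # True
```

A text-level comparison like this cannot tell a breaking change from a non-breaking one; that distinction needs the semantic analysis described above.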
Open Source SQL AI Agent — William Chang
“Can you add a column for me? It should be easy, like 5 minutes for you, right?” William pointed out what we all know: this is a really common request in the daily life of data engineers and analysts. This is why Text-to-SQL has become a popular topic.
ChatGPT can write SQL for you. The challenge is understanding the context: some columns are named attribute 1 through attribute 10 and carry no meaning, you have to know how to retrieve the related data, and you need a way to make sure the generated SQL is correct.
At Wren AI, the team collects user feedback and listens to it closely. They discovered that users lack trust in the results when they have only just begun using the product. To address this, the AI responses transparently explain how the SQL is generated. Notably, users primarily focus on the AI and ask, “What questions can I ask?” When Wren AI starts providing guidance that helps users understand their data and explore potential directions, its adoption increases significantly.
They also got feedback that results are easier to understand with a chart, so they started working on Chart Generation (which was released just after this meetup! 🚀). Even more releases are expected throughout this year!
Stay Tuned and See You Next Time
These talks sparked a great Q&A session, covering topics like AI’s role in data work, the future of SQL, and the hurdles of building a data-focused startup. A particularly hot topic, “Will AI replace data analysts?”, had all three founders in agreement: while AI can assist with some tasks, SQL remains essential, and AI is still far from replacing humans in complex data modeling and decision-making.
This was a great meetup with lots of learning and conversations. We’d love to keep this momentum going, so stay tuned for more!