Published in Simon Systems

The Ins and Outs of Data Transfer

Data transfer is a fundamental concept at Simon. At the core of our system, we load data from core customer systems and ultimately transfer it out to marketing, sales, and support channels. Jason gave a talk yesterday at an AWS meetup in Santa Monica, providing a mile-high view of the topic.

So much of the big data ecosystem assumes that your data is just “there”. In reality, your data is both coming from somewhere and then ending up somewhere else.

ETL problems are pervasive, and many people are tackling them without even knowing it. If you’re building a business dashboard, you’re doing ETL. If you’re programmatically setting customer records into Salesforce, you’re doing ETL. If you’re syncing a customer segment into a Facebook custom audience, you’re joining customer behavior data and syncing it out to the web.
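To make the pattern concrete, here is a minimal extract-transform-load sketch (all names and the record shape are hypothetical, not Simon’s actual pipeline): pull raw records from a source, reshape them, and push them to a destination.

```python
# Hypothetical minimal ETL pipeline: extract -> transform -> load.
import json

def extract(raw_lines):
    """Extract: parse newline-delimited JSON records from a source system."""
    return [json.loads(line) for line in raw_lines]

def transform(records):
    """Transform: keep only active customers and normalize the email field."""
    return [
        {"id": r["id"], "email": r["email"].strip().lower()}
        for r in records
        if r.get("status") == "active"
    ]

def load(records, destination):
    """Load: append transformed records to a destination (here, a plain list
    standing in for a warehouse table or a third-party API)."""
    destination.extend(records)
    return len(records)

raw = [
    '{"id": 1, "email": " Amy@Example.com ", "status": "active"}',
    '{"id": 2, "email": "bob@example.com", "status": "churned"}',
]
dest = []
loaded = load(transform(extract(raw)), dest)
# loaded == 1; only the active customer reaches the destination
```

Real pipelines swap the list for a warehouse or an API client, but the three-stage shape stays the same.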

ETL is inherently brittle and hard to test. Failures include source breakage (database connection timeouts), transform limitations (out-of-memory errors during computation), and loading issues (API connection failures, rate limiting). You have to expect that things are going to break, and you need strong expectations about how they’re going to break.
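One common way to bound those expected failures is retrying transient errors with exponential backoff. A sketch (the flaky source below is simulated; real code would catch whatever exceptions your database or API client actually raises):

```python
# Hypothetical retry wrapper for transient ETL failures.
import random
import time

def with_retries(fn, max_attempts=5, base_delay=0.05):
    """Retry a flaky callable with exponential backoff plus jitter.
    Transient failures (timeouts, dropped connections) are expected;
    this makes the breakage bounded and observable."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure loudly
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, base_delay)
            time.sleep(delay)

# Simulated flaky source that times out twice, then succeeds.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("source timed out")
    return {"rows": 42}

result = with_retries(flaky_fetch)
# result == {"rows": 42} after exactly 3 attempts
```

The key design choice is that the final failure is re-raised rather than swallowed: when the retry budget runs out, you want the pipeline to break in the loud, predictable way you planned for.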

Finally, like any software engineering discipline, basic programming principles can be employed to maximize the reliability of these processes. Testing, logging, graphing, and system idempotency are critical.
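Idempotency in particular is what makes retries safe: if a load can be replayed without changing the final state, a duplicated or resumed run is harmless. A minimal sketch, using an upsert keyed on record ID (the dict stands in for any keyed store):

```python
# Hypothetical idempotent load: upsert keyed on id, never blind append.
def idempotent_load(store, records):
    """Re-running the same batch yields the same final state,
    so a retried or duplicated run does no harm."""
    for r in records:
        store[r["id"]] = r  # insert-or-overwrite by key
    return store

batch = [
    {"id": 1, "email": "amy@example.com"},
    {"id": 2, "email": "bob@example.com"},
]
store = {}
idempotent_load(store, batch)
idempotent_load(store, batch)  # replaying the batch changes nothing
# store still holds exactly 2 records
```

An append-only load run twice would double-count every customer; the keyed upsert makes "run it again" a safe recovery strategy instead of a data-quality incident.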

See slides here for more detail — enjoy!




Dr. Jason Davis

Co-founder @ Simon Data, Entrepreneur, Data Geek
