Data Security for Data Engineers

Katharine Jarmul
Published in 97 Things
3 min read · Jun 6, 2019

Is your data safe? What about the data you process every day? How do you know? Can you guarantee it?

These questions aren’t meant to send you running in fear. Instead, I want you to approach security pragmatically. As data engineers, we often manage a company’s most valuable resource, so it only makes sense that we learn security engineering and apply it to our work.

How can we do so? Here are a few tips:

  1. Learn About Security: Most data engineers come from a computer science or data science background, so you may not have had exposure to computer and network security concepts. Learn about these by attending security conferences, meetups and other events. Read up on security best practices for the particular architecture or infrastructure you use. Chat with the IT/DevOps or security folks at your company to hear what measures are already in place. I’m not asking you to become an expert, but I do want you to be informed.
  2. Monitor, Log and Test Access: Monitor, log and track access to the machines or containers you use, to the databases or other data repositories you maintain, and to the code and processing systems you contribute to daily. Make sure only credentialed users or machines can access these systems. Create firewall rules (yes, even in the cloud and even with containers) and test them using a port scanner or ping sweep. Monitor and alert on any unusual access or network behavior.
  3. Encrypt Data: Making sure sensitive data is protected should be one of the key things we do as data engineers. Whenever possible, encrypt data or sensitive fields, both at rest and in transit. According to IBM’s 2018 data breach report, this is one of the best ways to prevent a costly data breach.
  4. Automate Security Tests: Already using CI/CD as part of your data engineering? (If not, please stop reading and go do that right now.) Implement security tests as part of that deployment. This can be as simple as testing bad credentials, testing for encryption, and testing that the latest security updates for your libraries are being used. The more of this testing you automate, and the more reliably you stop and alert on potential security threats, the safer your processing and pipelines will be.
  5. Ask for Help: If you are lucky enough to have a security team, ask them to help you assess the security of your processing infrastructure, networks and scripts. If not, find out when your company’s next external security review is scheduled and ask for time to talk with the experts about what measures you can take to better secure your data engineering. This can include penetration testing of data collection endpoints or exposed APIs you use or maintain, or simply a security review of the processing deployment, monitoring and architecture. Either way, getting expert advice will likely make you feel more confident in the measures you have and in those you want to prioritize or implement in the future.
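
Tips 2 and 4 combine naturally: a firewall rule can be checked by an automated test that tries to connect to ports that should be closed. What follows is only a minimal sketch in Python, not a replacement for a real port scanner. The function names `is_port_open` and `unexpected_open_ports` are made up for this illustration, and the probe is a plain TCP connect from wherever the test runs, so it says nothing about UDP or about how the host looks from other networks.

```python
import socket
from typing import Iterable, List, Set


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and unreachable hosts all land here.
        return False


def unexpected_open_ports(
    host: str,
    ports: Iterable[int],
    allowed: Set[int],
    timeout: float = 1.0,
) -> List[int]:
    """Return the probed ports that accept connections but are not allowed."""
    return [
        port
        for port in ports
        if port not in allowed and is_port_open(host, port, timeout)
    ]
```

In a CI job, a test could call `unexpected_open_ports` against a staging host you control, with an allow-list such as `{443}`, and fail the build whenever the returned list is not empty. The host name, the port range and the allow-list are all things you would have to supply for your own setup.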

For your particular role or company, there may be even more low-hanging security fruit. Scheduling a regular security sprint into your planning is a great way to stay on top of these issues and improve security over time. When faced with those questions again, you and your team can respond with peace of mind, knowing your data engineering workflows are secure.

Katharine Jarmul

Head of Product at Cape Privacy. Helping make data privacy easier and more accessible for real world data science and machine learning. 😁