Engineering best practices for Data Science projects

Nikita sharma
Published in TDS Archive
5 min read · Nov 7, 2020


Photo by Jon Tyson on Unsplash

Introduction

In this post, we will learn some best practices for improving the quality and reliability of production Data Science code.

Note: Most of the practices mentioned here are not new to the software engineering world, but they often get ignored or missed in the experimental world of Data Science.

In this post, I will briefly cover the steps we can take to make our projects more reliable, and I will write a few follow-up posts that describe each step in more detail using an example project. I will assume a Python (PySpark) Data Science project throughout, but the ideas can be applied to any other programming language or project.

I hope you find them useful.

Code Refactoring

Refactoring is the first step toward better code. It is the process of simplifying the design of existing code without changing its behavior.

Data Science projects are written in Jupyter notebooks most of the time and can get out of control pretty easily. A code refactoring step is highly recommended before moving the code to…
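To make the idea concrete, here is a minimal sketch of what a refactoring step might look like. It is not taken from the project this post has in mind: the sample rows, the function names, and the choice of plain Python instead of PySpark (so that it runs without a Spark session) are all assumptions made for illustration. The first function mimics a typical notebook cell that parses, filters, and aggregates in one place; the refactored version splits the same logic into small named functions and then checks that the result is unchanged.

```python
from statistics import mean

# Hypothetical sample data standing in for a real dataset.
RAW_ROWS = [
    {"city": "Pune", "temp_c": "31.5"},
    {"city": "Pune", "temp_c": ""},  # missing reading
    {"city": "Delhi", "temp_c": "38.0"},
    {"city": "Delhi", "temp_c": "36.0"},
]


def average_temp_notebook_style(rows):
    """Before: parsing, filtering, and aggregation tangled in one loop."""
    totals = {}
    counts = {}
    for r in rows:
        if r["temp_c"] != "":
            t = float(r["temp_c"])
            if r["city"] not in totals:
                totals[r["city"]] = 0.0
                counts[r["city"]] = 0
            totals[r["city"]] += t
            counts[r["city"]] += 1
    return {c: totals[c] / counts[c] for c in totals}


def parse_valid_readings(rows):
    """Drop rows without a temperature and convert the rest to floats."""
    return [(r["city"], float(r["temp_c"])) for r in rows if r["temp_c"] != ""]


def group_by_city(readings):
    """Collect the temperature readings for each city."""
    grouped = {}
    for city, temp in readings:
        grouped.setdefault(city, []).append(temp)
    return grouped


def average_temp_refactored(rows):
    """After: the same computation, composed from small testable steps."""
    grouped = group_by_city(parse_valid_readings(rows))
    return {city: mean(temps) for city, temps in grouped.items()}


before = average_temp_notebook_style(RAW_ROWS)
after = average_temp_refactored(RAW_ROWS)

# Refactoring must not change behavior, so both versions have to agree.
assert before == after
```

The final assertion is the important part: a refactoring is only safe if the old and new code produce the same output for the same input, and small functions such as `parse_valid_readings` are much easier to test in isolation than a single long notebook cell.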

