Software Engineering in Data Science
A quick bit of background about me: I started learning programming with Python, but not for data science, so I picked up the software engineering practices for Python. By software engineering practices, I mean writing reusable and readable code. When you learn to become a Data Scientist, you are thrown straight into Jupyter Notebooks to learn Python and its packages for data analysis, machine learning, statistical modeling, and so on. That is adequate, but for the most part Jupyter Notebooks let you be messy and write code that is hard to read or reuse.
Jupyter Notebooks don’t teach software engineering practices, largely because of how cells work. Cells can be run in any order, so a cell further down can reassign a variable that was already created in a cell above it. If you never restart the kernel and later go back to run an earlier cell, you may find that your variable now holds a different value from the one you see assigned. If you learn Python in a traditional IDE instead, you learn how Python assigns variables and instantiates objects, and you learn that order matters: when you run a script, the interpreter starts at the very top and runs every line in sequence, not just the section of code you are looking at. Without that mental model, a student who learned Python only through Jupyter Notebooks can hit errors they can’t explain, even though the underlying concept is simple.
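To make the hidden-state problem concrete, here is a small sketch. It models each notebook cell as a string of code run with Python’s built-in `exec` against one shared dictionary, which stands in for the kernel’s memory. The cell contents and the `run` helper are made up for illustration. Running the cells top to bottom, the way a script runs, gives one answer; re-running the middle cell before the last one, which a notebook happily lets you do, gives another.

```python
# Hypothetical cells: each one is a snippet of code. A notebook kernel runs
# whichever cell you pick, in whatever order you pick, against one namespace.
cells = {
    1: "price = 100",
    2: "price = price * 2",
    3: "total = price + 10",
}

def run(order):
    """Execute the given cells, in order, in a fresh shared namespace."""
    namespace = {}
    for cell_id in order:
        exec(cells[cell_id], namespace)
    return namespace["total"]

# Top to bottom, the way a script (or a fresh kernel running every cell) works.
print(run([1, 2, 3]))     # 210
# Cell 2 run twice before cell 3, easy to do by accident in a notebook.
print(run([1, 2, 2, 3]))  # 410
```

Nothing on screen tells you cell 2 ran twice; only the value of `total` gives it away.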

A problem like this may seem simple to fix when it involves only four lines of code, but imagine if there were hundreds or thousands of lines. Think of how much time you would spend searching and debugging just to track down the value that variable is supposed to have.
Not to put my classmate on blast, but when we were going over each other's code, I noticed many places where they could have written a function once and called it however many times they needed that process repeated. Jupyter Notebooks encourage this indirectly through one of their benefits: you can run just a snippet of code and see its output without re-running the cells before it or showing all of their output. It almost makes you both not lazy and lazy. Not lazy, because you are willing to write the same code over and over; lazy, because you don’t want to think about how to turn a process into a function.
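Here is a small, made-up illustration of the kind of refactor I mean. The data and the `min_max_scale` name are hypothetical, and min-max scaling is just a stand-in for any process you find yourself copying from cell to cell.

```python
# Copy-paste style: the same min-max scaling steps written out once per list.
heights = [150, 165, 180]
heights_scaled = [(h - min(heights)) / (max(heights) - min(heights)) for h in heights]
weights = [50, 65, 80]
weights_scaled = [(w - min(weights)) / (max(weights) - min(weights)) for w in weights]

# Function style: write the process once, then call it as often as needed.
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values equal: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 35, 50]
print(min_max_scale(ages))  # [0.0, 0.5, 1.0]
```

If the process ever needs to change, for example to handle a list where every value is the same, you fix it in one place instead of hunting down every copy.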
Jupyter Notebooks teach new programmers incorrect practices and let us get away with them because of their benefits. I am not saying Jupyter Notebooks are bad or that no one should use them. They have many benefits, like being very presentable thanks to markdown cells and images, and being very dynamic. But that dynamism brings problems for someone who doesn’t know Python outside of Jupyter Notebooks. What I am saying is that if you are going to learn Python for data science, take the time to learn Python itself, using other editors and IDEs such as VS Code, PyCharm, Spyder, Atom, and many more.