PyCharm for Data Scientists
I recently started using PyCharm as an alternative to Spyder, and I am loving it. This article covers the PyCharm features that made me switch completely from Spyder. The features below are described in comparison with Spyder, not with IDEs in general.
- Terminal: You get a shell right in the IDE, which gives easy access to several things: testing your script from the command line, downloading data files from gs/aws, and interacting with git.
- Git: Speaking of git, PyCharm offers git integration. From within the IDE, you can choose which files go into gitignore, then add and commit. You can always see which branch you are on and which files are in the modified, added, or committed state (based on their color in the project view). Did I mention that you can run git push directly from the Terminal inside the IDE?
- Version Control: You can view your git change log from within the IDE. You can also compare your file against the version in the latest git commit to see your changes.
- Plugins: PyCharm offers plenty of plugins for non-Python files too. So when you work with config files such as yaml/json/ini, with shell scripts, or with sql/html/css files, the IDE knows what format to expect and handles indentation, keyword highlighting etc. out of the box. You can even work with IPython notebooks, though frankly it seemed messy and I would just go with Jupyter for notebooks. A few other noteworthy plugins are git, flask, vim etc.
- Project maintenance: When you work on large projects, there are several best practices to maintain, such as creating a readme file, using virtual environments, and managing a requirements file. PyCharm handles all of these quite easily. You can also refactor code, check an object’s dependencies, and trace an object’s source with little effort.
- Debugger: This is present in Spyder too, but somehow I never used it there. You can set breakpoints and inspect how your code behaves at each one.
- Running code in chunks: This is very natural in Spyder. You select the part of the code you want to run and press cmd+return. It is less obvious in PyCharm, but it is available (Option + Shift + E on Mac). This comes in very handy while writing stand-alone scripts.
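To make the "running code in chunks" point concrete, here is a minimal sketch of a stand-alone script laid out so that each part can be selected and run on its own. The `# %%` lines are the cell markers Spyder recognizes; in PyCharm you would simply select the same lines and run the selection. The data values and the `summarize` helper are made up for illustration.

```python
# %% Chunk 1: build a small, made-up dataset (no files needed)
import statistics

readings = [12.5, 14.0, 13.2, 15.8, 11.9]  # hypothetical sample values

# %% Chunk 2: define and apply a summary; rerun only this chunk after edits
def summarize(values):
    """Return the mean and population standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
    }

summary = summarize(readings)

# %% Chunk 3: inspect the result
print(summary)
```

Because the chunks share one console session, you can tweak `readings` and rerun only the second and third chunks without starting the whole script again.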
With all these benefits, there are a few downsides as well:
- Memory hog: PyCharm consumes around 1.5GB of my Mac’s memory. That seems like a lot for an IDE.
- Data visualization: Spyder’s greatest strength for me is its variable explorer. You can view dataframes in PyCharm too, but it’s nowhere close to Spyder’s. The same goes for plots: rendering them in PyCharm takes significantly more time than in Spyder.
- Learning Curve: I think Spyder’s popularity with Data Scientists comes from the fact that it is very simple. You install it, open it, and you just know how it works. PyCharm has a bit of a learning curve, such as setting up its interpreter and figuring out how to run a selection of code rather than the whole file. You need to spend a few hours to really understand and appreciate how it works.
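Interpreter setup is the part of the learning curve that tripped me up most, so here is a small sketch for checking which Python a run is actually using. It relies only on the standard `sys` module; the `interpreter_info` helper name is my own, and the virtual-environment check assumes a venv-style environment (a conda environment may report `False` here).

```python
import sys


def interpreter_info():
    """Describe the interpreter that is running this code."""
    return {
        # Full path of the Python binary in use.
        "executable": sys.executable,
        # Version as major.minor.micro, e.g. "3.11.4".
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # In a venv-style environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```

If the printed executable path is not the one inside your project's environment, the project interpreter setting in the IDE is the first thing to check.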
In short, if you are working on a project where multiple scripts interact with each other, you should definitely try PyCharm. If all your scripts are stand-alone analyses, Spyder (or, even better, Jupyter) should be sufficient for your needs.