Explore Open Source Environments/IDEs in Python for Machine Learning & Data Science

Honey Bansal
7 min read · Aug 4, 2018


Let’s explore some of the popular open source environments and IDEs available for data science and discuss their features!

Image Credits : The Edited logo Image

What is ANACONDA?

Anaconda is a free and open source distribution of the Python and R programming languages for data science and machine learning applications (large-scale data processing, predictive analytics, scientific computing). It aims to simplify package management and deployment; package versions are managed by the conda package management system. The Anaconda distribution is used by over 6 million users and includes more than 250 popular data science packages for Windows, Linux, and macOS.

Image Credits: The Official Logo

Beyond the preinstalled set, the distribution gives access to more than 1,000 data packages. It also ships with the conda package and virtual environment manager and with Anaconda Navigator, a graphical front end to conda, so there is no need to learn how to install each library independently.

The open source data packages can be installed individually from the Anaconda repository with the conda install command, or with the pip install command, since pip is installed with Anaconda. Pip packages provide many of the features of conda packages, and in most cases the two can work together.
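As a rough illustration of the two install routes, the sketch below builds the equivalent commands from Python and only runs them when explicitly asked. The helper names (conda_install_command, pip_install_command, install) and the example package numpy are assumptions made for illustration; the sketch relies only on conda install with its --yes flag and on python -m pip install.

```python
import shutil
import subprocess
import sys


def conda_install_command(package):
    """Return the argument list for `conda install` (hypothetical helper)."""
    # --yes skips conda's interactive confirmation prompt.
    return ["conda", "install", "--yes", package]


def pip_install_command(package):
    """Return the argument list for pip, run through the current interpreter."""
    # `python -m pip` installs into the environment that is running this script.
    return [sys.executable, "-m", "pip", "install", package]


def install(package, run=False):
    """Prefer conda when it is on PATH, otherwise fall back to pip."""
    if shutil.which("conda"):
        command = conda_install_command(package)
    else:
        command = pip_install_command(package)
    if run:
        # check=True raises CalledProcessError if the installer fails.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Dry run: print the command without installing anything.
    print(" ".join(install("numpy")))
```

Calling install("numpy", run=True) would actually perform the installation, so the default is a dry run.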

You can also make your own custom packages using the conda build command, and you can share them with others by uploading them to Anaconda Cloud, PyPI or other repositories.

The default installation of Anaconda2 includes Python 2.7 and Anaconda3 includes Python 3.6. However, you can create new environments that include any version of Python packaged with conda.
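To make the last point concrete, here is a small sketch that reports which Python the current interpreter runs and builds a conda create command that pins a different version. The environment name py27 and the helper names are made up for illustration; the printed command is meant to be run in a terminal where conda is available.

```python
import sys


def conda_create_command(env_name, python_version):
    """Return the argument list for `conda create` with a pinned Python version."""
    return ["conda", "create", "--yes", "--name", env_name, f"python={python_version}"]


def current_python_version():
    """Return the major.minor version of the interpreter running this script."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


if __name__ == "__main__":
    print("Running on Python", current_python_version())
    # "py27" is a hypothetical environment name chosen for this example.
    print(" ".join(conda_create_command("py27", "2.7")))
```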

What is CONDA?

Conda is an open source, cross-platform, language-agnostic package manager and environment management system that installs, runs, and updates packages and their dependencies. It was created for Python programs, but it can package and distribute software for any language (e.g., R), including multi-language projects. The Conda package and environment manager is included in all versions of Anaconda, Miniconda, and Anaconda Repository.

What is Anaconda Cloud?

Anaconda Cloud is a package management service by Anaconda where you can find, access, store and share public and private notebooks, environments, and conda and PyPI packages. Cloud hosts useful Python packages, notebooks and environments for a wide variety of applications. You do not need to log in or have a Cloud account to search for public packages or to download and install them.

WHAT IS JUPYTER NOTEBOOK?

Image Credits : The official Jupyter Logo

The Jupyter Notebook enables users to create and share documents that combine live code with narrative text, mathematical equations, visualizations, interactive controls, and other rich output. It also provides building blocks for interactive computing with data: a file browser, terminals, and a text editor.
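One way to see how a notebook combines narrative text with live code is to look at its file format: an .ipynb file is a JSON document holding a list of cells. The sketch below builds a minimal two-cell notebook using only the standard library. The helper names and cell contents are made up for illustration, and the field layout follows version 4 of the notebook format as best understood here, so treat it as a sketch, not a reference.

```python
import json


def markdown_cell(text):
    """Build a narrative (Markdown) cell."""
    return {"cell_type": "markdown", "metadata": {}, "source": text.splitlines(keepends=True)}


def code_cell(code):
    """Build a code cell that has not been executed yet."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code.splitlines(keepends=True),
    }


def build_notebook(cells):
    """Wrap the cells in the top-level notebook structure."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 2}


notebook = build_notebook([
    markdown_cell("# Circle area\nThe area is $A = \\pi r^2$."),
    code_cell("import math\nmath.pi * 2 ** 2"),
])

if __name__ == "__main__":
    # Saving this text to a file ending in .ipynb should give a notebook Jupyter can open.
    print(json.dumps(notebook, indent=1))
```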

The Jupyter Notebook has become ubiquitous with the rapid growth of data science and machine learning and the rising popularity of open-source software in industry and academia:

· The Jupyter Notebook now supports over 100 programming languages, most of which have been developed by the community.

· There are over 1.7 million public Jupyter notebooks hosted on GitHub.

At the same time, the community has faced challenges in using various software workflows with the notebook alone, such as running code from text files interactively. The classic Jupyter Notebook, built on web technologies from 2011, is also difficult to customize and extend.

WHAT IS JUPYTER LAB?

JupyterLab is an interactive development environment for working with notebooks, code and data. Most importantly, JupyterLab has full support for Jupyter notebooks. Additionally, JupyterLab enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.

JupyterLab enables you to arrange your work area with notebooks, text files, terminals, and notebook outputs.

JupyterLab provides a high level of integration between notebooks, documents, and activities:

· Drag-and-drop to reorder notebook cells and copy them between notebooks.

· Run code blocks interactively from text files (.py, .R, .md, .tex, etc.).

· Link a code console to a notebook kernel to explore code interactively without cluttering up the notebook with temporary scratch work.

· Edit popular file formats with live preview, such as Markdown, JSON, CSV, Vega, VegaLite, and more.

To get started, see the JupyterLab documentation for installation instructions and a walk-through, or try JupyterLab with Binder. You can also set up JupyterHub to use JupyterLab.

What is Qt Console?

Image Credits : The Official IPython Logo

The Qt console is a very lightweight application that largely feels like a terminal but provides a number of enhancements only possible in a GUI, such as inline figures, proper multiline editing with syntax highlighting, graphical call tips and more.

What is SPYDER?

Image Credits: The official Spyder Logo

Spyder is an open source cross-platform integrated development environment (IDE) for scientific programming in the Python language. Spyder integrates with a number of prominent packages in the scientific Python stack, including NumPy, SciPy, Matplotlib, pandas, IPython, SymPy and Cython, as well as other open source software. Its features include:

· An editor with syntax highlighting, introspection, code completion

· Support for multiple IPython consoles

· The ability to explore and edit variables from a GUI

· A Help pane able to retrieve and render rich text documentation on functions, classes and methods automatically or on-demand

· A debugger linked to IPdb, for step-by-step execution

WHAT IS ORANGE?

Orange is an open-source data visualization, machine learning and data mining toolkit. It features a visual programming front-end for exploratory data analysis and interactive data visualization, and can also be used as a Python library. The program provides a platform for experiment selection, recommendation systems and predictive modeling and is used in biomedicine, bioinformatics, genomic research, and teaching. In science, it is used as a platform for testing new machine learning algorithms and for implementing new techniques in genetics and bioinformatics. In education, it is used for teaching machine learning and data mining methods to students of biology, biomedicine and informatics.
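As a small taste of the library route, the sketch below loads the iris data set that ships with Orange and summarises it. It assumes the Orange package is installed (it is distributed as Orange3) and returns None otherwise; the function name load_iris_summary is made up for this example.

```python
import importlib.util


def load_iris_summary():
    """Return (row count, feature names, target name) for Orange's bundled iris data.

    Returns None when the Orange package is not installed.
    """
    if importlib.util.find_spec("Orange") is None:
        return None
    import Orange

    # Table("iris") loads one of the sample data sets bundled with Orange.
    data = Orange.data.Table("iris")
    features = [attribute.name for attribute in data.domain.attributes]
    return len(data), features, data.domain.class_var.name


if __name__ == "__main__":
    summary = load_iris_summary()
    if summary is None:
        print("Orange is not installed in this environment.")
    else:
        rows, features, target = summary
        print(rows, "rows; features:", features, "; target:", target)
```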

WHAT IS GLUEVIZ?

Glue is a Python library to explore relationships within and among related datasets. Its main features include:

· Linked statistical graphics. With Glue, users can create scatter plots, histograms and images (2D and 3D) of their data. Glue is focused on the brushing and linking paradigm, where selections in any graph propagate to all others.

· Flexible linking across data. Glue uses the logical links that exist between different data sets to overlay visualizations of different data, and to propagate selections across data sets. These links are specified by the user, and are arbitrarily flexible.

· Full scripting capability. Glue is written in Python and built on top of its standard scientific libraries (e.g., NumPy, Matplotlib, SciPy). Users can easily integrate their own Python code for data input, cleaning, and analysis.
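The brushing and linking idea in the list above can be sketched in plain Python. This is a conceptual toy, not Glue's API: the LinkedViews class, its methods, and the sample columns (mass, radius, age) are all invented for illustration. The point is that every view reads from one shared selection, so brushing in one plot shows up in all the others.

```python
# Toy table: each row is one observation shared by every view.
rows = [
    {"mass": 1.0, "radius": 0.5, "age": 3},
    {"mass": 2.5, "radius": 1.1, "age": 7},
    {"mass": 4.0, "radius": 1.8, "age": 2},
    {"mass": 5.5, "radius": 2.4, "age": 9},
]


class LinkedViews:
    """Keep one selection mask that every view reads from (hypothetical class)."""

    def __init__(self, rows):
        self.rows = rows
        self.selected = [False] * len(rows)

    def brush(self, column, low, high):
        """Select the rows whose `column` value lies in [low, high]."""
        self.selected = [low <= row[column] <= high for row in self.rows]

    def view(self, x, y):
        """Return (x, y, is_selected) triples for a scatter plot of two columns."""
        return [(row[x], row[y], flag) for row, flag in zip(self.rows, self.selected)]


views = LinkedViews(rows)
# Brush in a "mass" plot; the selection applies to rows, not to one plot.
views.brush("mass", 2.0, 4.5)

if __name__ == "__main__":
    # A plot of two other columns highlights the same rows.
    for point in views.view("radius", "age"):
        print(point)
```

Storing the selection per row, not per plot, is what lets a plot of unrelated columns highlight the same observations.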

WHAT IS RSTUDIO?

Image Credits : The Official RStudio Logo

RStudio is a free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and graphics. RStudio was founded by JJ Allaire, creator of the programming language ColdFusion. Hadley Wickham is the Chief Scientist at RStudio.

RStudio is available in two editions: RStudio Desktop, where the program is run locally as a regular desktop application; and RStudio Server, which allows accessing RStudio using a web browser while it is running on a remote Linux server. Prepackaged distributions of RStudio Desktop are available for Windows, macOS, and Linux.

RStudio is available in open source and commercial editions and runs on the desktop (Windows, macOS, and Linux) or in a browser connected to RStudio Server or RStudio Server Pro (Debian, Ubuntu, Red Hat Linux, CentOS, openSUSE and SLES).

RStudio is written in the C++ programming language and uses the Qt framework for its graphical user interface.

WHAT IS VSCODE?

Image Credits: Microsoft.com

Visual Studio Code is a source code editor developed by Microsoft for Windows, Linux and macOS. It includes support for debugging, embedded Git control, syntax highlighting, intelligent code completion, snippets, and code refactoring. It is also customizable, so users can change the editor’s theme, keyboard shortcuts, and preferences. It is free and open-source, although the official download is under a proprietary license.

Visual Studio Code is based on Electron, a framework which is used to deploy Node.js applications for the desktop running on the Blink layout engine. Although it uses the Electron framework, the software does not use Atom and instead employs the same editor component (codenamed “Monaco”) used in Visual Studio Team Services (formerly called Visual Studio Online).
