Python for Data Science — Session Codest

Session Codest
Published in MLthinkbox
3 min read · Feb 7, 2022

In this post I introduce Python for data science and share useful links and material to help you get started. Python is a flexible and powerful programming language that is rapidly becoming a critical business skill across many industries. It has real-world applications in some of the most popular websites, and a large community supports its maintenance and the development of new features.

For the purposes of data science, Python is a powerful foundational layer on which an enormous suite of modules, packages and classes can be integrated to support novel and exciting analytical research.

What is Python?

Since its first release in the early 90s, Python has risen to become one of the most popular programming languages in the world. Its prominence is driven largely by its ease of use, its extensive libraries of modules and classes, and its strong community support. YouTube, Google, Instagram, Spotify, Dropbox and Quora are all examples of websites developed in Python. They show that Python can serve as a backbone for heavily trafficked applications serving millions of users per day.

Python is often described as an interpreted, high-level, general-purpose programming language. Breaking this down piece by piece: 1) "interpreted", as the code is executed by an interpreter at run time rather than being compiled to machine code ahead of time; 2) "high-level", as it is simpler and more readable, using more natural-language elements while hiding or automating more complex programming requirements such as memory management; 3) "general-purpose", meaning that it can be used in a wide variety of applications. In our case we are leveraging the flexibility of Python to develop rich data applications.
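To make the "interpreted" and "high-level" points a little more concrete, here is a minimal sketch. The variable names and values are made up for illustration; the point is only that there are no type declarations, no compile step to run by hand, and no manual memory allocation.

```python
import sys

# No type declarations and no manual memory allocation: the list grows as
# needed, and the interpreter reclaims it once nothing refers to it.
readings = [21.5, 22.1, 19.8]  # hypothetical sample values
readings.append(20.4)

average = sum(readings) / len(readings)
print(f"{len(readings)} readings, average {average:.2f}")

# Dynamic typing: the same name can later refer to a value of another type.
label = 42
label = "forty-two"
print(type(label).__name__)

# "Interpreted": the script is run by an interpreter at run time.
# This prints the name of the implementation in use (for example CPython).
print(sys.implementation.name)
```

Saving this to a file and running it with the python command is all that is needed; there is no separate build step.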

Below is a summary of the top 10 programming languages according to the TIOBE Programming Community index, an indicator of the popularity of programming languages. The index is updated once a month; for more information you can visit https://www.tiobe.com/tiobe-index/. At the time of this post (Feb 2022) Python sits firmly in the top spot, with current trends suggesting that the language is still gaining popularity and adoption. There is, however, another important data science language sitting in the top 10, namely SQL (Structured Query Language). In complex data applications, SQL can be integrated into Python code to support the extraction and transformation of data within relational databases. I will be covering SQL in detail in future blog posts.
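As a rough illustration of what integrating SQL into Python code can look like, here is a minimal sketch using the sqlite3 module from the standard library with an in-memory database. The sales table, its columns and its rows are hypothetical; a real project would more likely connect to an existing database.

```python
import sqlite3

# In-memory database, so the sketch needs no server and leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 45.5), ("east", 60.0)],
)

# SQL handles the extraction and aggregation; Python receives plain tuples.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()

totals = dict(rows)
for region, total in rows:
    print(f"{region}: {total:.2f}")
```

The query does the grouping inside the database, and the result comes back as an ordinary Python list that can be passed on to the rest of the analysis.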

Applications in data science

A typical data science project might be structured as follows: we use a Jupyter notebook as the visual interface for Python and load packages such as pandas or PySpark to explore and interact with our data. We could then apply machine learning via TensorFlow or Spark MLlib to uncover relationships in the data, and finally present our outputs visually via matplotlib.
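A heavily simplified sketch of that workflow is shown here, assuming pandas and NumPy are installed. The ad_spend and revenue columns are invented data, and a straight-line least-squares fit with NumPy stands in for the machine learning step; it is not a substitute for TensorFlow or Spark MLlib, only a placeholder that keeps the example short.

```python
import numpy as np
import pandas as pd

# Hypothetical data; in practice this might be loaded with pd.read_csv(...).
df = pd.DataFrame(
    {
        "ad_spend": [10.0, 20.0, 30.0, 40.0, 50.0],
        "revenue": [25.0, 45.0, 65.0, 85.0, 105.0],
    }
)

# 1) Explore: summary statistics and the correlation between the two columns.
summary = df.describe()
correlation = df["ad_spend"].corr(df["revenue"])
print(summary)

# 2) Model: a degree-1 least-squares fit, standing in for an ML library.
slope, intercept = np.polyfit(df["ad_spend"], df["revenue"], deg=1)
df["predicted"] = slope * df["ad_spend"] + intercept

# 3) Report: print the results (a notebook would usually chart them instead).
print(df)
print(f"correlation={correlation:.3f}, slope={slope:.2f}, intercept={intercept:.2f}")
```

In a Jupyter notebook, with matplotlib installed, the final step would typically be a chart such as df.plot.scatter(x="ad_spend", y="revenue") rather than printed output.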

Next steps

Gaining a deep understanding of how Python works is important for anybody wanting to build a career in data analytics, data engineering or data science. Getting started is straightforward, and there is plenty of documentation and support to keep you frustration-free for the most part. If you are new to programming in Python, I highly recommend checking out "Best websites to learn python interactively and for free in 2022". Thanks for reading and, as always, happy coding!

Originally published at https://www.sessioncodest.com on February 7, 2022.

Session Codest
MLthinkbox

Blogging about philosophy, big data and data science based on the learnings from my day-to-day coding sessions and problem solving exercises.