Artificial Intelligence Series, Part 3: Demystifying Pandas

The InSight lander took this picture of the view across Elysium Planitia, the vast lava plain near the equator of Mars, using a camera mounted on its robotic arm. It was taken after touchdown on Monday (26/11/2018) at 0254 hours, at the end of a six-month, 300-million-mile (480-million-kilometer) journey. (Photograph: Reuters)
This is the third installment of the ongoing preparatory AI/ML content series.
The above image is one of the first pictures taken by InSight (short for Interior Exploration using Seismic Investigations, Geodesy and Heat Transport), a Mars lander designed to give the Red Planet its first thorough checkup since it formed 4.5 billion years ago. It is the first outer space robotic explorer to study in depth the "inner space" of Mars: its crust, mantle, and core. This mission is part of NASA's Discovery Program for highly focused science missions that ask critical questions in solar system science. (source: MARS InSight Mission)
This post is part of my own highly focused mission, AI 😅: to learn and understand AI and maybe, when this is all over, to ask myself whether artificial intelligence was just a fad after all! 💀 More importantly, I want to take part in the discussions of how artificial intelligence will either save or destroy the world: self-driving cars will keep us alive; social media bubbles will destroy democracy; robot toasters will rob us of our ability to heat bread (to borrow from "Asking the Right Questions About AI").

Practically everyone starting out in analytics, or in almost any other data-science-related domain, must learn the Pandas library if they choose Python as their co-conspirator in finding the truth. (By the way, I really, really miss R; the love isn't lost yet.) I really enjoyed creating the Jupyter notebook (embedded below) because of Pandas' powerful syntax and its large set of easy-to-use functions for data analysis. This is a not-so-short introduction to Pandas and is probably all you'll need to get started. There is definitely more to the library than this post covers, and you can pick up its numerous other functions as you start working on data sets with it; there is no substitute for hands-on experience. However, this post, if read once from start to finish, will familiarize you with most of the library's nuances and get you ready to wrangle, manipulate and summarize data. Finally, you will understand exactly what that "pd" is doing everywhere. 😃
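The "pd" you see everywhere is simply the conventional alias under which pandas is imported. Here is a minimal sketch of that convention, plus one tiny step each of wrangling, manipulating and summarizing. The sales table, its column names and its values are made up purely for illustration; they are not taken from the notebook.

```python
# "pd" is simply the conventional alias under which pandas is imported
import pandas as pd

# A tiny, made-up sales table (illustrative values only)
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East", "South"],
        "units": [10, 4, 7, 3, 6],
        "price": [2.5, 3.0, 2.5, 4.0, 3.0],
    }
)

# Wrangle: keep only the rows with at least 5 units sold
big_orders = sales[sales["units"] >= 5]

# Manipulate: derive a new column from two existing ones (no explicit loop)
sales["revenue"] = sales["units"] * sales["price"]

# Summarize: total units and revenue per region (group keys come out sorted)
summary = sales.groupby("region")[["units", "revenue"]].sum()
print(summary)
```

Nothing stops you from writing `import pandas` and spelling out `pandas.DataFrame`, but `pd` is what almost every tutorial, notebook and Stack Overflow answer assumes.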

Can we do without Pandas?

Well, no! Because pandas 🐼 are crucial for the survival of the bamboo forests of China (the BBC and others have stressed the importance of these cute, cuddly creatures, which have long been threatened with extinction). Wow! You're already loving Pandas!!

Pandas is probably the most popular Python library for data analysis. It is a high-level abstraction over the lower-level NumPy library, whose core is written in C, and it has become the gold standard for analyzing and modelling data in Python. Pandas is open source and free to use (under the BSD license), and it was originally written by Wes McKinney. Unless you have a mental block about moving away from R, as I did until some time ago (too much of an R lover, hmm! 💕), you will need to rely on the Pandas library for anything related to data handling in Python. According to Wikipedia's article on the library, "the name is derived from the term 'panel data', an econometrics term for multidimensional structured data sets." But I think it's just an adorable name for an extremely useful Python library!
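To make the "abstraction over NumPy" point concrete, here is a minimal sketch showing that a pandas Series is essentially a NumPy array plus a labeled index. The temperature values and day labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# A bare NumPy array: fast and typed, but addressed by position only
temps = np.array([21.5, 19.0, 23.2])

# A pandas Series wraps the same kind of array and adds a labeled index
s = pd.Series(temps, index=["Mon", "Tue", "Wed"], name="temp_c")

# Label-based lookup, which a bare ndarray cannot do
print(s["Tue"])  # 19.0

# The values underneath are still a NumPy ndarray
values = s.to_numpy()
print(type(values))  # <class 'numpy.ndarray'>

# Arithmetic on the whole Series is vectorized by NumPy, no Python loop needed
fahrenheit = (s * 9 / 5 + 32).round(1)
print(fahrenheit)
```

A DataFrame extends the same idea to two dimensions: a set of such labeled columns sharing one row index.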

Pandas has the following advantages, which have made it the de facto go-to library for analyzing data with Python:

  • The DataFrame, a labeled, table-like structure that is far more convenient for tabular data than traditional Pythonic data structures such as lists and dictionaries
  • Easy to read in and write back data in many common tabular formats such as CSV, Excel, SQL, JSON, etc.
  • Rich functionality to deal with missing data
  • Super easy to restructure/reshape data
  • Easy indexing, slicing and sorting operations
  • The library is built on top of NumPy, whose core is written in C, which gives it a huge performance boost
    . . . and many more!
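The notebook itself is not reproduced here, but a compact sketch can touch each of the advantages listed above. It assumes pandas is installed, uses a small made-up rainfall table held in a string instead of a real CSV file, and its column names are purely illustrative.

```python
from io import StringIO

import pandas as pd

# 0. Versus plain Python structures: the same records as a list of dicts...
records = [{"city": "Pune", "rain": 720}, {"city": "Delhi", "rain": 640}]
avg_plain = sum(r["rain"] for r in records) / len(records)
# ...become a DataFrame where column-wise operations are one-liners
avg_pandas = pd.DataFrame(records)["rain"].mean()

# Stand-in for a CSV file on disk (made-up data, with one missing value)
csv_text = """city,year,rainfall_mm
Pune,2017,720
Pune,2018,
Delhi,2017,640
Delhi,2018,810
"""

# 1. Read tabular data straight into a DataFrame (the empty field becomes NaN)
df = pd.read_csv(StringIO(csv_text))

# 2. Missing data: count the gaps, then fill them with the column mean
print(df["rainfall_mm"].isna().sum())  # 1
df["rainfall_mm"] = df["rainfall_mm"].fillna(df["rainfall_mm"].mean())

# 3. Reshape: long format -> wide format (one row per city, one column per year)
wide = df.pivot(index="city", columns="year", values="rainfall_mm")

# ...and back again from wide to long
long_again = wide.reset_index().melt(
    id_vars="city", var_name="year", value_name="rainfall_mm"
)

# 4. Indexing, slicing and sorting
print(df.loc[df["city"] == "Delhi", "rainfall_mm"])  # boolean + label selection
print(df.iloc[1:3])                                   # positional slice
print(df.sort_values("rainfall_mm", ascending=False))

# 5. Write it back out (here to a string rather than a file)
out = df.to_csv(index=False)
```

Filling with the column mean is just one choice; `dropna()` or a constant passed to `fillna()` may suit your data better.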

Final thoughts!

And with this, I wrap up this small but savage (though lovingly done) introduction to sweet Pandas. Congratulations! 👏

Pandas is a dynamic, powerful and fun library for data manipulation and analysis, with easy-to-understand syntax and fast operations. This post (a.k.a. the notebook) is fairly exhaustive in content, but there is still much more of the functionality that pandas provides left to explore. If you have gone through the notebook cell by cell, then WOW! That's a lot of dedication (and a lot to digest 😍). For everyone else, all you need to do now is practice! practice! practice! And share! share! share!

Adopt Pandas, Love Pandas, Save Pandas!

Happy Pandoraying!!

The next post will cover data visualization with Python.

Earlier blog posts: