A data structure at the heart of nearly all data science tools.

Photo by Henry & Co. on Unsplash

We’re all familiar with the standard Python list: a mutable sequence with great flexibility, in that its elements need not share a single data type. You can have a list containing integers, strings, floats, and even other objects.

my_list = [2, {'dog': ['Rex', 3]}, 'John', 3.14]

The above is a perfectly valid list containing multiple data types as elements — even a dictionary which contains another list!

However, to support all of these simultaneous data types, each Python list element must carry its own type information. Each element acts as a pointer…
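A quick sketch of the memory cost this pointer-based design implies, contrasted with a NumPy array (exact byte counts vary by Python version and platform):

```python
import sys
import numpy as np

# A list of 1000 ints: the list holds pointers, and every element is a
# separately allocated Python int object with its own type information.
py_list = list(range(1000))
np_array = np.arange(1000, dtype=np.int64)  # one contiguous block of int64s

list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_array.nbytes               # 1000 * 8 bytes

print(list_bytes, array_bytes)
```

The array stores raw 8-byte integers back to back, so it comes out many times smaller than the equivalent list.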


An easy way to generate test data for any situation.

Photo by Markus Spiske on Unsplash

There are so many Python packages out there, and for people who are learning the language, it can be overwhelming to know what tools are available to you. I am looking to uncover some of the lesser-known, yet functional and useful packages to help you on your Python journey.

According to their documentation, Faker is a ‘Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in your persistence to stress test it, or anonymize data taken from a production service, Faker is for you.’

This immediately appealed to me as…


Making the most of your interactive Python notebooks

Photo by Dan Smedley on Unsplash

Documentation

Often, you will need to reference the documentation for a particular function or object that you are working with. Rather than turning to Google, you can access a function’s docstring directly from within your notebook using the shorthand ‘?’ command.
For example, if I were curious to know more about Python’s built-in range() function, I would write the following:
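The `?` shorthand is notebook-specific, but the same text it displays is available in plain Python too:

```python
# In a notebook cell, `range?` opens the docstring pane.
# The same docstring is available programmatically:
print(range.__doc__)

# help(range) prints a fuller description, including methods.
```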


Comparison of two common libraries and their performance.

Photo by Austin Distel on Unsplash

What Is Sentiment Analysis?

For most businesses, knowing what their customers feel about their product/service is extremely valuable information which can be used to drive business improvements, changes of process, and ultimately increase profitability.
Sentiment analysis is a process that uses natural language processing (NLP) to determine whether a piece of text carries negative, positive, or neutral sentiment.

To outline the process very simply:
1) Tokenize the input into its component sentences or words.
2) Identify and tag each token with a part-of-speech component (e.g., noun, verb, determiner, sentence subject, etc.).
3) Assign a sentiment score from -1 to 1.
4) Return score…
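The steps above can be sketched with a toy, lexicon-based scorer. The lexicon and weights here are made up for illustration; real libraries (e.g. TextBlob, VADER) use far larger lexicons and smarter rules:

```python
# Hypothetical mini-lexicon mapping words to sentiment weights.
LEXICON = {'great': 0.8, 'good': 0.5, 'bad': -0.6, 'terrible': -0.9}

def sentiment_score(text: str) -> float:
    tokens = text.lower().split()                         # 1) tokenize
    hits = [LEXICON[t] for t in tokens if t in LEXICON]   # 2-3) tag and score
    if not hits:
        return 0.0                                        # neutral
    return max(-1.0, min(1.0, sum(hits) / len(hits)))     # 4) return score

print(sentiment_score('the food was great'))
```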


A Deep Dive Into Scatter Plots with Plotly

Photo by Isaac Smith on Unsplash

In my last article on Plotly, we did a quick introduction to the library itself, covering the general structure of a graph object, using traces, and making customizations through the update_traces and update_layout functions.

Today we’re going to take a more intensive look at building and customizing scatter plots using the Plotly library.

For our dataset today, I have selected a Steam Games dataset from Kaggle, and I am interested in seeing whether there is any relationship between a game’s price and its average playtime.

Loading it into a dataframe, we can see the head as below:


Photo by Luke Chesser on Unsplash

While learning Python, I found countless articles covering the two core graphing libraries, Matplotlib and Seaborn.
However, I found that they lacked some of the professional-level aesthetic I was looking for.

I ultimately found Plotly (and Dash) and quickly realized the power, flexibility, and overall ease of use.

In this introductory article, I want to give you an overview of the Plotly library using a sample Kaggle dataset.
I will do my coding via Google Colab notebook. If you are not familiar with Colab, you can refer to an overview article written here.

What is Plotly?

Plotly Python is…


What, Why, and How

Photo by Kelvin Ang on Unsplash

Google Colaboratory is a great platform for those who:
- Are looking to create a machine learning model but lack a strong enough CPU/GPU.
- Want the ease and simplicity of a cloud-based notebook that requires no setup.

So, What is Google Colaboratory, and Why Should I Use It?

Simply put, Google Colab is a free-to-use, cloud-based version of a Jupyter notebook whose only requirement is a web browser.

Your code is run on a virtual machine hosted by Google, so you don’t need to rely on your PC’s CPU or GPU for any of the heavy lifting, thus making Colab perfect for running…

Bryan White

Supply Chain Analyst and Data Science Student at the University of Auckland.
