After we have a general overview of the dataset we are working with, the next step often involves selecting certain pieces of the data. The main ways to make this selection are the [ ], .loc, and .iloc operators. In this article we will talk about the [ ] operator.
Throughout this article we will discuss operator overloading.
Operator overloading is when an operator does different things depending on the data type of the values passed to it.
A classic example of this is the ‘+’ operator. If passed two integers (5 + 7), it adds them and returns 12. If we pass two…
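As a minimal sketch of the overloading idea, the snippet below shows + acting on integers, strings, and lists, and then the same [ ] operator returning different things depending on what is passed to it. The DataFrame and its "age" and "sex" columns are hypothetical toy data, not the dataset from the article.

```python
import pandas as pd

# '+' does different things depending on the types of its operands
int_sum = 5 + 7          # integers: arithmetic addition -> 12
str_sum = "5" + "7"      # strings: concatenation -> '57'
list_sum = [5] + [7]     # lists: concatenation -> [5, 7]

# Hypothetical toy DataFrame showing that [ ] is overloaded in the same way
df = pd.DataFrame({"age": [30, 45, 52], "sex": ["m", "f", "m"]})

one_column = df["age"]              # a single label returns a Series
two_columns = df[["age", "sex"]]    # a list of labels returns a DataFrame
older_rows = df[df["age"] > 40]     # a boolean Series filters rows

print(type(one_column).__name__, type(two_columns).__name__)  # Series DataFrame
print(older_rows)
```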

Pandas is an indispensable Python library that allows for loading, manipulating, and joining dataframes (think Excel sheets) within a Python environment.
To begin, you will need to install the Pandas library:
pip install pandas
conda install pandas  # if you are using Anaconda
Next, you will import the library using the alias pd:
import pandas as pd
Then you will load the dataset using pd.read_csv('path'). There are many arguments that can be passed into this function, but for simplicity's sake we will focus on the only one required: the file path (the first argument, named filepath_or_buffer). In our example, our file is called ‘hepitatis.csv’ …
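A minimal, runnable sketch of loading data with pd.read_csv. To keep it self-contained, it reads from an in-memory string via io.StringIO instead of a real file; the column names and values are made up for illustration. With the actual file you would pass its path instead, as in the commented-out line.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file on disk
csv_text = "age,sex,outcome\n30,male,live\n50,female,live\n78,female,die\n"

# read_csv needs only one argument: a path or a file-like object
df = pd.read_csv(io.StringIO(csv_text))
# df = pd.read_csv("hepitatis.csv")  # with the real file on disk

print(df.shape)   # (3, 3)
print(df.head())  # first five rows (here, all three)
```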

Streamlit is an open-source app framework for Machine Learning and Data Science teams. It allows programmers to create chic data apps in hours that can be viewed and interacted with by users who are not ‘coding-savvy’.
For my capstone project at Flatiron, I chose to build a climbing recommendation engine. I wanted to create a friendly user interface, so I chose to use Streamlit. I began this process by installing Streamlit.
pip install streamlit
After that, I created a new recommender.py file that I used to run my Streamlit application. …
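A minimal sketch of what a recommender.py file could look like. The route data and the same_style helper are hypothetical placeholders, not the project's actual recommender; st.title, st.selectbox, and st.write are standard Streamlit calls. The app is launched from the terminal with streamlit run recommender.py. The import is wrapped in try/except only so the helper can be exercised without Streamlit installed.

```python
# recommender.py: a minimal Streamlit page (hypothetical layout and data)
try:
    import streamlit as st
except ImportError:  # lets the helper below run without Streamlit installed
    st = None

# Hypothetical toy data: route name -> climbing style
ROUTE_STYLES = {
    "Route A": "crack",
    "Route B": "crack",
    "Route C": "slab",
    "Route D": "overhang",
}


def same_style(route):
    """Toy rule: return the other routes that share the chosen route's style."""
    style = ROUTE_STYLES[route]
    return [r for r, s in ROUTE_STYLES.items() if s == style and r != route]


def main():
    st.title("Climbing route recommender")
    choice = st.selectbox("Pick a route you enjoyed", list(ROUTE_STYLES))
    st.write("You might also like:", same_style(choice))


# Streamlit re-runs this script top to bottom on every user interaction
if st is not None:
    main()
```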

For my final project at Flatiron’s Data Science Immersive, I decided to merge my two passions: coding and rock climbing.
I often find climbing routes I enjoy and ask friends if they know of similar climbs. I figured: why not create a program that could do exactly that?
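One common way to build a "find me similar climbs" program is content-based filtering: describe each route with numeric features and rank the other routes by cosine similarity. The sketch below uses NumPy with hypothetical routes and features (a numeric difficulty, overhanging, crack, crimpy); it illustrates the technique and is not necessarily how the original project was built.

```python
import numpy as np

# Hypothetical routes and features: [difficulty, overhanging, crack, crimpy]
routes = ["Route A", "Route B", "Route C", "Route D"]
features = np.array([
    [10.0, 1, 0, 1],
    [10.5, 1, 0, 1],
    [8.0, 0, 1, 0],
    [11.0, 0, 1, 0],
])


def recommend(name, k=2):
    """Return the k routes most similar to `name` by cosine similarity."""
    # Min-max scale each column so difficulty does not dominate the comparison
    mins, maxs = features.min(axis=0), features.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    scaled = (features - mins) / span
    # Cosine similarity between the chosen route and every route
    target = scaled[routes.index(name)]
    norms = np.linalg.norm(scaled, axis=1) * np.linalg.norm(target)
    sims = scaled @ target / np.where(norms == 0, 1.0, norms)
    # Sort from most to least similar, skipping the route itself
    order = [i for i in np.argsort(-sims) if routes[i] != name]
    return [routes[i] for i in order[:k]]


print(recommend("Route A"))
```

Scaling first matters here: without it, the difficulty column (values around 10) would swamp the 0/1 style columns in the similarity score.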

A/B testing is a method of comparing two versions of a webpage, app, or interface to see which one performs better. At its core, A/B testing is used to trial new changes and track user data to decide which version the company will move forward with.

Running a control group vs. a variation allows companies to make data-driven changes and business decisions with more confidence than ‘I think this way works better.’
Collect Data: decide what you want to change on your website. Common examples include bounce rate.
Identify Goals: decide what you will measure…
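Once the goal metric is chosen (for example, a conversion rate), one common way to decide whether the variation really beat the control is a two-proportion z-test. The sketch below uses only the standard library; the visitor and conversion counts are invented for illustration, and the 0.05 threshold is a conventional choice, not a rule from the original post.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical results: control (A) vs. variation (B), 5000 visitors each
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:
    print("The difference is statistically significant at the 5% level.")
```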

Background: The general idea of K-means is to sort your data points into meaningful groups. This is considered unsupervised learning, since we are not ‘training’ on labeled data but rather looking to infer the natural structure of the data.
For this blog I will discuss how I used K-means to sort different images taken by a microscope. The microscope took around 8000 pictures spread across a 15x15 grid (225 points).
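The core of K-means is Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing changes. Below is a minimal NumPy sketch on two synthetic blobs of 2-D points. For images like the microscope pictures, each picture would first need to be turned into a feature vector (for example, flattened pixel values); that step is an assumption here and is not shown. In practice, scikit-learn's sklearn.cluster.KMeans provides a tested implementation.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate assigning points and moving centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):  # converged: centroids stopped moving
            break
        centroids = new
    return labels, centroids


# Synthetic data: two well-separated blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)), rng.normal(5, 0.5, size=(50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)
```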

Big O notation is used to describe the performance and complexity of an algorithm (i.e., a function or any piece of code). More specifically, it describes how the time and memory (space) a piece of code needs grow as the size of its input grows.
This is relevant to data scientists because we repeatedly run functions on large datasets. As the databases we use get bigger, we will begin to notice if our code does not run optimally. …
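To make this concrete, here are two ways to check a list for duplicates. The first compares every pair, so its running time grows quadratically, O(n^2), while using no extra memory; the second remembers values in a set, which takes O(n) time on average but O(n) extra space. The function names are illustrative, not from the original post.

```python
def has_duplicates_quadratic(items):
    """O(n^2) time, O(1) extra space: compare every pair of elements."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """O(n) average time, O(n) extra space: remember what has been seen in a set."""
    seen = set()
    for x in items:
        if x in seen:  # set membership is O(1) on average
            return True
        seen.add(x)
    return False


data = list(range(2000)) + [0]  # one duplicate at the very end
print(has_duplicates_quadratic(data), has_duplicates_linear(data))  # True True
```

Doubling the length of data roughly quadruples the work for the first function but only doubles it for the second, which is exactly the difference Big O captures.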

Cython is a programming language and compiler used to bridge C/C++ and Python. At its core, Cython is a superset of the Python language: it allows the addition of static type declarations and class attributes that can be translated to C code and compiled into C extensions for Python.
Compared to C/C++, pure Python code runs rather slowly. However, popular libraries such as NumPy, pandas, and scikit-learn have their performance-critical parts written in C (or Cython), which allows them to run much faster. Some of these libraries are so well optimized that they can even run faster than code the average programmer would write directly in C.
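A quick way to see the gap between interpreted Python loops and compiled code is to compute the same sum of squares both ways. This sketch uses NumPy rather than Cython itself, since NumPy's loop already runs in compiled code; the array size is arbitrary and the timings will vary by machine. Cython aims to close the same gap for your own code by compiling typed, Python-like source to C.

```python
import timeit

import numpy as np

N = 200_000
values = list(range(N))              # a list of Python int objects
arr = np.arange(N, dtype=np.int64)   # one contiguous block of C int64 values


def python_sum_of_squares(xs):
    total = 0
    for x in xs:  # every iteration is executed by the Python interpreter
        total += x * x
    return total


def numpy_sum_of_squares(a):
    return int(np.dot(a, a))  # the loop runs inside NumPy's compiled code


t_py = timeit.timeit(lambda: python_sum_of_squares(values), number=5)
t_np = timeit.timeit(lambda: numpy_sum_of_squares(arr), number=5)
print(f"pure Python: {t_py:.4f}s   NumPy: {t_np:.4f}s")
```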
Dynamic vs…

Data Scientist