Installation Instructions (Mac)

Data Science — General Assembly (Yasin Dara)


These instructions are for Macs only.
Step 1: Install Homebrew (Instructions: http://brew.sh/)

ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"

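Once the installation finishes, don’t forget to run brew doctor, which reports potential issues with your setup. The installer command changes from time to time, so if the one above no longer works, copy the current one from http://brew.sh/.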

Step 2: Install git using Homebrew

brew install git

Step 3: Update Homebrew

brew update

Step 4: Install MySQL using Homebrew

brew install mysql

Step 5: Confirm that MySQL is installed on your Mac by typing

mysql --version

at the terminal. If you get an error such as “command not found”, you will need to configure your PATH variable, as follows:

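export PATH=/usr/local/mysql/bin:$PATH

This changes PATH for the current terminal window only. (/usr/local/mysql/bin is where the MySQL .dmg installer puts the client; Homebrew normally links mysql into /usr/local/bin instead, so if you used Homebrew, make sure /usr/local/bin is on your PATH.)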
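A minimal sketch of making that PATH change safe to repeat and permanent, assuming a bash or sh shell and the /usr/local/mysql/bin location used above. path_prepend is a hypothetical helper written for this example, not a built-in command.

```shell
#!/bin/sh
# Hypothetical helper: put a directory at the front of PATH only if it is not already there.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                        # already on PATH, nothing to do
    *) PATH="$1:$PATH"; export PATH ;;  # prepend and export for child processes
  esac
}

# Assumed location of the mysql client (the MySQL .dmg installer's default).
path_prepend /usr/local/mysql/bin

# To keep the change in new terminal windows, append the export to ~/.bash_profile.
# Uncomment to run; this edits your profile, so run it only once:
# echo 'export PATH=/usr/local/mysql/bin:$PATH' >> ~/.bash_profile
```

After editing ~/.bash_profile, open a new terminal window and run mysql --version again to confirm the fix.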

Step 6: Once MySQL and Homebrew are installed, confirm that you have pip (a Python package manager) on your computer. To do so, type

pip

at the terminal. If you get a “command not found” error, follow the instructions here to get pip installed.
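Before moving on, here is a small sketch for checking the tools from Steps 1 to 6 in one go. check_tools is a hypothetical helper; it relies only on the shell’s command -v lookup, so it reports what is on your PATH, not whether each tool actually works.

```shell
#!/bin/sh
# Hypothetical helper: report which of the named commands are on PATH.
# Returns the number of missing commands (0 means everything was found).
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# The tools installed in Steps 1 to 6.
check_tools brew git mysql pip || echo "$? tool(s) missing; revisit the steps above"
```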

Step 7: Install virtualenv, a tool for creating isolated Python environments. If you haven’t worked with virtual environments in Python before, it is worth reading the blog post here first.

sudo pip install virtualenv

Navigate to a directory of your choosing and remember its path. You will use it as your working folder for assignments in this class.

Now create a virtual environment named env in this directory:

virtualenv env
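virtualenv creates env/bin/python and env/bin/pip inside your working folder. You can put them first on your PATH for the current shell with source env/bin/activate, but the commands in Step 9 call env/bin/pip by relative path instead, so they only work when run from the folder that contains env. The sketch below uses a hypothetical helper, env_pip, that fails with a clear message when you are in the wrong directory.

```shell
#!/bin/sh
# Hypothetical helper: run the virtualenv's own pip, or explain why it can't.
env_pip() {
  if [ ! -x ./env/bin/pip ]; then
    echo "env/bin/pip not found in $(pwd); cd to the folder where you ran 'virtualenv env'" >&2
    return 1
  fi
  # Forward all arguments to the environment's pip.
  ./env/bin/pip "$@"
}

# Usage, from your working folder:
# env_pip --version
```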

Step 8: Using git, which you installed earlier, obtain the requirements.txt file from the class repository. I won’t provide instructions for this step, because learning to use git (even if you’re learning it on the fly right now) is essential for data scientists and engineers. If you’re sneaky about it and don’t want to clone the class repository for some reason, you can use curl (which ships with OS X) or wget (brew install wget) to download just the file.

The file requirements.txt is located in the DS-LA-03/src/lesson01 folder.

Here is a link to the class github repository:

https://github.com/ga-students/DS-LA-03/

Step 9: Install the Python packages for your virtual environment.

If you have OS X Mavericks and are using the virtualenv, run the following command first, because the numpy installer is broken on the latest version of OS X (something to do with Xcode 5.1). The ARCHFLAGS setting tells the compiler not to treat unused command-line arguments as hard errors:

sudo ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future env/bin/pip install --upgrade numpy

numpy has been removed from the requirements.txt file, which is why it is installed separately.

Also install numpy system-wide:

sudo ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future pip install --upgrade numpy

This is usually where the @!$% hits the fan. Move requirements.txt into the directory that contains your env folder, or make sure you know the path to requirements.txt.

Then try to install the requirements into the virtual environment:

sudo ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future env/bin/pip install -r requirements.txt
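When one package in requirements.txt fails to build, pip install -r typically stops there, so you only see one failure at a time. A minimal sketch of a workaround, assuming one plain requirement per line (no -e or other pip options): the hypothetical install_each helper installs each line separately and prints which ones installed and which failed. It runs pip without sudo, assuming your user owns the env folder; on Mavericks, export ARCHFLAGS with the value shown above before calling it.

```shell
#!/bin/sh
# Hypothetical helper (not part of pip): install each requirement separately so one
# broken package does not stop the rest. PIP defaults to the virtualenv's pip.
install_each() {
  req="${1:-requirements.txt}"
  pip_cmd="${PIP:-env/bin/pip}"
  failed=""
  # The extra [ -n ... ] test also handles a last line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Drop comments and surrounding whitespace; skip lines that end up empty.
    pkg="$(printf '%s' "$line" | sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
    [ -z "$pkg" ] && continue
    # pip's own output is hidden; rerun a failed package by hand to see the error.
    if "$pip_cmd" install "$pkg" >/dev/null 2>&1; then
      echo "installed: $pkg"
    else
      echo "FAILED: $pkg"
      failed="$failed $pkg"
    fi
  done < "$req"
  # Succeed only if nothing failed.
  [ -z "$failed" ]
}

# Usage, from your working folder:
# install_each requirements.txt
```

Set PIP=pip before calling it if you want to target the system-wide install instead of env.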

At this point, feel free to deviate from these instructions to fix errors on your machine. I designed this virtual environment on OS X Mavericks 10.9.2 and have tested it on OS X 10.7.5 and later, where it seems to work.

Ultimately, you are responsible for ensuring that all the packages in the requirements.txt file install properly on your computer. Google, Stack Overflow, your peers, and I will help you along the way.

Important: If you prefer not to use the virtual environment, that’s fine. The presentation PDF for lesson 1 in the course repository has a complete list of the Python dependencies you need to install.

Step 10: If (and only if) you are on OS X Mavericks, you will need to install scikit-learn separately, because of the way the Mavericks compiler (clang, which is also what runs when you type gcc) handles compiler flags. Run:

export CFLAGS=-Qunused-arguments
export CPPFLAGS=-Qunused-arguments

then run

sudo -E pip install -U scikit-learn

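to install scikit-learn. The -E flag tells sudo to preserve your environment, so the CFLAGS and CPPFLAGS you just exported reach the compiler.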
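Once everything is installed, a quick way to confirm the packages actually load is to import them. check_imports is a hypothetical helper; it takes module names, which can differ from package names (scikit-learn is imported as sklearn). Because Step 10 uses the system pip rather than env/bin/pip, check scikit-learn with PYTHON=python: a virtualenv does not see system-wide packages by default.

```shell
#!/bin/sh
# Hypothetical helper: try to import each module with the chosen interpreter.
# PYTHON defaults to the virtualenv's interpreter; set PYTHON=python for the system one.
check_imports() {
  py="${PYTHON:-env/bin/python}"
  status=0
  for mod in "$@"; do
    if "$py" -c "import $mod" >/dev/null 2>&1; then
      echo "ok: $mod"
    else
      echo "cannot import: $mod"
      status=1
    fi
  done
  return "$status"
}

# Usage examples:
# check_imports numpy              # inside the virtualenv
# PYTHON=python check_imports sklearn numpy   # system-wide install from Steps 9 and 10
```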

Step 11: You will also need to install other software on your computer.

Obtain an FTP client of your choice. I use Transmit, but it’s not free.
Obtain MySQL Workbench.
Obtain R. I use the Berkeley mirror.

Step 12: Begin the exercise here:

https://github.com/ga-students/DS-LA-03/wiki/Lesson-01-Command-line-tutorial
