Data Science Checklist

General Assembly — Data Science (Software Checklist)

Hi. Welcome to Data Science at General Assembly. This is a “comprehensive” list of the software, libraries, and dependencies that you will use in the GA Data Science course in LA.

Requirements

Generally, you’ll want a Mac laptop for this course. You can make do with a Windows laptop, but it is strongly recommended that you dual boot with Linux.

If you have a Mac, you will need to upgrade to OS X Mavericks. This is not optional: much of the software listed below works poorly on an out-of-date operating system. You will also need sudo access (i.e. administrator privileges) on your computer.

If you have Windows, it is strongly recommended that you use Windows 7 or XP. Windows 8 users have issues with Cygwin, PowerShell, and many other tools. You must also know how to set your PATH variable.

If you’re running any flavor of Linux (except maybe Arch Linux), you’re set! I recommend using Fedora, Ubuntu, or CentOS. Students new to the world of Linux are encouraged to use Ubuntu.

Software

  1. Python 2.7 (any 2.7.x release). If you have Python 3.x, please downgrade to 2.7.
  2. R. (http://www.r-project.org/)
  3. MySQL Workbench, or Sequel Pro.
  4. Java 7+
  5. An FTP Client of your choice (FileZilla is free). I use Transmit.
  6. (Mac Only): Xcode Command Line Tools Package
  7. A 3-way merge tool that works with Git. I use KDiff3.
  8. Gnuplot
  9. Sublime Text 2 / emacs / Text editor of your choice.
  10. (Mac Only): MAMP (Do not bother with MAMP Pro).
  11. git (Please use & install the command line version).
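
Once the software above is installed, a quick way to confirm that the command line can see it is to check the interpreter version and look for each tool on your PATH. The following is only a rough sketch: the helper names (is_supported_python, find_on_path) are made up for illustration, and the command names git, R, java, and gnuplot are assumed to be the ones your installers provide. It is written to run under Python 2.7 and also under Python 3.

```python
import os
import sys


def is_supported_python(version_info):
    """Return True for any 2.7.x interpreter, the version this checklist asks for."""
    return tuple(version_info[:2]) == (2, 7)


def find_on_path(name):
    """Return the full path of an executable called `name`, or None if PATH has none."""
    # On Windows, executables carry an extension such as .EXE; elsewhere, none.
    extensions = [""]
    if os.name == "nt":
        extensions += os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


if __name__ == "__main__":
    print("Python 2.7.x: %s" % is_supported_python(sys.version_info))
    # Assumed command names for a few items in the software list.
    for tool in ["git", "R", "java", "gnuplot"]:
        print("%s: %s" % (tool, find_on_path(tool) or "not found on PATH"))
```

If a tool you installed shows up as "not found on PATH", the usual cause is that its install directory has not been added to your PATH variable.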

Libraries / Package Managers / Other

You may find the following libraries easier to install using package managers like homebrew, pip, and/or easy_install. On Linux, use apt-get or yum, depending on your distro.

  1. (mac — package manager) homebrew
  2. (win/mac — package manager) pip
  3. (win/mac — package manager) easy_install
  4. IPython Notebook
  5. pandas (Python Data Analysis Library)
  6. Scikit-Learn
  7. Scrapy
  8. Beautiful Soup 4
  9. NetworkX
  10. MySQLdb
  11. NumPy
  12. SciPy
  13. pylab (part of matplotlib)
  14. pyplot (matplotlib.pyplot, part of matplotlib)
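
After installing the libraries, you can check which ones Python can actually import. The sketch below is one possible way to do that, not an official installer check: check_modules is a hypothetical helper, and the import names in MODULES are my assumption of how each library in the list is imported (for example, Scikit-Learn as sklearn and Beautiful Soup 4 as bs4). It should run under Python 2.7 and Python 3.

```python
import importlib

# Assumed import names for the libraries in the list above.
MODULES = [
    "IPython", "pandas", "sklearn", "scrapy", "bs4", "networkx",
    "MySQLdb", "numpy", "scipy", "pylab", "matplotlib.pyplot",
]


def check_modules(names):
    """Try to import each module name and report whether the import succeeded."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            # Any failure during import is treated as "not usable yet".
            results[name] = False
    return results


if __name__ == "__main__":
    status = check_modules(MODULES)
    for name in MODULES:
        print("%s: %s" % (name, "ok" if status[name] else "MISSING"))
```

Anything reported as MISSING can usually be installed with pip, or with your distro's package manager on Linux.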

Optional

  1. D3.js
  2. Julia (a language for statistics and numerical computing with Python-like syntax)
  3. tmux
  4. Any Python IDE.
  5. Dygraphs
  6. Highcharts
  7. PHP / Apache