Top Resources for Starting Out in Data Science
We asked the Edge Team what resources they recommend for those starting out in data science
If you’re just getting started in data science, you may be overwhelmed by the many options for learning fundamentals and best practices. From textbooks, to online courses, to data camps, it can be hard to figure out where to start. We asked the Edge Team what resources they recommend and curated the following list based on their responses.
An Introduction to Statistical Learning
An Introduction to Statistical Learning takes a less technical approach to key concepts in statistical learning, including fundamental methods such as regression, decision trees, clustering, and deep learning. Its authors include two of the authors of the classic Elements of Statistical Learning, and it was written to make the contents of Elements accessible to a broader audience. The PDF is free online, and the associated lecture videos are also highly recommended.
Learning From Data: A Short Course
Learning From Data: A Short Course describes the fundamentals of machine learning, balancing the theoretical and the practical, the mathematical and the heuristic. The book includes in-depth discussion of linear models, overfitting to stochastic and deterministic noise, and regularization.
Numerical Python
Numerical Python provides methods and case studies on how to numerically compute solutions and mathematically model applications. This text features case studies on computing techniques including array-based and symbolic computing, visualization, numerical file I/O, equation solving, optimization, interpolation and integration, and domain-specific computational problems.
Algorithms for Communications Systems and their Applications
Algorithms for Communications Systems and their Applications is a practical guide to applying algorithms in communications systems. Written for researchers and professionals in the areas of digital communications, signal processing, and computer engineering, this text presents algorithmic and computational procedures within communications systems that overcome a wide range of problems facing system designers.
The Data Science Primer
The Data Science Primer GitHub repository is a curated set of online resources on the following topics: programming, linear algebra, statistics, probability, SQL, and machine learning. The stated goal of the repository is “to provide an on-ramp to becoming a data scientist no matter someone’s background”.
DeepLearning.AI courses
Featuring courses from leading practitioners in machine learning, DeepLearning.AI provides excellent hands-on coursework. From one of the Edge team members: “DeepLearning.AI courses on Coursera are well worth the subscription cost, and the Jupyter-based assignments help you build strong intuitions as you build real deep learning models.”
CS231N: Convolutional Neural Networks for Visual Recognition
This is Fei-Fei Li’s famous Stanford class on convolutional neural networks and computer vision. The university open-sourced the lectures and class notes. The class notes are especially good at explaining machine learning in an intuitive and technically sound way. Some consider it a must-read for anybody wanting to get into ML.
Keras Examples
Keras Examples features short, focused demonstrations of vertical deep learning workflows in Python. Computer vision, natural language processing, generative deep learning, reinforcement learning, and graph data examples are all included. For machine learning specifically, we find that Keras Examples is a fantastic resource.
Kaggle
Kaggle provides code and data for learning data science and is famous for its hosted competitions. From its website: “Use over 50,000 public datasets and 400,000 public notebooks to conquer any analysis in no time.” From one of the Edge team members: “Kaggle is a great resource for ML tools. I’ve never taken part [in competitions], but I have looked through a lot of solutions.”
StatQuest with Josh Starmer
StatQuest with Josh Starmer is a YouTube channel with videos on countless concepts in statistics and machine learning. The information is presented in an approachable, ground-up way. It is a great resource to brush up on topics in ten minutes or less, and the stats-centric guitar jingles add to the fun.
Stack Overflow
Would any list of data science resources be complete without Stack Overflow?
Honorable Mentions
In addition to following Edge (@_EdgeAnalytics), there are countless practitioners and academics sharing valuable insights on Twitter. Specifically, we recommend following @svpino; one Edge team member describes his Twitter as “always having thought provoking ideas and 60 second brain teasers”.
Hands-on experience to build intuition
We like working on actual projects! Classes, videos, and books will only get you so far. Intuition is really important in data science, and you can only build it by working through challenging problems in code.
The Edge Team
From one of the Edge team members: “I’ve learned a lot from talking to other members of the Edge team to understand how they operate and what kinds of tools they use.” Feel free to reach out with your data science questions!
What resources would you recommend for getting started in data science? Leave your picks in the comments!
Edge Analytics is a consulting company that specializes in data science, machine learning, and algorithm development both on the edge and in the cloud. We partner with our clients, who range from Fortune 500 companies to innovative startups, to turn their ideas into reality. Have a hard problem in mind? Get in touch at info@edgeanalytics.io.