Analytical thinking will take you a long way in data science.
A summary of discussions and recommendations from the Anthill Inside open house on 1st September.
On Friday, 1st September, we hosted an open house at HasGeek on learning journeys with mathematics and statistics.
The curation of this open house was motivated by conversations with members of our data science and deep learning communities (The Fifth Elephant and Anthill Inside), who told us they wanted to understand the concepts and applications of math.
Three questions served as anchors for the discussion:
1. I have an **interest** in data science, how much Math do I need to know?
This question comes from people who have a (business) problem and think data science can help them solve it. They want to get from point A to point B (solving the problem) and want to learn how to drive the car (i.e., data science) to get there.
2. I want a **job** in data science, how much Math do I need to know?
Developers looking to change jobs want to know how much math they need to get past interviews. This is not an easy question to answer.
3. I want a **career** in data science, how much Math do I need to know?
This question comes from people who have done a fair amount of data analysis, BI or ML, and are sure they want to build a career in this field. The question here is not whether to learn the math, but that there is a lot of math to learn. Should they go wide and learn the math behind many algorithms, or go deep and pick one focus area: images, text, or a class of problems such as recommendation?
Here is the full video of the discussion:
Below are five summary points from the discussion:
- If you want to start with tools, get a handle on Excel. Learning to program in Excel is a sure-shot way to get your fundamentals right (and to become a functional programmer).
- The other sure-shot way to get started is to literally get your hands dirty: pick a public or open dataset and start solving a problem. This is the hacker’s way of learning, and it helps you peel back the layers of the data science onion head-on.
- Inquisitive, descriptive, predictive and causal are the four buckets of problems you can solve with data science. Pick your bucket, but know that causal is the hardest of all to solve.
- We may harp on about tools, and jobs may be about knowing how to apply them, but you first need to understand the problem at hand. Tools are secondary.
- Following from the previous point, develop analytical thinking. It precedes tools and concepts. Data science techniques and mathematical knowledge are superior tools that you will learn to apply, successfully and unsuccessfully, as your analytical thinking develops.
Here are some references to help you get started on your journey:
- “A Mathematics Course for Political and Social Research”.
- “Think Stats”.
- “Think Bayes”.
- Hadley Wickham’s “R for Data Science”.
- http://www.oreilly.com/data/free/
- http://www.learndatasci.com/free-data-science-books/
For linear algebra and calculus:
- Essence of Linear Algebra: A beautiful introduction to the core concepts of linear algebra using geometry rather than symbols. https://www.youtube.com/watch?v=kjBOesZCoqc&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab
- Essence of Calculus: By the same author, a visual introduction to the basics of calculus. https://www.youtube.com/watch?v=WUvTyaaNkzM&list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr
- Immersive Linear Algebra: An interactive web book for learning linear algebra using both symbols and geometry. http://immersivemath.com/ila/
- Invitation to Another Dimension: Understanding the math behind dimensions and equation solving. https://maxgoldste.in/invitation-to-another-dimension/
For statistics:
- Statistics for Hackers: A wonderful introductory talk by Jake VanderPlas on the core concepts of computational statistics (simulation, shuffling, bootstrapping and cross-validation), explained with Python code. https://speakerdeck.com/jakevdp/statistics-for-hackers
- Seeing Theory: A visual introduction to probability and stats. http://students.brown.edu/seeing-theory/
- p-hacking: A very nice introduction to why everyone talks about p-values and how easily they can be manipulated. https://fivethirtyeight.com/features/science-isnt-broken/#part1
- Visualising Histograms: Everything you ever needed to know about histograms, and interactive too. http://tinlizzie.org/histograms/
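To give a flavour of the shuffling and bootstrapping ideas behind computational statistics, here is a minimal sketch in plain Python (standard library only). It is not code from the Statistics for Hackers talk; the function names `permutation_test` and `bootstrap_ci` and the sample scores are hypothetical, chosen for illustration. The first function estimates a p-value for a difference in means (the kind of question an A/B test asks) by shuffling group labels; the second computes a percentile bootstrap confidence interval for a mean by resampling with replacement.

```python
import random
import statistics


def permutation_test(a, b, n_iter=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_iter):
        # Shuffle the pooled values, then split them back into two groups
        # of the original sizes, as if the group labels were meaningless.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Fraction of shuffles at least as extreme as the observed difference.
    return extreme / n_iter


def bootstrap_ci(data, n_iter=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(statistics.mean(rng.choices(data, k=n)) for _ in range(n_iter))
    lo = means[int(alpha / 2 * n_iter)]
    hi = means[int((1 - alpha / 2) * n_iter) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical scores for two variants (made-up numbers, for illustration only).
    variant_a = [12, 15, 14, 10, 13, 17, 16]
    variant_b = [9, 11, 10, 13, 8, 12, 10]
    print("permutation p-value:", permutation_test(variant_a, variant_b))
    print("95% bootstrap CI for mean of A:", bootstrap_ci(variant_a))
```

The design choice here is to trade math for computation: instead of deriving a sampling distribution analytically, the sketch approximates it by resampling. Because the results come from a finite number of random draws, they are estimates and will shift slightly if you change `seed` or `n_iter`.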
For algorithms and ML:
- Explained Visually: A great visual introduction to many of the basic algorithms behind ML, such as OLS, PCA, kernels and Markov chains. http://setosa.io/ev/
- Distill: Interactive explanations of many of the parameters in ML algorithms, e.g. perplexity in t-SNE or momentum in gradient descent. https://distill.pub/
- colah's blog: Visual explanations on the blog of Chris Olah, one of the creators of Distill. http://colah.github.io/
- Hackermath: Covers the basics of linear algebra, optimisation and statistics, and their application to A/B testing, supervised learning and unsupervised learning. It is very visual and uses Python code to show everything. https://github.com/amitkaps/hackermath
Godspeed, and write to us at info@hasgeek.com about how your journey is progressing. We’d love to hear about it!
[Thanks to Amit Kapoor for loaning his words for this blog post, and for compiling some of the references.]