38 Important Articles Every Data Scientist Should Read

“The more that you read, the more things you will know. The more that you learn, the more places you’ll go.” - Dr. Seuss, I Can Read With My Eyes Shut!

2 min read · Jan 4, 2017

Originally posted by Mirko Krivanek on Data Science Central, this list contains both external and internal papers focusing on various technical aspects of data science and big data.

*Complex Open Text Analysis. Source: Avinash Kaushik*

External Papers

Bigtable: A Distributed Storage System for Structured Data
A Few Useful Things to Know about Machine Learning
Random Forests
A Relational Model of Data for Large Shared Data Banks
Map-Reduce for Machine Learning on Multicore
Pasting Small Votes for Classification in Large Databases and On-Line
Amazon.com Recommendations: Item-to-Item Collaborative Filtering
Recursive Deep Models for Semantic Compositionality Over a Sentimen…
Spanner: Google’s Globally-Distributed Database
Megastore: Providing Scalable, Highly Available Storage for Interac…
F1: A Distributed SQL Database That Scales
Apache Drill: Interactive Ad-Hoc Analysis at Scale
A New Approach to Linear Filtering and Prediction Problems
Top 10 Algorithms in Data Mining
The PageRank Citation Ranking: Bringing Order to the Web
MapReduce: Simplified Data Processing on Large Clusters
The Google File System
Amazon’s Dynamo

DSC Internal Papers

How to detect spurious correlations, and how to find the …
Automated Data Science: Confidence Intervals
16 analytic disciplines compared to data science
From the trenches: 360-degree data science
10 types of regressions. Which one to use?
Practical illustration of Map-Reduce (Hadoop-style), on real data
Jackknife logistic and linear regression for clustering and predict…
A synthetic variance designed for Hadoop and big data
Fast Combinatorial Feature Selection with New Definition of Predict…
Internet topology mapping
11 Features any database, SQL or NoSQL, should have
10 Features all Dashboards Should Have
Clustering idea for very large datasets
Hidden decision trees revisited
Correlation and R-Squared for Big Data
What Map Reduce can’t do
Excel for Big Data
Fast clustering algorithms for massive datasets
The curse of big data
Interesting Data Science Application: Steganography

NOTE: Feel free to add your favorites and thoughts, and remember to share!

Written by Oliver O. Makonjio