Must-know differences in Data Science

Anushka Bajpai
Jul 5, 2022


“Concepts are the keystone in the architecture of our thinking” — Paul Hughes

Photo by Sharon Pittaway on Unsplash

1. Bias vs Variance


2. Bagging vs Boosting


3. Random Forest vs Decision Trees vs Bagging


Random Forests add more randomness: a fresh random subset of features is sampled at every split (node level), whereas in Bagging each base model is given one fixed subset of features (model level).

Hence, Bagging with decision trees as the base learner is not the same as a Random Forest.
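To make the node-level vs model-level sampling concrete, here is a minimal scikit-learn sketch (the dataset and settings are illustrative): in BaggingClassifier, max_features fixes one random feature subset per base tree, while in RandomForestClassifier the subset is re-drawn at every split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Bagging: the default base estimator is a decision tree, and each tree
# sees one fixed random subset of the features (model level).
bagging = BaggingClassifier(n_estimators=100, max_features=0.5,
                            random_state=42).fit(X, y)

# Random Forest: a fresh feature subset is drawn at every split (node level).
forest = RandomForestClassifier(n_estimators=100, max_features=0.5,
                                random_state=42).fit(X, y)
```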

4. Standardization vs Normalization
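A minimal scikit-learn sketch of the two rescalings (the toy data is illustrative): standardization centers the data to zero mean and unit variance, while normalization squeezes values into a fixed range such as [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0], [50.0]])

standardized = StandardScaler().fit_transform(X)  # (x - mean) / std
normalized = MinMaxScaler().fit_transform(X)      # (x - min) / (max - min)

print(standardized.mean(), standardized.std())  # ~0.0, ~1.0
print(normalized.min(), normalized.max())       # 0.0, 1.0
```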

5. K-means vs Hierarchical Clustering vs K-means ++

6. K-means vs KNN
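The core difference in one minimal scikit-learn sketch (illustrative data): K-means is unsupervised clustering and needs no labels, while KNN is a supervised classifier and cannot be fit without them.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)  # no y needed
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)               # needs y

print(kmeans.labels_[:5])  # cluster assignments discovered from X alone
print(knn.predict(X[:5]))  # class labels learned from (X, y)
```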


7. Parametric vs Non parametric

8. Precision vs Recall

Recall / Sensitivity : Of all the actual positive cases, how many were correctly predicted by the model

Precision : Of all the cases predicted positive by the model, how many were actually positive

Precision-Recall trade-off
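A worked example with hypothetical confusion-matrix counts (the numbers are assumed for illustration): precision = TP / (TP + FP) and recall = TP / (TP + FN). Raising the decision threshold typically trades recall for precision, and vice versa.

```python
TP, FP, FN = 80, 20, 40  # assumed counts for illustration

precision = TP / (TP + FP)  # of predicted positives, fraction truly positive
recall = TP / (TP + FN)     # of actual positives, fraction found by the model

print(f"precision = {precision:.2f}")  # 0.80
print(f"recall    = {recall:.2f}")     # 0.67
```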


9. Mean-squared error vs Mean absolute error
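A minimal NumPy sketch with illustrative numbers: MSE squares the residuals, so it punishes outliers much more heavily than MAE, which takes their absolute value.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 14.0])  # the last prediction is an outlier

mse = np.mean((y_true - y_pred) ** 2)   # (1/n) * sum((y - y_hat)^2)
mae = np.mean(np.abs(y_true - y_pred))  # (1/n) * sum(|y - y_hat|)

print(mse)  # 6.5625 (dominated by the outlier)
print(mae)  # 1.625
```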

10. R-square vs Adjusted R-square
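A minimal sketch of the relationship (the numbers are illustrative): adjusted R-square penalizes adding predictors that do not improve the fit, via adj_R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1).

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """n = number of observations, p = number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(r2=0.85, n=100, p=2))   # ~0.847
print(adjusted_r2(r2=0.85, n=100, p=20))  # ~0.812: more predictors, bigger penalty
```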

11. Ridge vs Lasso Regularization
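A minimal scikit-learn sketch (illustrative data and alpha): Ridge uses an L2 penalty and shrinks coefficients toward zero, while Lasso uses an L1 penalty and can set coefficients exactly to zero, effectively performing feature selection.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=10.0, random_state=42)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print((ridge.coef_ == 0).sum())  # typically 0: coefficients shrunk, not zeroed
print((lasso.coef_ == 0).sum())  # often > 0: uninformative features dropped
```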

12. Parameters vs Hyperparameters

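One way to see the distinction, as a minimal scikit-learn sketch (illustrative data): hyperparameters are chosen by us before training, while parameters are learned from the data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=42)

model = LogisticRegression(C=1.0, max_iter=1000)  # C, max_iter: hyperparameters
model.fit(X, y)

print(model.coef_)       # parameters: weights learned from the data
print(model.intercept_)  # parameter: bias term learned from the data
```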

13. Batch vs Online Learning


14. Standard Deviation vs Inter Quartile Range

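A minimal NumPy sketch with illustrative data: the IQR is robust to outliers, while the standard deviation is not.

```python
import numpy as np

data = np.array([10, 12, 12, 13, 13, 14, 15, 100])  # 100 is an outlier

std = data.std()
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

print(std)  # ~28.9: inflated by the single outlier
print(iqr)  # 2.25: barely affected
```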

15. Correlation vs Causation vs Covariance

Source: https://statanalytica.com/blog/correlation-vs-causation/
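A minimal NumPy sketch of the covariance/correlation half of this trio (illustrative data): covariance depends on the units of the variables, while correlation is covariance rescaled to the unit-free range [-1, 1].

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 10.0])

cov = np.cov(x, y)[0, 1]        # changes if x or y is rescaled
corr = np.corrcoef(x, y)[0, 1]  # unit-free; unchanged if x or y is rescaled

print(cov)   # ~5.0
print(corr)  # ~0.999

# Causation is a different question entirely: neither statistic can
# establish it; that requires a controlled experiment or causal inference.
```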

16. Probabilistic vs Non-Probabilistic Sampling


17. Overfitting vs Underfitting


18. fit vs fit_transform
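A minimal scikit-learn sketch (illustrative data): fit learns the parameters of the transformation (e.g. mean and std), transform applies them, and fit_transform does both in one step. Crucially, a scaler should be fit on the training set only and then applied to the test set with transform.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, _ = make_regression(n_samples=200, n_features=3, random_state=42)
X_train, X_test = train_test_split(X, test_size=0.25, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std, then scale
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

# Calling fit_transform on the test set would leak test statistics
# into the preprocessing step.
```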

19. Type I vs Type II error

20. Supervised vs Unsupervised Learning


21. Z-test vs T-test

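A minimal SciPy-based sketch of when each test applies (the sample and sigma here are illustrative): use a z-test when the population standard deviation is known (or n is large), and a t-test when it must be estimated from the sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=52, scale=10, size=25)
mu0 = 50  # hypothesized population mean

# t-test: population std unknown, estimated from the sample
t_stat, t_p = stats.ttest_1samp(sample, popmean=mu0)

# z-test: population std assumed known (sigma = 10 here)
z_stat = (sample.mean() - mu0) / (10 / np.sqrt(len(sample)))
z_p = 2 * stats.norm.sf(abs(z_stat))

print(t_stat, t_p)
print(z_stat, z_p)
```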

Final Thoughts

Sometimes even simple terms and concepts get blurred with time while we keep focusing on more advanced topics. It is thus crucial to have a compilation of these foundational concepts handy while we continue to explore further.

Hope this article helped!

Please feel free to add your thoughts and suggestions!

Happy learning!

Cheers :)
