Machine Learning with R

Amazing ML libraries to use in R

Zoshua Colah
Data Science Library
7 min read · Nov 8, 2018


The no-nonsense guide to Machine Learning libraries to use in R

Sourced from: https://github.com/qinwf/awesome-R

  • AnomalyDetection - AnomalyDetection R package from Twitter.
  • ahaz — Regularization for semiparametric additive hazards regression.
  • arules — Mining Association Rules and Frequent Itemsets
  • bigrf — Big Random Forests: Classification and Regression Forests for Large Data Sets
  • bigRR — Generalized Ridge Regression (with special advantage for p >> n cases)
  • bmrm — Bundle Methods for Regularized Risk Minimization Package
  • Boruta — A wrapper algorithm for all-relevant feature selection
  • BreakoutDetection - Breakout Detection via Robust E-Statistics from Twitter.
  • bst — Gradient Boosting
  • CausalImpact - Causal inference using Bayesian structural time-series models.
  • C50 — C5.0 Decision Trees and Rule-Based Models
  • caret - Classification and Regression Training
  • Clever Algorithms For Machine Learning
  • CORElearn — Classification, regression, feature evaluation and ordinal evaluation
  • CoxBoost — Cox models by likelihood based boosting for a single survival endpoint or competing risks
  • Cubist — Rule- and Instance-Based Regression Modeling
  • e1071 — Misc Functions of the Department of Statistics (e1071), TU Wien
  • earth — Multivariate Adaptive Regression Spline Models
  • elasticnet — Elastic-Net for Sparse Estimation and Sparse PCA
  • ElemStatLearn — Data sets, functions and examples from the book: “The Elements of Statistical Learning, Data Mining, Inference, and Prediction” by Trevor Hastie, Robert Tibshirani and Jerome Friedman
  • evtree — Evolutionary Learning of Globally Optimal Trees
  • forecast - Time series forecasting using ARIMA, ETS, STLM, TBATS, and neural network models
  • forecastHybrid — Automatic ensemble and cross validation of ARIMA, ETS, STLM, TBATS, and neural network models from the “forecast” package
  • prophet - Tool for producing high-quality forecasts for time series data with multiple seasonalities and linear or non-linear growth.
  • FSelector - A feature selection framework, based on subset-search or feature ranking approaches.
  • frbs — Fuzzy Rule-based Systems for Classification and Regression Tasks
  • GAMBoost — Generalized linear and additive models by likelihood based boosting
  • gamboostLSS — Boosting Methods for GAMLSS
  • gbm — Generalized Boosted Regression Models
  • glmnet - Lasso and elastic-net regularized generalized linear models
  • glmpath — L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model
  • GMMBoost — Likelihood-based Boosting for Generalized mixed models
  • grplasso — Fitting user specified models with Group Lasso penalty
  • grpreg — Regularization paths for regression models with grouped covariates
  • h2o - Deep learning, Random forests, GBM, KMeans, PCA, GLM
  • hda — Heteroscedastic Discriminant Analysis
  • ipred — Improved Predictors
  • kernlab — kernlab: Kernel-based Machine Learning Lab
  • klaR — Classification and visualization
  • kohonen — Supervised and Unsupervised Self-Organising Maps.
  • lars — Least Angle Regression, Lasso and Forward Stagewise
  • lasso2 — L1 constrained estimation aka ‘lasso’
  • LiblineaR — Linear Predictive Models Based On The Liblinear C/C++ Library
  • lme4 - Mixed-effects models
  • LogicReg — Logic Regression
  • maptree — Mapping, pruning, and graphing tree models
  • mboost — Model-Based Boosting
  • Machine Learning For Hackers
  • mlr - Extensible framework for classification, regression, survival analysis and clustering
  • mvpart — Multivariate partitioning
  • MXNet - MXNet brings flexible and efficient GPU computing and state-of-the-art deep learning to R.
  • ncvreg — Regularization paths for SCAD- and MCP-penalized regression models
  • nnet - Feed-forward Neural Networks and Multinomial Log-Linear Models
  • oblique.tree — Oblique Trees for Classification Data
  • pamr — Pam: prediction analysis for microarrays
  • party — A Laboratory for Recursive Partytioning
  • partykit — A Toolkit for Recursive Partytioning
  • penalized — L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model
  • penalizedLDA — Penalized classification using Fisher’s linear discriminant
  • penalizedSVM — Feature Selection SVM using penalty functions
  • quantregForest — quantregForest: Quantile Regression Forests
  • randomForest — randomForest: Breiman and Cutler’s random forests for classification and regression.
  • randomForestSRC — randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC).
  • ranger — A Fast Implementation of Random Forests.
  • rattle — Graphical user interface for data mining in R.
  • rda — Shrunken Centroids Regularized Discriminant Analysis
  • rdetools — Relevant Dimension Estimation (RDE) in Feature Spaces
  • REEMtree — Regression Trees with Random Effects for Longitudinal (Panel) Data
  • relaxo — Relaxed Lasso
  • rgenoud — R version of GENetic Optimization Using Derivatives
  • rgp — R genetic programming framework
  • Rmalschains — Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R
  • rminer — Simpler use of data mining methods (e.g. NN and SVM) in classification and regression
  • ROCR — Visualizing the performance of scoring classifiers
  • RoughSets — Data Analysis Using Rough Set and Fuzzy Rough Set Theories
  • rpart — Recursive Partitioning and Regression Trees
  • RPMM — Recursively Partitioned Mixture Model
  • RSNNS — Neural Networks in R using the Stuttgart Neural Network Simulator (SNNS)
  • Rsomoclu — Parallel implementation of self-organizing maps.
  • RWeka — R/Weka interface
  • RXshrink — RXshrink: Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression
  • sda — Shrinkage Discriminant Analysis and CAT Score Variable Selection
  • SDDA — Stepwise Diagonal Discriminant Analysis
  • SuperLearner and subsemble — Multi-algorithm ensemble learning packages.
  • svmpath — svmpath: the SVM Path algorithm
  • tgp — Bayesian treed Gaussian process models
  • tree — Classification and regression trees
  • varSelRF — Variable selection using random forests
  • xgboost - eXtreme Gradient Boosting Tree model, well known for its speed and performance.
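
To give a feel for how one of the packages above is typically used, here is a minimal sketch with rpart, which ships with standard R installations as a recommended package. The built-in iris data set, the 100/50 train/test split, and the seed are assumptions chosen for illustration; they do not come from the list itself.

```r
# Minimal sketch: fit a classification tree with rpart on the built-in iris data.
library(rpart)

set.seed(42)                                  # make the random split reproducible
train_idx <- sample(nrow(iris), size = 100)   # hypothetical 100/50 train/test split
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# "Species ~ ." means: predict Species from all other columns
fit <- rpart(Species ~ ., data = train, method = "class")

# type = "class" returns predicted labels rather than class probabilities
pred <- predict(fit, newdata = test, type = "class")

# Confusion table and share of correct predictions on the held-out rows
accuracy <- mean(pred == test$Species)
print(table(predicted = pred, actual = test$Species))
cat("Held-out accuracy:", round(accuracy, 3), "\n")
```

Several other modeling packages in the list accept a similar formula and data frame, but argument names and prediction types differ from package to package, so check each package's help pages before adapting this pattern.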

Thank you for reading. A big thank you to https://github.com/qinwf/awesome-R#2018
