Classification predicts a categorical response (true or false, red or green, spam or not spam, big/medium/small, etc.), while regression predicts a continuous response, i.e. a number (height, weight, duration).
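A minimal sketch of the difference, using made-up "hours studied" data (the feature values and labels here are purely illustrative):

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data: hours studied -> pass/fail (categorical) and exam score (continuous)
X = [[1], [2], [3], [4], [5]]
passed = [0, 0, 1, 1, 1]       # categorical response -> classification
score = [40, 50, 60, 70, 80]   # continuous response -> regression

clf = LogisticRegression().fit(X, passed)
reg = LinearRegression().fit(X, score)

print(clf.predict([[4.5]]))  # predicts a class label: 0 or 1
print(reg.predict([[4.5]]))  # predicts a number on a continuous scale
```

The classifier can only ever answer with one of the known classes, while the regressor returns a value that need not appear in the training labels.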
Trained Scikit-Learn models can be saved and loaded with the joblib library. Older Scikit-Learn versions bundled it as sklearn.externals.joblib, but that import has been removed; install and import joblib directly:
import joblib
To save a model:
joblib.dump(model, "model.pkl")
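End to end, saving and restoring looks like this (the tiny LinearRegression model here is just a stand-in for whatever you trained):

```python
import joblib
from sklearn.linear_model import LinearRegression

# Train some model (a toy one here)
model = LinearRegression().fit([[0], [1]], [0, 2])

joblib.dump(model, "model.pkl")      # save to disk
restored = joblib.load("model.pkl")  # load it back later

print(restored.predict([[2]]))       # the restored model predicts as before
```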
Let’s say you have a training set as a pandas DataFrame train_set. Before feeding it to a machine learning algorithm, you need to split it into features and labels (a.k.a. answers).
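With a hypothetical train_set whose label column is "price", the split is a drop() plus a column selection:

```python
import pandas as pd

# Hypothetical training set; "price" is the label column
train_set = pd.DataFrame({
    "rooms": [2, 3, 4],
    "area": [50, 70, 90],
    "price": [100, 150, 200],
})

features = train_set.drop("price", axis=1)  # everything except the label
labels = train_set["price"]                 # the label column alone
```

drop() returns a copy, so train_set itself is left untouched.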
To measure a regression model’s performance, we use RMSE, which stands for Root Mean Squared Error. You can compute it easily with Scikit-Learn, but first, let’s train our model:
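A sketch of the whole loop on toy data: train a model, predict, then take the square root of Scikit-Learn's mean_squared_error to get RMSE (the data below is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy regression data standing in for a real training set
X = [[1], [2], [3], [4]]
y = [1.1, 1.9, 3.2, 3.8]

model = LinearRegression().fit(X, y)
predictions = model.predict(X)

# RMSE = sqrt of the mean squared error
rmse = np.sqrt(mean_squared_error(y, predictions))
print(rmse)
</ ```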
Building a Matplotlib graph from your pandas DataFrame is as easy as calling the plot() method on your data:
data.plot(kind="bar", x="years", y="income")
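In a self-contained script (as opposed to a notebook), that looks like this; the "years"/"income" DataFrame is invented for the example, and the Agg backend is selected so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render without opening a window
import pandas as pd

data = pd.DataFrame({"years": [2018, 2019, 2020], "income": [50, 60, 75]})

ax = data.plot(kind="bar", x="years", y="income")  # returns a Matplotlib Axes
ax.figure.savefig("income.png")                    # write the chart to a file
```

In a Jupyter notebook the chart renders inline and the savefig() call is unnecessary.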
Simply put, regression is predicting values (price, speed, etc.) and classification is predicting classes (category, type, yes/no, true/false, color, etc.).
One of the most confusing things in ML for beginners is its terminology. Every ML concept has several synonyms. What makes things even worse is that each ML course or book uses a different combination of these. There’s no common standard.
I’ve heard enough stories about machines coming and taking our jobs. And I’m not a big fan of that prospect. So without thinking long, I decided to embark on the journey of machine learning in order to be…
When we split a dataset into train and test sets, we can use different splitting tactics. Very often, the methods are based on picking records at random and putting them into the different sets.
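Scikit-Learn's train_test_split does exactly this kind of random split; here it is on a toy list of ten records (the fixed random_state makes the shuffle reproducible):

```python
from sklearn.model_selection import train_test_split

data = list(range(10))  # ten toy records

# Randomly put 20% of the records into the test set
train, test = train_test_split(data, test_size=0.2, random_state=42)

print(len(train), len(test))  # 8 2
```

Every record lands in exactly one of the two sets, which is the whole point: the model must never see the test records during training.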
Numpy/Scipy: numpy.org
Scikit-Learn: scikit-learn.org
Keras: keras.io
TensorFlow: tensorflow.org
Theano: deeplearning.net/software/theano
Pandas: pandas.pydata.org
Caffe/Caffe 2: caffe.berkeleyvision.org
Jupyter: jupyter.org
CNTK: microsoft.com/en-us/cognitive-toolkit…