Stochastic Gradient Descent (SGD)

Hakob Avjyan
3 min read · Oct 22, 2018


with CODE & IMPLEMENTATION

The stochastic gradient method is a variant of gradient descent designed to improve the rate of convergence. The difference from the traditional gradient method is that the sample elements are considered separately: stochastic gradient descent (SGD) approximates the gradient using only one data point at a time. Evaluating the gradient this way saves a lot of time compared to summing over all the data, which is especially useful when working with big data sets. Thus, the gradient of the cost function is calculated not over all elements of the sample, as in traditional gradient descent, but for each element separately. The gradient computed for a particular element is taken as an approximation of the true gradient, and the model weights are updated according to that single-element gradient, so the model is adjusted as it moves from one element of the sample to the next.

The Stochastic Gradient Descent (SGD) Algorithm

  1. Shuffle the dataset randomly
  2. Cycle over all elements of the sample
  3. Cycle over all weights
  4. Adjust the current weight according to the partial derivative of the cost function for that element.

It is important to understand that, unlike traditional gradient descent, an individual step of this algorithm may not decrease the cost function; but over a sufficient number of steps the overall direction tends toward the minimum. A minimal sketch of this update loop is shown below.
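To make the four steps concrete, here is a minimal NumPy sketch of the per-sample update loop. It uses a squared-error cost on synthetic data purely for illustration; the function name sgd_fit, the learning rate, and the synthetic data are assumptions of this sketch and are not part of the wine example that follows.

import numpy as np

def sgd_fit(X, y, lr=0.01, epochs=50, seed=0):
    # SGD for a linear model with per-element cost J_i(w, b) = 0.5 * (w·x_i + b - y_i)**2
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(epochs):
        # Step 1: shuffle the dataset randomly
        for i in rng.permutation(n_samples):
            # Step 2: take one element of the sample at a time
            error = X[i] @ w + b - y[i]
            # Steps 3-4: adjust every weight by the partial derivative of the
            # cost for this single element: dJ_i/dw = error * x_i, dJ_i/db = error
            w -= lr * error * X[i]
            b -= lr * error
    return w, b

# Tiny synthetic check: recover y ≈ 3*x1 - 2*x2 + 1
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 2))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + 1 + 0.1 * rng.normal(size=200)
print(sgd_fit(X_demo, y_demo))  # weights close to [3, -2], bias close to 1

Each update uses only one element's gradient, so individual steps are noisy, but after enough passes the estimates settle near the values that minimize the total cost.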

Implementation

Import necessary libraries and the data

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_wine
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

dataset = load_wine()

Train Test Split and Normalize

X_train, X_test, y_train, y_test = train_test_split(dataset['data'], dataset['target'], random_state=15)
μ = X_train.mean(axis=0)
σ = X_train.std(axis=0)
X_train_normalized = (X_train - μ) / σ
X_test_normalized = (X_test - μ) / σ

Instantiate the model

clf = SGDClassifier(loss="log", penalty="l1", max_iter=10000)selection = pd.DataFrame(X_train_normalized, columns=dataset.feature_names)[['alcohol', 'od280/od315_of_diluted_wines']]clf.fit(selection, y_train)test_selection = pd.DataFrame(X_test_normalized, columns=dataset.feature_names)[['alcohol', 'od280/od315_of_diluted_wines']]print("Accuracy on the test set: {:.2f}".format(clf.score(test_selection, y_test)))

Accuracy on the test set is 0.87
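Because SGDClassifier updates its weights incrementally, the same model can also be trained by feeding the data in small pieces with partial_fit. The following is a sketch, not part of the original walkthrough: it reuses the selection, test_selection, y_train and y_test variables defined above, the batch size of 16 is arbitrary, and the classes must be supplied so the model knows all labels up front. (loss="log" matches the scikit-learn version used in the article; newer releases call this loss "log_loss".)

clf_inc = SGDClassifier(loss="log", penalty="l1")
classes = np.unique(y_train)   # all wine classes, required by partial_fit
X_arr = selection.values
batch_size = 16
for start in range(0, len(X_arr), batch_size):
    stop = start + batch_size
    # each call performs SGD updates using only this small batch
    clf_inc.partial_fit(X_arr[start:stop], y_train[start:stop], classes=classes)
print("Incremental accuracy on the test set: {:.2f}".format(clf_inc.score(test_selection, y_test)))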

Select values

X = test_selection.values
Y = y_test

Build coordinate matrices from coordinate vectors to form a grid for the decision boundary

x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
h = .02
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

Reshape and make predictions

Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

Construct a scatter plot

plt.figure(1, figsize=(4, 3))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.brg)
plt.scatter(X[:, 0], X[:, 1], c=Y, edgecolors='k', cmap=plt.cm.brg)
plt.xlabel('alcohol')
plt.ylabel('od280/od315_of_diluted_wines')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.show()

Fit the model on all normalized features

clf = SGDClassifier(loss="log", penalty="l2", max_iter=5)clf.fit(X_train_normalized, y_train)
print("Accuracy on the test set: {:.2f}".format(clf.score(X_test_normalized, y_test)))

Accuracy on the test set is 0.96

Thank you for your attention :)
