Machine Learning: C++ Logistic Regression Example

Russsun
3 min read · Mar 30, 2019


I previously showed a C++ linear regression implementation example, which can serve as a smoke test of the basics of machine learning during an interview. A logistic regression question in a machine-learning interview can be considered a bar raiser, because of the complexity of its cost function and of the calculation of that cost function's derivatives.

Also, understanding logistic regression is an essential step toward understanding how regression problems are solved in deep learning. If you can do all the calculus and implement this algorithm on a whiteboard or a piece of paper, you are able to implement a tiny neural-network training framework with back-propagation. This is because back-propagation through activation functions like Sigmoid or Tanh is very similar in nature to logistic regression.

As we mentioned before, linear regression is good for prediction problems; logistic regression is good for classification problems. In this example, we do only binary classification.

Problem: given any input, classify it into label 0 or label 1.

Logistic Regression for Binary Classification

In order to fit the nature of the binary classification training data, we design this S-shaped curve (the sigmoid function) as a probability equation:

p = 1/(1+exp(-(ax+b)))

So p’s value range is always from 0 to 1.

We also need to design a Cost Function, which measures the error between the prediction value y_hat and the label value y.

Cost Function:

When y = 1, C = -log(y_hat)

When y = 0, C = -log(1-y_hat)

Then we need to calculate the partial derivatives of C with respect to a and b.

When y = 1,

  • dC/da = -x/(1+exp(ax+b))
  • dC/db = -1/(1+exp(ax+b))

When y = 0,

  • dC/da = x*exp(ax+b)/(1+exp(ax+b))
  • dC/db = exp(ax+b)/(1+exp(ax+b))
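As a sanity check, here is one way the y = 1 case can be worked out with the chain rule, writing z = ax + b and p = y_hat = 1/(1+exp(-z)):

```latex
% y = 1 case: C = -log(p), with z = ax + b and p = 1/(1+e^{-z})
\frac{\partial C}{\partial p} = -\frac{1}{p}, \qquad
\frac{\partial p}{\partial z} = p(1-p), \qquad
\frac{\partial z}{\partial a} = x
% chain rule:
\frac{\partial C}{\partial a}
  = \frac{\partial C}{\partial p}\cdot\frac{\partial p}{\partial z}\cdot\frac{\partial z}{\partial a}
  = -\frac{1}{p}\cdot p(1-p)\cdot x
  = -x(1-p)
% and since 1 - p = \frac{e^{-z}}{1+e^{-z}} = \frac{1}{1+e^{z}}:
\frac{\partial C}{\partial a} = -\frac{x}{1+e^{ax+b}}
```

The dC/db result follows the same steps with ∂z/∂b = 1 instead of x, and the y = 0 case starts from C = -log(1-p) and proceeds the same way.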

You should really try to calculate these partial derivatives carefully, step by step. It took me half an hour to work them out correctly on paper. I also asked a very experienced machine learning engineer, who couldn't get through it.

The example implementation is in C++. I tried to find a similarly simple logistic regression example in C++ but failed to find one.

This example may be the simplest C++ logistic regression, at about 50 lines of code, and it is really easy to understand. It serves the purpose of learning or getting through a whiteboard coding question.

Note that, for simplicity, I didn’t add regularization to it.
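For reference, adding L2 regularization would be a small change. Here is a sketch of what it might look like, applied to the already-averaged cost and gradient; lambda is a hypothetical regularization strength, not something in the article's code:

```cpp
#include <cmath>

// Sketch: add an L2 penalty (lambda/2n)*a^2 to the averaged cost,
// and its derivative lambda*a/n to the averaged gradient da.
// (lambda is a hyperparameter introduced here for illustration.)
double regularize(double cost, double &da, double a, double lambda, int n)
{
    cost += lambda * a * a / (2.0 * n); // penalty term on the slope
    da += lambda * a / n;               // its contribution to dC/da
    return cost;
}
```

By convention only the slope a is penalized, not the intercept b, so db would stay unchanged.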

#include <cmath>
#include <iostream>
#include <vector>
using namespace std;

// cross-entropy cost for a single sample: prediction p against label y
double getElementLogisticCost(double &x, int &y, double &a, double &b)
{
    double p = 1 / (1 + exp(-(a * x + b)));
    if (y == 1)
    {
        return -log(p);
    }
    else
    {
        return -log(1 - p);
    }
}

// slope: a
// intercept: b
// derivative of slope (output): da
// derivative of intercept (output): db
double getLogisticCost(vector<double> &x, vector<int> &y, double a, double b, double &da, double &db)
{
    int n = static_cast<int>(x.size());
    double cost = 0;
    da = 0;
    db = 0;
    for (int i = 0; i < n; i++)
    {
        cost += getElementLogisticCost(x[i], y[i], a, b);
        double eaxb = exp(a * x[i] + b);
        if (y[i] == 1)
        {
            da += -x[i] / (1 + eaxb);
            db += -1 / (1 + eaxb);
        }
        else
        {
            da += x[i] * eaxb / (1 + eaxb);
            db += eaxb / (1 + eaxb);
        }
    }
    cost /= n;
    da /= n;
    db /= n;
    return cost;
}

void logisticRegression(vector<double> &x, vector<int> &y, double slope = 1, double intercept = 0)
{
    double lrate = 0.0005;
    double threshold = 0.001;
    int iter = 0;
    while (true)
    {
        double da = 0;
        double db = 0;
        double cost = getLogisticCost(x, y, slope, intercept, da, db);
        if (iter % 1000 == 0)
        {
            cout << "Iter: " << iter << " cost = " << cost << " da = " << da << " db = " << db << endl;
        }
        iter++;
        if (abs(da) < threshold && abs(db) < threshold)
        {
            cout << "p = 1/(1+exp(-(" << slope << " * x + " << intercept << ")))" << endl;
            break;
        }
        slope -= lrate * da;
        intercept -= lrate * db;
    }
}

int main()
{
    vector<double> A;
    vector<int> B;
    // create a dataset with inputs and labels:
    // values in [0, 20) get label 0
    // values in [80, 100) get label 1
    for (int i = 0; i < 1000; i++)
    {
        A.push_back(rand() % 20);
        B.push_back(0);
        A.push_back(80 + rand() % 20);
        B.push_back(1);
    }
    // kick off our simple logisticRegression!
    logisticRegression(A, B);
    return 0;
}

Terminal Outputs:

Iter: 270000 cost = 0.0143038 da = -0.000199134 db = 0.0102486
Iter: 271000 cost = 0.0142515 da = -0.000198411 db = 0.0102109
Iter: 272000 cost = 0.0141995 da = -0.000197693 db = 0.0101734
Iter: 273000 cost = 0.0141479 da = -0.00019698 db = 0.0101362
Iter: 274000 cost = 0.0140967 da = -0.000196272 db = 0.0100992
Iter: 275000 cost = 0.0140459 da = -0.000195569 db = 0.0100626
Iter: 276000 cost = 0.0139954 da = -0.000194872 db = 0.0100261
p = 1/(1+exp(-(0.122318 * x + -5.1481)))
