Nonlinear Regression with Deep Learning

Ahmet Özlü
Published in Analytics Vidhya
6 min read · Jun 28, 2020


In this post, we’ll learn how to train a neural network for regression using “Keras”, covering both the theoretical and the practical details! The approach and code shared in this tutorial can be adapted to other regression tasks, so after this tutorial you’ll be able to tackle your own regression problems with a trained neural network!


Keras is the most used deep learning framework among top-5 winning teams on Kaggle!

Keras is an API designed for human beings, not machines. Keras follows best practices for reducing cognitive load: it offers consistent & simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear & actionable error messages. It has extensive documentation and developer guides.

In this tutorial, we’ll use Keras with the TensorFlow back-end to implement a neural network for regression in Python!


Regression is a Machine Learning (ML) task in which a model is trained to predict real-valued outputs, such as temperature, stock price, and so on.

For example, which of the following is a regression task?

  • Predicting the age of a person
  • Predicting the nationality of a person
  • Predicting whether the stock price of a company will increase tomorrow
  • Predicting whether a document is related to sightings of UFOs

Solution: Predicting the age of a person, because age is a real value. Predicting nationality is categorical, while predicting whether the stock price will increase and predicting whether a document is related to UFOs both have discrete yes/no answers.

Classification vs. Regression

The main difference between regression and classification algorithms is that regression algorithms are used to predict continuous values such as price, salary, or age, while classification algorithms are used to predict (classify) discrete values such as male or female, true or false, spam or not spam, etc. That’s all, actually!
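In a Keras network, this difference mostly shows up in the output layer and the loss function. The sketch below is only illustrative, and the helper name `output_head` is hypothetical and not from the original code. A regression head is a single linear neuron trained with a regression loss. A binary classification head squashes its single output through a sigmoid and is trained with binary cross-entropy.

```python
import tensorflow as tf


def output_head(task: str):
    """Return an (output layer, loss name) pair for a task type (hypothetical helper)."""
    if task == "regression":
        # One neuron, no squashing: the output can be any real number.
        return tf.keras.layers.Dense(1, activation="linear"), "mean_squared_error"
    if task == "binary_classification":
        # One neuron squashed to (0, 1): read as the probability of the positive class.
        return tf.keras.layers.Dense(1, activation="sigmoid"), "binary_crossentropy"
    raise ValueError(f"unknown task: {task!r}")


reg_layer, reg_loss = output_head("regression")
clf_layer, clf_loss = output_head("binary_classification")
print(reg_layer.activation.__name__, reg_loss)
print(clf_layer.activation.__name__, clf_loss)
```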

The figure below should make the difference between regression and classification clearer! Yes, this figure is awesome! :)

Classification vs. Regression

In this tutorial, we’ll train a Keras neural network to perform regression on “The Yacht Hydrodynamics Data Set”! The data set contains 6 input values and one output value, so our goal is to predict the output from the 6 input values!

The Yacht Hydrodynamics Data Set

All of the details about the data set are available at the UCI Machine Learning Repository!

Quick summary of the yacht hydrodynamics data set

Predicting the residuary resistance of sailing yachts at the initial design stage is of great value for evaluating the ship’s performance and for estimating the required propulsive power. Essential inputs include the basic hull dimensions and the boat velocity.

Attribute Information
Variations concern hull geometry coefficients and the Froude number;
1. Longitudinal position of the center of buoyancy, adimensional.
2. Prismatic coefficient, adimensional.
3. Length-displacement ratio, adimensional.
4. Beam-draught ratio, adimensional.
5. Length-beam ratio, adimensional.
6. Froude number, adimensional.
Output: The measured variable is the residuary resistance per unit weight of displacement;
7. Residuary resistance per unit weight of displacement, adimensional.

A sample line with the header of the data set.

Designing and Developing a Keras Neural Network for Regression Prediction

Network Architecture Design

To implement a neural network for regression, we first need to define its architecture. We use a simple Multilayer Perceptron (MLP), shown in the figure below, as the architecture.

The simple Multilayer Perceptron (MLP) used as the network architecture.


Now that we have designed our Keras neural network, let’s implement the design!

1.) First, we import the necessary packages:
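The original import list isn’t reproduced here. Below is a minimal sketch of the packages the remaining steps would plausibly need. It assumes NumPy, Matplotlib, and TensorFlow 2.x are installed; TensorFlow 2.x bundles Keras as `tf.keras`.

```python
# Numerical arrays and simple file loading
import numpy as np

# Plotting the training history and actual-vs-predicted values
import matplotlib.pyplot as plt

# Keras ships with TensorFlow 2.x as tf.keras
import tensorflow as tf

print("TensorFlow", tf.__version__)
```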

2.) Then, we need to read our training and validation data:
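A minimal sketch of this step, assuming the raw UCI file `yacht_hydrodynamics.data`: whitespace-separated, no header row, 7 numeric columns. It also assumes a simple seeded random split. The file name, the short column names, the 80/20 ratio, and the helper names `load_yacht_data` and `train_val_split` are all assumptions for illustration. The original tutorial’s data files and split may differ.

```python
import numpy as np

# Hypothetical short names for the 7 UCI columns (6 inputs + 1 output)
COLUMNS = [
    "lc_buoyancy",          # longitudinal position of the center of buoyancy
    "prismatic_coef",       # prismatic coefficient
    "length_displacement",  # length-displacement ratio
    "beam_draught",         # beam-draught ratio
    "length_beam",          # length-beam ratio
    "froude_number",        # Froude number
    "residuary_resistance", # output: residuary resistance per unit weight of displacement
]


def load_yacht_data(path):
    """Load a whitespace-separated file with 7 numeric columns into (X, y)."""
    data = np.loadtxt(path, ndmin=2)   # ndmin=2 keeps a 2-D array even for a single row
    if data.shape[1] != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {data.shape[1]}")
    X = data[:, :6].astype("float32")  # the 6 inputs
    y = data[:, 6].astype("float32")   # the single output
    return X, y


def train_val_split(X, y, val_fraction=0.2, seed=42):
    """Shuffle once with a fixed seed, then hold out val_fraction of the rows."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(round(len(X) * val_fraction))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]


# Usage (assumes the file has been downloaded from the UCI repository):
# X, y = load_yacht_data("yacht_hydrodynamics.data")
# X_train, y_train, X_val, y_val = train_val_split(X, y)
```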

3.) Now that we have our training and validation data, let’s train our Keras neural network for regression. In other words, let’s implement the architecture we designed in the “Network Architecture Design” part above and train the model!
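The text doesn’t state the exact layer sizes of the original network. The sketch below therefore assumes two hidden ReLU layers of 50 and 25 units. `build_model`, those sizes, and the learning rate are hypothetical choices. The sketch does follow the design choices explained below: a single linear output neuron, the Adam optimizer, and mean absolute percentage error as the loss.

```python
import tensorflow as tf


def build_model(n_inputs=6, hidden_units=(50, 25), learning_rate=1e-3):
    """MLP for regression (hypothetical sizes): ReLU hidden layers, one linear output."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_inputs,)))  # 6 yacht features per sample
    for units in hidden_units:
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    # Linear output neuron so the network can predict any real value
    model.add(tf.keras.layers.Dense(1, activation="linear"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mean_absolute_percentage_error",
    )
    return model


model = build_model()
model.summary()
```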

Here are the explanations of the model implementation:

  • Since we are performing regression, the output layer is a Dense layer containing a single neuron with a linear activation function. ReLU-based activations are typically used in the hidden layers. The output needs a linear activation so that the network can produce any real value.
  • The model used for regression prediction is initialized with the Adam optimizer and then compiled.
  • Mean absolute percentage error (MAPE) is used as the loss function. This means training seeks to minimize the mean percentage difference between the predicted and the actual “residuary resistance per unit weight of displacement” values.
  • Choosing the number of training epochs is an important problem when training neural networks. Too many epochs can cause over-fitting to the training data set, while too few epochs can result in an under-fit model. Therefore, early stopping is used. It lets you specify an arbitrarily large number of training epochs and stops training once the model’s performance stops improving on a held-out validation data set.
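The training call with early stopping can be sketched with Keras’s `EarlyStopping` callback. The `patience` of 20 epochs, the cap of 2000 epochs, the batch size, and the helper name `train_with_early_stopping` are assumptions for illustration, not values from the original code. `restore_best_weights=True` keeps the weights from the epoch with the lowest validation loss instead of the last epoch’s weights.

```python
import tensorflow as tf


def train_with_early_stopping(model, X_train, y_train, X_val, y_val,
                              max_epochs=2000, patience=20, batch_size=32):
    """Fit for up to max_epochs, stopping once val_loss stops improving (hypothetical helper)."""
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",         # watch the loss on the held-out validation set
        patience=patience,          # epochs to wait for an improvement before stopping
        restore_best_weights=True,  # roll back to the best epoch's weights
    )
    history = model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=max_epochs,          # deliberately large upper bound
        batch_size=batch_size,
        callbacks=[early_stop],
        verbose=0,
    )
    return history


# Usage, with a compiled model and your training/validation arrays:
# history = train_with_early_stopping(model, X_train, y_train, X_val, y_val)
# print("stopped after", len(history.history["loss"]), "epochs")
```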

Please note that there are two typical design choices for regression problems:

  • Using a linear activation function (the default for a Keras Dense layer) in the output layer, with ReLU in the layer before it.
  • Using 'mean_squared_error' as the loss function.
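To make the two loss choices concrete, here is a small NumPy sketch that computes mean absolute percentage error (the loss used in this tutorial) and mean squared error by hand on made-up numbers. Like Keras’s built-in MAPE, it guards the denominator with a small epsilon to avoid division by zero. The exact `eps` value here is an assumption.

```python
import numpy as np


def mape(y_true, y_pred, eps=1e-7):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.maximum(np.abs(y_true), eps)  # guard against division by zero
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)


def mse(y_true, y_pred):
    """Mean squared error, in squared target units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)


y_true = [1.0, 10.0]
y_pred = [1.5, 10.5]
print(mape(y_true, y_pred))  # ≈ 27.5: the 0.5 miss on 1.0 (50%) dominates the 5% miss on 10.0
print(mse(y_true, y_pred))   # 0.25: both 0.5 misses count the same
```

The design difference: MAPE penalizes errors relative to the size of the target, while MSE penalizes absolute errors regardless of the target’s magnitude.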

4.) We can plot the training history to see how our model training went:
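A minimal Matplotlib sketch for this plot. It assumes `history` is the object returned by `model.fit`, whose `history` attribute is a dict of per-epoch values keyed by `"loss"` and, when validation data is passed, `"val_loss"`. The helper name `plot_history` is hypothetical.

```python
import matplotlib.pyplot as plt


def plot_history(history_dict, title="Training history"):
    """Plot training (and, if present, validation) loss per epoch; returns the Figure."""
    fig, ax = plt.subplots(figsize=(6, 4))
    epochs = range(1, len(history_dict["loss"]) + 1)
    ax.plot(epochs, history_dict["loss"], label="training loss")
    if "val_loss" in history_dict:
        ax.plot(epochs, history_dict["val_loss"], label="validation loss")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Loss (MAPE, %)")
    ax.set_title(title)
    ax.legend()
    return fig


# Usage after training:
# fig = plot_history(history.history)
# plt.show()
```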

Plot of the training history

5.) Now, let’s see how well our model predicts on the training data:
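One way to produce an actual-vs-prediction plot is sketched below. It assumes `model` is the trained Keras model and that the targets and predictions are NumPy arrays. `plot_actual_vs_predicted` is a hypothetical helper. Points on the dashed diagonal are perfect predictions.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_actual_vs_predicted(y_true, y_pred, title="Actual vs prediction"):
    """Scatter predictions against targets, with the ideal y = x line (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()  # Keras predict returns shape (n, 1)
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(y_true, y_pred, s=12, alpha=0.7)
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")  # perfect-prediction line
    ax.set_xlabel("Actual")
    ax.set_ylabel("Predicted")
    ax.set_title(title)
    return fig


# Usage with the trained model:
# y_pred_train = model.predict(X_train, verbose=0)
# fig = plot_actual_vs_predicted(y_train, y_pred_train, "Training set")
```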

Plot of actual vs prediction for training set

Our model reached an R-Squared of 0.995 for the predictions on the training data! This is awesome!

R-Squared (R2) is a statistical measure that represents the proportion of the variance for a dependent variable that’s explained by an independent variable or variables in a regression model.
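Concretely, R² = 1 - SS_res / SS_tot. SS_res is the sum of squared differences between the actual and predicted values. SS_tot is the sum of squared differences between the actual values and their mean. A value of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean. A small NumPy sketch follows; the helper name `r_squared` is ours.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model fails to explain
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))                     # 1.0 (perfect predictions)
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # 0.0 (always predicting the mean)
```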

6.) And here is the most exciting part of our study: checking how well our model predicts on the validation data (data that our model has never seen!):

Plot of actual vs prediction for validation set

Our model reached an R-Squared of 0.992 for the predictions on the validation data! This is amazing! And please note that it is easier to reach ~0.99 R-Squared on training data than on validation data!


In this tutorial, we’ve learned about the theoretical background of regression algorithms. We’ve also designed and implemented a neural network using “Keras” for nonlinear regression. Using “the yacht hydrodynamics data set” as a case study, we reached an R-Squared of about 0.99 on both the training and validation data, and this is awesome! The approach and code shared in this tutorial can be adapted to other regression tasks such as “computer hardware”, “energy efficiency” and more! So, after this tutorial, you’re ready to tackle your own regression problems!

The source code and more details about the implementation are available in this GitHub repository!


