ANN Strengths and Weaknesses
ANNs are easy to construct and deal very well with large amounts of noisy data. They are especially suited to solving nonlinear problems. They work well for problems where domain experts may be unavailable or where there are no known rules. ANNs are also adaptive in nature. This makes them particularly useful in fields such as finance where the environment is potentially volatile and dynamic.
They are also very tolerant of noisy and incomplete data sets. Their robustness in storing and processing data has earned them some applications in space exploration by NASA, where fault-tolerant equipment is required. This robustness derives from the fact that information is duplicated many times over in the many complex and intricate network connections in ANNs, just as in the human brain. This is in contrast to the serial computer, where the loss of one piece of information may corrupt the entire information set.
The training process of an ANN is itself relatively simple. However, the pre-processing of the data (including data selection and representation to the ANN) and the post-processing of the outputs (required for interpretation of the output and performance evaluation) require a significant amount of work. Even so, setting up a problem with ANNs is still perceived to be easier than modeling with conventional statistical methods. Many statisticians argue that ANNs are nothing more than special cases of statistical models, and thus that the rigid restrictions that apply to those models must also be applied to ANNs. Nevertheless, there are probably more successful novel applications using ANNs than conventional statistical tools. The large number of ANN applications produced in a relatively short time could be explained by the universal appeal of the relatively easy methodology for setting up an ANN to solve a problem. The restrictions imposed by many equivalent statistical models are probably less appealing to researchers without a strong statistical background. ANN software packages are also easier to use than typical statistical packages. Researchers can use ANN software packages successfully without fully understanding the learning algorithms, which makes them accessible to a wider variety of researchers. ANN researchers are more likely to learn from experience than to be guided by statistical rules when constructing a model, and thus they may be only implicitly aware of the statistical restrictions of their ANN models.
The major weakness of ANNs is their lack of explanation for the models that they create. Research is currently being conducted to unravel the complex network structures that ANNs create. Even though ANNs are easy to construct, finding a good ANN structure, as well as pre-processing and post-processing the data, is a very time-consuming process. Ripley states that ‘the design and learning for feed-forward networks are hard’. He further cites research by Judd and by Blum and Rivest showing this problem to be NP-complete.
Basic Structure of an ANN
The basic structure of an ANN consists of artificial neurons (similar to biological neurons in the human brain) that are grouped into layers. The most common ANN structure consists of an input layer, one or more hidden layers and an output layer. A simplified model of an artificial neuron is shown in Figure 1.
In the human brain, neurons communicate by sending signals to each other through complex connections. ANNs are based on the same principle in an attempt to simulate the learning process of the human brain by using complex algorithms. Every connection has a weight attached, which may be either positive or negative. Positive weights activate the neuron while negative weights inhibit it. Figure 1 shows a network structure with inputs (x1, x2, …, xi) connected to neuron j with weights (w1j, w2j, …, wij) on each connection. The neuron sums all the signals it receives, with each signal multiplied by the weight on its connection.
This weighted sum (y) is then passed through a transfer (activation) function, g(y), that is normally non-linear, to give the final output Oj. The most commonly used function is the sigmoid (logistic) function because it is easily differentiable, which is very convenient when the back-propagation algorithm is applied. The whole process will be discussed in more detail.
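The computation performed by a single neuron can be sketched in a few lines of Python. This is only an illustration of the description above: the function names and the example input and weight values are assumptions chosen for the sketch, not part of any particular ANN package.

```python
import math

def sigmoid(y):
    """Logistic transfer function g(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_derivative(y):
    """Derivative of the sigmoid, g'(y) = g(y) * (1 - g(y))."""
    g = sigmoid(y)
    return g * (1.0 - g)

def neuron_output(inputs, weights):
    """Weighted sum of the inputs, passed through the transfer function."""
    y = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(y)

# Hypothetical inputs x1..x3 and weights w1j..w3j for one neuron j.
x = [1.0, 0.5, -1.0]
w_j = [0.4, -0.2, 0.3]   # positive weights excite, negative weights inhibit
o_j = neuron_output(x, w_j)
print(o_j)
```

With these values the weighted sum is essentially zero, so the output sits at about 0.5, the midpoint of the sigmoid. The derivative is shown because its simple form, g(y)(1 - g(y)), is what makes the sigmoid convenient for back-propagation.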
The back-propagation ANN is a feed-forward neural network structure that takes the input to the network, multiplies it by the weights on the connections between neurons (or nodes), and sums the products before passing the result through a transfer function to produce an output. Each of the neurons uses a transfer function and is fully connected to the nodes on the next layer. The back-propagation algorithm works by minimizing the error between the output and the target (actual) value by propagating the error back into the network. The weights on each of the connections between the neurons are changed according to the size of the error. The input data are then fed forward again, producing a new output and a new error. The process is repeated until the error reaches an acceptable value, at which point training is halted. The resulting model is a function that is an internal representation of the output in terms of the inputs at that point. A more detailed discussion of the back-propagation algorithm will be carried out in upcoming articles.
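The feed-forward and error-propagation cycle described above can be sketched as a very small network in plain Python. This is a minimal sketch under several assumptions that the text does not specify: one hidden layer, sigmoid transfer functions throughout, a bias weight on every neuron, a squared-error measure, weights updated after every pattern, and the logical OR function as a toy training set. All function and parameter names are hypothetical.

```python
import math
import random

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def forward(w_hidden, w_out, inputs):
    """Feed the inputs forward; returns (hidden outputs, network output)."""
    x = list(inputs) + [1.0]                      # extra 1.0 feeds the bias weight
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output

def train(patterns, n_hidden=2, learning_rate=0.5, max_epochs=5000,
          tolerance=0.01, seed=0):
    rng = random.Random(seed)
    n_in = len(patterns[0][0])
    # Small random starting weights; the last weight in each row is the bias.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []                                  # total error per epoch
    for _ in range(max_epochs):
        total_error = 0.0
        for inputs, target in patterns:
            hidden, output = forward(w_hidden, w_out, inputs)
            error = target - output
            total_error += 0.5 * error ** 2
            # Propagate the error back through the sigmoid derivatives.
            delta_out = error * output * (1.0 - output)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Adjust each weight in proportion to its share of the error.
            h = hidden + [1.0]
            x = list(inputs) + [1.0]
            for j in range(n_hidden + 1):
                w_out[j] += learning_rate * delta_out * h[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hidden[j][i] += learning_rate * delta_hidden[j] * x[i]
        history.append(total_error)
        if total_error < tolerance:               # acceptable error: stop training
            break
    return w_hidden, w_out, history

# Toy training set: the logical OR of two inputs.
or_patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w_hidden, w_out, history = train(or_patterns)
print(len(history), history[0], history[-1])
```

The `history` list records the total error after each pass over the training set, so the reduction in error from the first pass to the last can be inspected directly.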
Constructing the ANN
Setting up an ANN is essentially a six step procedure.
Firstly, the data to be used need to be defined and presented to the ANN as a pattern of input data with the desired outcome or target.
Secondly, the data are divided into a training set and a validation set (also called the test or out-of-sample set). The ANN uses only the training set in its learning process when developing the model. The validation set is used to test the model's predictive ability and to decide when to stop training the ANN.
Thirdly, the ANN structure is defined by selecting the number of hidden layers to be constructed and the number of neurons for each hidden layer.
Fourthly, all the ANN parameters are set before starting the training process. The ANN parameters are discussed briefly in the next section and in more detail in upcoming articles.
Next, the training process is started. The training process involves the computation of the output from the input data and the weights. The backpropagation algorithm is used to ‘train’ the ANN by adjusting its weights to minimize the difference between the current ANN output and the desired output.
Finally, an evaluation process has to be conducted to determine if the ANN has ‘learned’ to solve the task at hand. This evaluation process may involve periodically halting the training process and testing its performance until an acceptable result is obtained. When an acceptable result is obtained, the ANN is then deemed to have been trained and ready to be used.
As there are no fixed rules in determining the ANN structure or its parameter values, a large number of ANNs may have to be constructed with different structures and parameters before determining an acceptable model. The trial and error process can be tedious and the experience of the ANN user in constructing the networks is invaluable in the search for a good model.
Determining when the training process needs to be halted is of vital importance in obtaining a good model. If an ANN is overtrained, a curve-fitting problem may occur whereby the ANN starts to fit itself to the training set instead of creating a generalized model. This typically results in poor predictions on the test (validation) data set. On the other hand, if the ANN is not trained for long enough, it may settle at a local minimum rather than the global minimum solution. This typically generates a sub-optimal model. By periodically testing the ANN on the test set and recording the results for both the training and test data sets, the number of iterations that produces the best model can be obtained. All that is then needed is to reset the ANN and train the network up to that number of iterations.
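The idea of recording the test-set results and picking the best number of iterations can be sketched as follows. The error values below are made-up numbers used purely to illustrate the shape of an overtraining curve, and the function name and the `check_every` parameter are hypothetical.

```python
def best_iteration(test_errors, check_every):
    """Return the iteration count at which the test-set error was lowest.

    test_errors[k] is the error recorded at the (k + 1)-th periodic check,
    and a check is made every `check_every` training iterations.
    """
    best_index = min(range(len(test_errors)), key=lambda k: test_errors[k])
    return (best_index + 1) * check_every

# Illustrative (invented) errors recorded every 100 iterations.
train_errors = [0.40, 0.25, 0.15, 0.10, 0.07, 0.05]   # keeps falling
test_errors = [0.42, 0.30, 0.22, 0.20, 0.24, 0.29]    # falls, then rises

print(best_iteration(test_errors, check_every=100))    # 400
```

In this example the training error keeps falling while the test error starts rising after 400 iterations, which is the signature of curve fitting; the network would be reset and retrained for 400 iterations.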
A Brief Description of the ANN Parameters
This section gives a brief introductory non-technical description of the ANN parameters. The mathematical descriptions of the parameters and learning process will be discussed in more detail in upcoming articles.
The learning rate determines the size of the correction term that is applied to adjust the neuron weights during training. Small values of the learning rate increase learning time but tend to decrease the chance of overshooting the optimal solution. At the same time, they increase the likelihood of becoming stuck at local minima. Large values of the learning rate may train the network faster, but may also result in no learning occurring at all. An adaptive learning rate varies according to the amount of error being generated: the larger the error, the smaller the learning rate, and vice versa. Therefore, if the ANN is heading towards the optimal solution it will accelerate. Correspondingly, it will decelerate when it is heading away from the optimal solution.
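The effect of the learning rate can be illustrated with plain gradient descent on a one-dimensional error curve. This is a deliberately simplified sketch: the error function E(w) = (w - 3)^2, its optimum at w = 3, and the three rates tried are all assumptions chosen for the illustration, and a curve this simple has no local minima, so only the speed and overshooting effects are visible.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize E(w) = (w - 3)^2 with a fixed learning rate."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)        # dE/dw
        w -= learning_rate * gradient     # correction scaled by the learning rate
    return w

for rate in (0.01, 0.4, 1.1):
    print(rate, gradient_descent(rate))
```

With the small rate the weight is still well short of 3 after 50 steps, with the moderate rate it settles on 3 almost immediately, and with the large rate every step overshoots by more than the last, so the weight moves further and further away and nothing is learned.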
The momentum value determines how much of the previous corrective term should be remembered and carried over into the current training step. The larger the momentum value, the more emphasis is placed on the previous correction terms relative to the current one. It serves as a smoothing process that ‘brakes’ the learning process from heading in an undesirable direction.
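A common textbook form of the momentum update adds a fraction of the previous weight correction to the current one. The sketch below assumes that form; the function names, the learning rate of 0.1 and the gradient sequences are hypothetical values chosen to show the smoothing effect, not the formulation of any specific package.

```python
def momentum_step(gradient, previous_step, learning_rate, momentum):
    """Current correction plus a remembered fraction of the previous one."""
    return -learning_rate * gradient + momentum * previous_step

def run(gradients, learning_rate=0.1, momentum=0.0):
    """Return the sequence of weight corrections for a sequence of gradients."""
    steps, previous = [], 0.0
    for g in gradients:
        previous = momentum_step(g, previous, learning_rate, momentum)
        steps.append(previous)
    return steps

consistent = [1.0, 1.0, 1.0, 1.0]      # error surface slopes the same way
oscillating = [1.0, -1.0, 1.0, -1.0]   # gradient keeps flipping sign

print(run(consistent, momentum=0.0))
print(run(consistent, momentum=0.5))
print(run(oscillating, momentum=0.5))
```

When the gradient keeps pointing the same way, the remembered corrections accumulate and the steps grow. When the gradient flips sign, the remembered correction partly cancels the new one, which damps the oscillation.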
Random noise is used to perturb the error surface of the neural net to jolt it out of local minima. It also helps the ANN to generalize and avoid curve fitting.
Training and Testing Tolerances
The training tolerance is the amount of accuracy that the network is required to achieve during its learning stage on the training data set. The testing tolerance is the accuracy that will determine the predictive result of the ANN on the test data set.
Determining the Evaluation Criteria
It is not always easy to determine proper evaluation criteria when designing an ANN model to solve a particular problem, so special attention needs to be paid to this step. The criteria can be determined by careful analysis of the problem at hand, the main objective of the whole process and the ANN's role in that process.
For example, in designing an ANN as part of a trading system for the foreign exchange market, there are many ways to evaluate the ANN model. The most obvious is to measure the forecast accuracy in terms of forecast error. However, in this particular problem, the accuracy of the forecast is not as important as the ability of the ANN model to generate profit in the trading system. Thus the evaluation criterion in this case is the profit made in trading over the out-of-sample data period.
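The difference between the two criteria can be made concrete with a small sketch. Everything here is an assumption made for illustration: the trading rule (go long one unit when the forecast change is positive, short one unit when it is negative), the absence of transaction costs, and the invented price changes and forecasts of two hypothetical models.

```python
def mean_absolute_error(forecasts, actuals):
    """Average size of the forecast error."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

def trading_profit(forecasts, actuals):
    """Profit from trading one unit in the direction of each forecast."""
    profit = 0.0
    for f, a in zip(forecasts, actuals):
        if f > 0:
            profit += a      # long position gains when the rate rises
        elif f < 0:
            profit -= a      # short position gains when the rate falls
    return profit

# Invented out-of-sample changes in an exchange rate.
actual = [0.5, -0.4, 0.3, -0.6]
model_a = [0.1, -0.1, 0.1, -0.1]   # right direction every time, wrong size
model_b = [0.5, -0.4, 0.3, 0.1]    # exact three times, wrong direction once

for name, model in (("A", model_a), ("B", model_b)):
    print(name, mean_absolute_error(model, actual), trading_profit(model, actual))
```

In this example model B has the smaller forecast error but model A makes the larger profit, because A always gets the direction right. Under a profit-based criterion, A would be the preferred model.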
In the task of designing an early warning predictor of credit unions in distress, which is covered in upcoming articles, the evaluation criterion is based on the number of Type I errors committed, i.e., the number of credit unions actually in distress that were predicted not to be in distress. The ANN forecast is a number between zero and one, with zero indicating no distress and one indicating distress. In developing the evaluation criterion, an acceptable cut-off value therefore has to be determined to differentiate distress from non-distress. The obvious choice is 0.5 but, on further analysis, a value of 0.1 was found to be better for this task.
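How the cut-off value changes the number of Type I errors can be sketched as follows. The forecasts and distress labels are invented for illustration and are not results from the credit union study; the function names are hypothetical, and a forecast at or above the cut-off is assumed to mean "predicted in distress".

```python
def type_i_errors(forecasts, in_distress, cutoff):
    """Count distressed cases that were predicted as not in distress."""
    return sum(1 for f, actual in zip(forecasts, in_distress)
               if actual and f < cutoff)

def false_alarms(forecasts, in_distress, cutoff):
    """Count healthy cases that were predicted as in distress."""
    return sum(1 for f, actual in zip(forecasts, in_distress)
               if not actual and f >= cutoff)

# Invented ANN outputs (0 = no distress, 1 = distress) and true outcomes.
forecasts = [0.12, 0.30, 0.80, 0.15, 0.02, 0.60]
in_distress = [False, True, True, True, False, False]

for cutoff in (0.5, 0.1):
    print(cutoff,
          type_i_errors(forecasts, in_distress, cutoff),
          false_alarms(forecasts, in_distress, cutoff))
```

With these numbers, lowering the cut-off from 0.5 to 0.1 removes both Type I errors at the cost of one extra false alarm. That trade is worthwhile when missing a distressed institution is considered far more costly than investigating a healthy one.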
We are now equipped with all the necessary background to start our journey of exploring Artificial Neural Networks in more detail. In upcoming articles I will try to cover each concept and its practical implementation in as much detail as possible.