Understanding Artificial Neural Network With Linear Regression

by globalresearchsyndicate
December 5, 2019
in Data Analysis


An Artificial Neural Network (ANN) is probably the first stop for anyone entering the field of deep learning. Inspired by the structure of the biological neural networks in our bodies, an ANN mimics a similar structure and learning mechanism.

An ANN is just an algorithm to build an efficient predictive model. It is named so because the algorithm, and hence its implementation, resembles a biological neural network. The functionality of an ANN can be explained in five simple steps:



  1. Read the input data
  2. Produce the predictive model (a mathematical function)
  3. Measure the error in the predictive model
  4. Feed the necessary corrections back into the model repeatedly until a model with the least error is found
  5. Use this model for predicting the unknown

For a beginner in data science who has gone through the concepts of regression, classification, feature engineering etc. and is entering the field of deep learning, it is very beneficial to relate the functionality of deep learning algorithms to those familiar concepts.

Before understanding ANNs, let us understand the perceptron, the basic building block of an ANN. 'Perceptron' is the name initially given to a binary classifier. However, we can view the perceptron as a function that takes certain inputs and produces a linear equation, which is nothing but a straight line. This line can be used to separate easily separable data, as shown in the figure. However, remember that in real-world scenarios, classes will not be so easily separable.




The structure of a perceptron can be visualised as below:

A typical neural network with multiple perceptrons in it looks like below: 

This means generating multiple linear equations at multiple points. These perceptrons can also be called neurons or nodes, which are the basic building blocks of the natural neural network in our bodies. In the above figure, the first vertical set of 3 neurons is the input layer. The next two vertical sets of neurons are part of the middle layers, usually referred to as hidden layers, and the last single neuron is the output layer. The neural network in the figure is a 3-layered network, because the input layer is generally not counted among the network layers. Each neuron in the input layer represents an attribute (column) in the input data (i.e., x1, x2, x3 etc.).

What happens in the above network is that the input data is fed to a set of neurons, each of which produces an output. Each of these outputs is fed to other neurons, which in turn produce outputs that are fed to the output layer. The error calculated at the output layer is sent back through the network to further refine the outputs of each neuron, which are again fed to the output layer to produce a more refined output than before. As explained in the 5-step process above, this is repeated until we get an output with minimal error.

The process of producing outputs, calculating errors, and feeding them back to produce a better output can be confusing for a beginner to visualise and understand. Hence, an effort is made here to explain this process with just one neuron and one layer. Once this basic concept is understood, expanding it to a larger neural network is not difficult.

Everyone agrees that simple linear regression is the simplest thing in machine learning, or at least the first thing anyone learns in machine learning. So, we will try to understand this concept of deep learning too through simple linear regression, by solving a regression problem with an ANN.

Implementing ANN for Linear Regression

We have understood from the above that each neuron in the ANN, except those in the input layer, produces an output. The output depends on the function we use, generally referred to as the 'activation function'. As ANNs are mainly used for classification, the sigmoid function or other similar classification functions are usually used as activation functions. But, as we are now trying to solve a linear regression problem, our activation function here is nothing but a simple linear equation of the form –

y = w0 + w1x1 + w2x2 + w3x3 + … + wnxn

where x1, x2, x3, … xn are the independent attributes in the input data,

w1, w2, … wn are the weights (coefficients) of the corresponding attributes, and

w0 is the bias

Because our output should just be a single linear line, we should configure our ANN with just 1 neuron. As the output of this 1 neuron is itself the linear line, this neuron will be placed in the output layer. Hidden layers are required only when we try to classify objects using multiple lines (or curves), so we don't need any hidden layers here either.

Hence the ANN to solve a linear regression problem consists of an input layer with all the input attributes and an output layer with just 1 neuron as shown below:

Now we have finalised the structure of our ANN. Our next task is to actually write the code to implement it. We will implement this simple ANN from scratch, as that helps in understanding a lot of the concepts underlying the readily available ANN libraries.

Recall the 5 steps mentioned at the beginning. As described there, the process involves feeding input to a neuron in the next layer to produce an output using an activation function. This process is called 'feed forward'. After producing the output, the error (or loss) is calculated and a correction is sent back through the network. This process is called 'back propagation'. We will also use some standard terminology for our ANN, such as 'network' and 'topology', which we will see in the code. With the terms we have learnt so far, let us implement the code –

1. Import the required libraries
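The original import listing has not survived in this copy of the article. For a from-scratch implementation like this one, the imports are likely minimal; a sketch, under that assumption:

```python
# Assumed imports: the original listing is not preserved here.
# A from-scratch implementation needs little beyond the standard library;
# a plotting library such as matplotlib would additionally be needed
# for the error plot shown at the end.
import random   # e.g. for shuffling rows between epochs in SGD
```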

2. Initialise the weights and other variables

In our approach, we will provide input to the code as a list such as [2,3,1]. Here, the total number of values in the list (the list size) indicates the number of layers we want to configure, and each number in the list indicates the number of neurons inside that layer. So, the list [2,3,1] indicates our network should consist of 3 layers, in which the first layer consists of 2 neurons, the second layer consists of 3 neurons and the output layer consists of 1 neuron. This structure can be called the 'network topology'. However, as we are solving a regression problem, we just need 1 neuron at the output layer, as discussed above. So, we just need to pass the input list as [1].

In our approach to building a linear regression neural network, we will use Stochastic Gradient Descent (SGD) as the algorithm, because this is the algorithm mostly used even for classification problems with a deep neural network (i.e., multiple layers and multiple neurons). I will assume the reader is already aware of this algorithm and proceed with its implementation.

We will initialise all the weights to zeros. Let us create a class called 'Network' and initialise all the required variables in the constructor as below –
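The constructor listing itself is not preserved in this copy. A minimal sketch of such a constructor, assuming the names the text describes ('Network', 'topology', 'self.output'), might look like:

```python
class Network:
    def __init__(self, topology):
        # topology, e.g. [1]: one entry per layer, giving its neuron count
        self.topology = topology
        # one output slot per neuron, sized from the topology
        self.output = [0.0] * sum(topology)
        # weights for the single output neuron: bias w0 and coefficient w1,
        # all initialised to zero as described in the text
        self.w0 = 0.0
        self.w1 = 0.0
```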

The ‘self.output’ variable in the above code holds the outputs of each neuron. It is initialised with a sufficiently sized list based on our input. The remaining variables are pretty self-explanatory.

3. Coding ‘fit’ function

We know that the gradient descent algorithm requires a learning rate (eta) and a number of iterations (epochs) as inputs. We will pass all these values to the program in a list, along with the training data. Let us build a ‘fit’ method to construct a predictive model from all the given inputs –
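The ‘fit’ listing is likewise missing from this copy. A standalone sketch of what it does, assuming the training data is a list of (x, y) rows and the parameter list carries [eta, epochs]:

```python
def fit(train, params):
    eta, epochs = params             # learning rate and number of iterations
    w0, w1 = 0.0, 0.0                # weights initialised to zero
    for _ in range(epochs):
        for x, y in train:           # SGD: one row at a time
            error = (w0 + w1 * x) - y     # feed forward, then measure error
            w0 -= eta * 2 * error * 1     # back propagate: update the bias
            w1 -= eta * 2 * error * x     # and the attribute weight
    return w0, w1
```

For example, fitting on rows drawn from y = 2x recovers a slope close to 2.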

4. Produce the Output and Correct the Error

I have mentioned above what ‘feed forward’ and ‘back propagation’ are. Let us implement those methods –
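The feed-forward listing is not preserved here; for this single neuron it amounts to nothing more than evaluating the linear equation. A sketch, with names assumed:

```python
def feed_forward(w0, w1, x):
    # the neuron's activation is just the linear equation itself
    return w0 + w1 * x
```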

The above function just forms a simple linear equation of the y = mx + c kind and nothing more.

In the SGD algorithm, we continuously update the initialised weights in the negative direction of the slope to reach the minimal point.

Error function E(w) = ∑[(w0 + w1x1 – y1)² + (w0 + w1x2 – y2)² + … + (w0 + w1xn – yn)²]

Here, I have not applied the ½ scaling factor to the equation; one may do so if desired. Also, in SGD only one row is passed to the above error function each time to calculate the error. Hence, if we differentiate the above equation w.r.t. each of the weights w0, w1, w2, etc., we get equations like –

∂E0 = 2*(w0 + w1x1 – y1)*1,

∂E1 = 2*(w0 + w1x1 – y1)*x1

After calculating the slope w.r.t. each of the weights, we update the weights with new values in the negative direction of the slope, as below –

wn = wn – η * ∂Ewn, where η is the learning rate.

Let us implement all this logic in the back propagate function as below:
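The back-propagation listing is missing from this copy. A standalone sketch for the single-neuron case, using the slopes derived above (the function and argument names are assumptions):

```python
def back_propagate(w0, w1, x, y, eta):
    error = (w0 + w1 * x) - y    # prediction error for this one row
    # move each weight in the negative direction of its slope
    w0 -= eta * 2 * error * 1    # dE/dw0 = 2*(w0 + w1*x - y)*1
    w1 -= eta * 2 * error * x    # dE/dw1 = 2*(w0 + w1*x - y)*x
    return w0, w1
```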

To visualise the error at each step, let us quickly write functions to calculate the Mean Squared Error (for the full dataset) and the Squared Error (for each row), which will be called at each step in an epoch.
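The original error-function listings are not shown in this copy; sketches of both, under the same (x, y)-row assumption:

```python
def squared_error(pred, y):
    # error contribution of a single row
    return (pred - y) ** 2

def mean_squared_error(data, w0, w1):
    # average error over the full dataset for the current weights
    errors = [squared_error(w0 + w1 * x, y) for x, y in data]
    return sum(errors) / len(errors)
```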

Having built the model in the above way, let us define a method which takes some input and predicts the output –
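The predict listing is not preserved either; for this network it is the same linear evaluation as the feed-forward step, applied to new input:

```python
def predict(w0, w1, x):
    # apply the learnt linear model to an unseen input
    return w0 + w1 * x
```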

That’s it. We have built a simple neural network which builds a model for linear regression and also predicts values for unknowns.

5. Executing the program

In order to pass inputs and test the results, we need to write a few lines of code, as below –
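The driver listing and the exact 10-row dataset are not preserved in this copy (the Colab link below has the original code). An end-to-end sketch with an assumed dataset drawn from y = 2x + 1:

```python
# assumed sample data: 10 rows drawn from y = 2x + 1
train = [(float(x), 2.0 * x + 1.0) for x in range(10)]

eta, epochs = 0.001, 5000      # learning rate and iterations (assumed values)
w0, w1 = 0.0, 0.0              # weights start at zero
for _ in range(epochs):
    for x, y in train:         # SGD over each row
        error = (w0 + w1 * x) - y
        w0 -= eta * 2 * error        # bias update
        w1 -= eta * 2 * error * x    # slope update

print(round(w0, 2), round(w1, 2))   # weights should approach 1 and 2
```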

In the above code, a sample dataset of 10 rows is passed as input. The full code can be accessed and executed on Google Colab:

https://colab.research.google.com/drive/1f84s4nlKSas5LGpR8zdRxWOsKL5HIoyy

Sample outputs for given inputs are as below:

The plot below shows how the error is getting reduced in each step as weights get continuously updated and again fed into the system.

So, we have seen how we can build a simple neural network in a few lines of code. The same code can be extended to handle multiple layers with various activation functions, so that it works like a full-fledged ANN. I will implement that in my next article.


Copyright © 2024 Globalresearchsyndicate.com

Welcome Back!

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In
This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Cookie settingsACCEPT
Privacy & Cookies Policy

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may have an effect on your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Non-necessary
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.
SAVE & ACCEPT
No Result
View All Result
  • Latest News
  • Consumer Research
  • Survey Research
  • Marketing Research
  • Industry Research
  • Data Collection
  • More
    • Data Analysis
    • Market Insights

Copyright © 2024 Globalresearchsyndicate.com