If we were to describe Neural Networks simply, we would say that data flows through layers of interconnected neurons which sort of work together to come up with a desirable outcome.

(Pic Credits: CS229, Stanford University)
The input is fed into the first layer of the network and traverses through all the layers to produce an output. That output is compared with the desired output, and if they differ drastically, we tweak the weights and biases to drive down, or simply put decrease, the loss as much as possible and get an output as similar as possible to the output in our training data.
Each layer of a Neural Network consists of weights plus a bias. Tweaking these weights with the help of our loss is famously done through backpropagation, which I'll try to delve into in later blogs.
As I said, these layers have weights and biases, so their “functioning” can be described with a very basic equation:
$$ w^Tx+b $$
This is the equation my blog revolves around: the term “w” stands for weights, x for inputs, and b is the bias. I was pretty curious about why there is a need for a bias term when we can simply update the weights and get our answer. (I admit I was wrong coz I didn’t think about this intuitively enough 🥲)
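To make the equation concrete, here's a tiny sketch (my own illustrative example, with made-up numbers) of what a single neuron computes:

```python
import numpy as np

# one neuron with two inputs: z = w^T x + b
w = np.array([0.5, -0.3])   # weights (arbitrary example values)
x = np.array([1.0, 0.0])    # inputs
b = 0.1                     # bias

z = np.dot(w, x) + b        # w^T x + b
print(z)                    # 0.6
```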
After learning a bit from Andrej Karpathy’s NN: Zero to Hero, I built a single-layer, two-neuron perceptron (perceptron is a really fancy name for the earliest form of neural network; to describe it simply, refer to the above diagram, but with just two input neurons instead of three, connected directly to the output neuron, skipping the middle layer also called the hidden layer). This basic perceptron model takes two inputs X and Y and gives me an output Z, and I wanted it to mimic the AND Gate and the OR Gate.
So this is how the model works: I gave the model two neurons in its input layer, set the weights and bias to random values instead of just zero, fed it inputs in pairs (0,0), (0,1), (1,0), (1,1), and expected the output to be (0,0,0,1) for an AND Gate. I am basically asking my program to tweak the weights and bias until they arrive at a configuration where the output is that of an AND Gate.
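Here's a rough sketch of how such a model can be put together. This is my own minimal reconstruction, not the exact code I ran; the function name `train`, the learning rate, and the squared-error loss are all just illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(targets, use_bias=True, epochs=1000, lr=0.5, seed=0):
    """Train a single neuron (2 inputs -> 1 output) with plain gradient descent."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array(targets, dtype=float)

    w = rng.normal(size=2)              # random weights, not zeros
    b = rng.normal() if use_bias else 0.0

    losses = []
    for epoch in range(epochs):
        z = X @ w + b                   # w^T x (+ b) for all four input pairs
        out = sigmoid(z)
        loss = np.sum((out - y) ** 2)   # squared-error loss
        losses.append(loss)

        # gradient of the loss w.r.t. z, then w and b
        grad_z = 2 * (out - y) * out * (1 - out)
        w -= lr * (X.T @ grad_z)
        if use_bias:
            b -= lr * np.sum(grad_z)

    return w, b, losses

# AND gate: output should be 1 only for the input (1, 1)
w, b, losses = train(targets=[0, 0, 0, 1])
print(w, b, losses[-1])
```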
I’d like to show what happens when the bias is set to zero, i.e. there is no bias at all:
I trained the model for around 1000 epochs (turns) and it still wouldn’t drive the loss to zero (it needs to reach zero, and only then will we get the exact output).
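With the sketch above, the bias-free run is just the same training call with the bias pinned to zero (again, this mirrors my experiment only in spirit, not line for line):

```python
# same AND-gate setup, but the bias stays fixed at zero
w_nb, b_nb, losses_nb = train(targets=[0, 0, 0, 1], use_bias=False, epochs=1000)
print(w_nb, b_nb, losses_nb[-1])   # b_nb stays 0.0 and the loss never reaches zero
```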

At epoch 999 we get some values for the weights and an incorrect answer. The graph of its loss vs. epoch can be plotted using matplotlib:
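The plot itself is only a few lines of matplotlib (a sketch, using the `losses_nb` list returned by the training function above):

```python
import matplotlib.pyplot as plt

plt.plot(losses_nb)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.title("Loss vs epoch (no bias)")
plt.show()
```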

So we can basically observe that the loss does go down but it flatlines at 1 and does not go to zero.