Published 2022
This is just a document of notes based on brain research and potential inefficiencies in the direction the field is heading. Deep learning is inspired by, and in many ways similar to, the way the brain works, but I can’t help but wonder if it’s a local maximum. What if the path the field took had been more neurologically inspired? What if there are more crucial insights we can gain from the brain?
One note: based on current progress, it is clear that models scale with more computing power. And while they are also becoming more efficient, squeezing more capability out of fewer parameters, there are issues that one would be naïve to ignore:
- The brain can recognize something in under half a second, i.e. along a path of only about 100 neurons firing in sequence. Processors run billions of operations per second and still reach suboptimal results on image-recognition tasks that are easy for humans (you could argue the medium of ‘computation’ itself is not optimized for intelligence - biological vs artificial)
- Moore’s law doesn’t seem to be holding up in the 21st century. There are alternatives such as neuromorphic or quantum computing, but they are quite early in their development. Relevant blog: https://openai.com/blog/ai-and-compute/
- The deeper the network, the more compute it needs, and also the more data. Solutions: data augmentation, transfer learning, unsupervised pre-training (a minimal augmentation sketch follows below). Our brains can learn patterns much quicker (great example: https://julien-vitay.net/lecturenotes-neurocomputing/4-neurocomputing/1-Limits.html#lack-of-abstraction). Data isn’t infinite.
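As a concrete illustration of the data-hunger workarounds above, here is a minimal data-augmentation sketch. It assumes PyTorch and torchvision are installed; the dataset and the specific transforms are just illustrative choices, not a recommendation.

```python
# Minimal data-augmentation sketch (assumes torch/torchvision are installed).
# Random flips/crops/jitter turn one labeled image into many slightly
# different training examples, partially compensating for limited data.
import torchvision.transforms as T
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader

train_transform = T.Compose([
    T.RandomHorizontalFlip(),          # mirror the image half the time
    T.RandomCrop(32, padding=4),       # shift the content by a few pixels
    T.ColorJitter(brightness=0.2),     # vary the lighting slightly
    T.ToTensor(),
])

train_set = CIFAR10(root="./data", train=True, download=True,
                    transform=train_transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
```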
Jeff Hawkins’s A Thousand Brains (the Thousand Brains Theory) and Ray Kurzweil’s How to Create a Mind are good brain-AI books out there.
The backprop problem
- How does our brain do something like gradient updates so efficiently? Backprop doesn’t exist physically in the brain (Geoffrey Hinton); only forward propagation physically exists in the brain.
- Sleep is when long-term memory consolidation is performed, but it is not literally backprop.
- Sleep acts like a kind of backpropagation that converts short-term memory into long-term memory. REM sleep is when this happens: instincts are activated and the prefrontal cortex is dormant. The hippocampi drive this long-term consolidation (the H.M. case, discussed in The Brain That Changes Itself and The Man Who Mistook His Wife for a Hat).
- Sleep also creates invariance in our data so we don’t overfit (dreaming).
- In short-term learning we instantly adjust our “gradients” given new input, so sleep can’t be the whole story.
- Speech is a double forward pass; the communication process prunes away old links and creates new ones.
[R] The Forward-Forward Algorithm: Some Preliminary Investigations [Geoffrey Hinton]
Does the brain do backpropagation? CAN Public Lecture - Geoffrey Hinton - May 21, 2019
- The credit assignment problem is figuring out which neurons and connections are responsible for a given output, and therefore for the error. By computing the gradient of the error with respect to the weights at each layer, backpropagation identifies which parts of the network contribute to the error, so the weights in those parts can be adjusted to improve the overall performance of the model (a toy sketch contrasting this with feedback alignment follows the alternatives list below).
- However, this process is not biologically plausible because it relies on transmitting error gradients backwards through the weights of the connections, which would not be possible in the brain. In the brain, information flows in one direction only, from the presynaptic neuron to the postsynaptic neuron; a synapse does not know the weights of other synapses and therefore cannot transmit anything backwards.
- Alternatives
- Feedback Alignment: instead of transmitting error gradients backwards through the transposed forward weights, feedback alignment projects the output error back through a fixed random matrix (sketched below). This sidesteps the weight-transport problem, and the forward weights learn to align with the random feedback well enough for the network to learn. https://julien-vitay.net/lecturenotes-neurocomputing/4-neurocomputing/1-Limits.html#feedback-alignment
- Unsupervised Learning: methods such as autoencoders learn useful representations of the data without needing labels or an external teaching signal (though standard autoencoders are still trained with backprop on a reconstruction loss).
- Evolutionary Algorithms: Evolutionary algorithms, such as genetic algorithms, can be used to optimize the weights of a neural network. The algorithm starts with a population of randomly generated networks and iteratively evolves them through selection and mutation.
- Plasticity-based Learning Rules: Some researchers are exploring biologically plausible learning rules based on neural plasticity. These rules are inspired by the way neurons change their connections in response to activity.
- Direct Feedback Alignment (DFA): the output error is fed directly to each hidden layer through its own fixed random feedback matrix, instead of being propagated backwards layer by layer.
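To make the contrast concrete, here is a toy NumPy sketch of a two-layer network where the hidden layer’s error signal comes either from true backprop (reusing the transposed output weights) or from feedback alignment (a fixed random matrix). All names, sizes, and the single-sample “task” are my own illustrative assumptions, not from any paper.

```python
# Toy contrast between backprop and feedback alignment on a 2-layer net.
# Everything here (sizes, learning rate, the single random "task") is an
# illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 8, 16, 4, 0.1

W1 = rng.normal(0.0, 0.5, (n_hid, n_in))   # input -> hidden weights
W2 = rng.normal(0.0, 0.5, (n_out, n_hid))  # hidden -> output weights
B = rng.normal(0.0, 0.5, (n_hid, n_out))   # fixed random feedback (FA only)

x = rng.normal(size=n_in)                  # one toy input
y_target = rng.normal(size=n_out)          # and its toy target

use_feedback_alignment = True              # set to False for true backprop

for step in range(200):
    # Forward pass (the only thing the brain clearly does physically)
    h = np.tanh(W1 @ x)                    # hidden activity
    y = W2 @ h                             # linear readout
    e = y - y_target                       # output error

    # Credit assignment for the hidden layer
    if use_feedback_alignment:
        delta_h = (B @ e) * (1.0 - h**2)     # fixed random feedback, no weight transport
    else:
        delta_h = (W2.T @ e) * (1.0 - h**2)  # backprop reuses W2 transposed

    # Gradient-style weight updates
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(delta_h, x)

print("final squared error:", float(e @ e))
```

The only difference between the two settings is one line: whether `W2.T` or the fixed random `B` carries the error back to the hidden layer, which is exactly the weight-transport issue described above.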
Remembering and memory
- The brain also has near-perfect memory but forgets on purpose for certain things? (epilepsy experiments: people remembering songs perfectly during temporal-lobe seizures, The Man Who Mistook His Wife For A Hat)
- Memory is stored in patterns of neuron firings, and current DL models work like this too. But the actual mechanism for learning is wrong: it takes many more units in a DL model than the few hundred neurons the brain needs to recognize the same thing (On Intelligence). Related paper: Massive Language Models Can Be Accurately Pruned in One-Shot (see the pruning sketch after this list).
- Invariance in the brain is significantly better and quicker (On Intelligence). Why is this the case?
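Related to the redundancy point above: a large fraction of weights in a trained network can often be zeroed with little loss in accuracy. Here is a minimal magnitude-pruning sketch in NumPy; note this is a much simpler heuristic than the one-shot method in the paper cited above, and the layer shape and 90% sparsity level are arbitrary illustrative choices.

```python
# Plain magnitude pruning: zero out the smallest-magnitude weights of a layer.
# This is NOT the one-shot method from the paper above; it is the simplest
# possible baseline, and the layer shape / sparsity level are arbitrary.
import numpy as np

def prune_by_magnitude(weights, sparsity=0.9):
    """Return a copy of `weights` with the smallest `sparsity` fraction zeroed."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))            # stand-in for one trained weight matrix
W_sparse = prune_by_magnitude(W, sparsity=0.9)
print("nonzero fraction:", np.count_nonzero(W_sparse) / W_sparse.size)
```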