The Primacy Bias in Deep Reinforcement Learning
Cognitive science has shown that what humans learn first can bias all of their later learning. In one study, the outcome of participants' first experience had a substantial and lasting effect on their subsequent behaviour, an effect the authors term "outcome primacy".
This cognitive bias can lead to suboptimal behaviors that are difficult to unlearn or improve upon when new information arrives. Deep reinforcement learning (RL) agents are susceptible to a similar issue during training: a tendency to overfit early experiences that damages the rest of the learning process.
As a remedy for the primacy bias, we propose a simple resetting mechanism that lets the agent forget part of its knowledge: periodically re-initialize the last layers of the agent's neural networks while preserving the experience stored in the replay buffer.
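A minimal sketch of this idea in plain Python (the layer sizes, initialization scheme, and helper names are illustrative assumptions, not the paper's actual implementation):

```python
import random

def init_layer(n_in, n_out, rng):
    """Randomly initialize a dense layer's weight matrix (simple uniform init as a stand-in)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def reset_last_layers(network, n_reset, rng):
    """Re-initialize the last `n_reset` layers in place.

    Earlier layers keep their learned weights, and the replay buffer
    (held elsewhere) is untouched, so the agent relearns from old data
    without the parameters shaped by its earliest experience.
    """
    for i in range(len(network) - n_reset, len(network)):
        n_out = len(network[i])
        n_in = len(network[i][0])
        network[i] = init_layer(n_in, n_out, rng)

rng = random.Random(0)
# Hypothetical 3-layer network: 4 -> 8 -> 8 -> 2
net = [init_layer(4, 8, rng), init_layer(8, 8, rng), init_layer(8, 2, rng)]
replay_buffer = [("obs", "action", "reward", "next_obs")]  # experience is kept across resets

first_layer_before = [row[:] for row in net[0]]
reset_last_layers(net, n_reset=2, rng=rng)

assert net[0] == first_layer_before   # early layers preserved
assert len(replay_buffer) == 1        # buffer untouched
```

In practice this reset would be scheduled periodically during training (e.g. every fixed number of environment steps), after which the agent continues learning from the preserved buffer.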
Full paper: https://arxiv.org/pdf/2205.07802.pdf