
Commit f32c6eb
Update 2024-05-07-primacy-bias-and-why-it-helps-to-forget.md
mkielo3 committed Dec 19, 2023
1 parent 1b8447c commit f32c6eb
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions _posts/2024-05-07-primacy-bias-and-why-it-helps-to-forget.md
@@ -31,7 +31,7 @@ toc:
  - name: Introduction to Primacy Bias
  - name: Off Policy Deep Reinforcement Learning
    subsections:
-      - name: Why overcomplicate things?
+      - name: Are we Overcomplicating?
  - name: Selecting a Replay Ratio
    subsections:
      - name: Heavy Priming
@@ -85,7 +85,7 @@ Nikishin et al. discuss a specific type of model that is particularly sensitive

In human terms, step 1 is the algorithm living its day-to-day life. At the end of the day, it goes to sleep, and overnight the algorithm's lifetime of experiences is replayed to update its beliefs (step 2).
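In code, that loop is short. Below is a minimal, self-contained sketch of the two-step off-policy pattern, with a toy corridor environment and a tabular Q-learning update standing in for a real task and network (`ToyEnv`, `act`, and every constant here are illustrative choices, not anything from Nikishin et al.):

```python
import random
from collections import deque

class ToyEnv:
    """A 5-state corridor; repeatedly moving right reaches the goal."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = left, 1 = right
        self.state = max(0, min(4, self.state + (1 if action == 1 else -1)))
        done = self.state == 4
        return self.state, (1.0 if done else 0.0), done

env = ToyEnv()
q = [[0.0, 0.0] for _ in range(5)]    # tabular stand-in for a value network
buffer = deque(maxlen=10_000)         # replay buffer: the "lifetime of experiences"
eps, lr, gamma, replay_ratio, batch_size = 0.2, 0.1, 0.99, 4, 8

def act(s):
    """Epsilon-greedy behavior policy with random tie-breaking."""
    if random.random() < eps:
        return random.randrange(2)
    best = max(q[s])
    return random.choice([a for a in (0, 1) if q[s][a] == best])

for episode in range(200):
    s, done = env.reset(), False
    while not done:                   # step 1: live the day, storing experience
        a = act(s)
        s2, r, done = env.step(a)
        buffer.append((s, a, r, s2, done))
        s = s2
    for _ in range(replay_ratio):     # step 2: "sleep", replaying old memories
        for s, a, r, s2, d in random.sample(buffer, min(batch_size, len(buffer))):
            target = r if d else r + gamma * max(q[s2])
            q[s][a] += lr * (target - q[s][a])  # Q-learning step on a replayed transition
```

The `replay_ratio` constant, how many update passes happen per day of collected experience, is the knob the post's later section on selecting a replay ratio is concerned with.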

-### Why overcomplicate things?
+### Are we Overcomplicating?
For those without a reinforcement learning background, this might seem needlessly complicated. Why can’t we simply explore with a random policy and then fit a model all at once?

Although this is sometimes done [5], the quality of the memories in the replay buffer is proportional to the quality of the policy that gathered the experience. Consider an agent learning to play chess. A random policy might have enough data to learn how to play the start of the game effectively, but it will never learn how to chase an opponent’s king around an empty board. If a policy isn’t smart enough to get the agent out of the ‘early’ game, it will never collect experiences to learn the ‘mid’ or ‘late’ games.
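The claim is easy to check numerically. A quick illustrative experiment (mine, not the post's): let a purely random policy wander a longer corridor and count which states ever land in the buffer. Almost all of the experience piles up near the start, and the far end of the corridor, the analogue of the late game, barely appears at all:

```python
import random
from collections import Counter

N, visits = 30, Counter()        # a longer 30-state corridor
for episode in range(1_000):
    s = 0
    for _ in range(100):         # fixed episode budget, purely random actions
        s = max(0, min(N - 1, s + random.choice((-1, 1))))
        visits[s] += 1

print([visits[s] for s in range(N)])  # counts fall off sharply with s:
                                      # "late game" states barely appear
```

No amount of fitting on a buffer like this can teach the agent anything about positions it never reached, which is why the policy that collects the data has to improve alongside the model that learns from it.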