base epsilon-greedy:
- with probability \(\epsilon\), choose an action uniformly at random
- otherwise, choose the greedy action \(\arg\max_{a} Q(s,a)\), i.e. the action with the highest estimated value
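
A minimal sketch of the rule above, assuming the action values for the current state are held in a plain list `q_values` (index = action). The function name `epsilon_greedy` and the `rng` parameter are illustrative choices, not part of the notes.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index given Q(s, a) estimates for the current state.

    q_values: list of floats, one estimate per action (assumed layout).
    epsilon:  exploration probability in [0, 1].
    """
    # Explore: with probability epsilon, pick any action uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Exploit: otherwise pick the action with the highest estimate
    # (ties go to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 0` this is purely greedy, and with `epsilon = 1` it is purely random; note that the random branch can also land on the greedy action.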
epsilon-greedy exploration with decay
Some approaches decay \(\epsilon\) over time, so that at each timestep:
\begin{equation} \epsilon \leftarrow \alpha \epsilon \end{equation}
where \(\alpha \in (0,1)\) is called the “decay factor.”
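
Applying the update repeatedly gives a geometric schedule: after \(t\) updates, \(\epsilon_t = \alpha^{t} \epsilon_0\). A small sketch of that schedule follows; the names `decay_schedule`, `decayed_epsilon`, and `epsilon0` are assumptions made for illustration.

```python
def decay_schedule(epsilon0, alpha, steps):
    """Return the epsilon used at each of the first `steps` timesteps."""
    epsilon = epsilon0
    schedule = []
    for _ in range(steps):
        schedule.append(epsilon)
        # The decay update: epsilon <- alpha * epsilon
        epsilon *= alpha
    return schedule


def decayed_epsilon(epsilon0, alpha, t):
    """Closed form for epsilon after t decay updates."""
    return epsilon0 * alpha ** t
```

For instance, `decay_schedule(1.0, 0.5, 4)` walks through 1.0, 0.5, 0.25, 0.125, so exploration shrinks quickly when \(\alpha\) is small and slowly when \(\alpha\) is close to 1.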
Explore-then-commit
Select actions uniformly at random for the first \(k\) steps; then switch to the greedy action and stay greedy from then on.
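
A minimal sketch of explore-then-commit, assuming a 0-indexed step counter `t` and the same list-of-estimates layout for `q_values`; the function name and parameters are hypothetical. How the estimates get updated during the exploration phase is left out here.

```python
import random


def explore_then_commit(q_values, t, k, rng=random):
    """Pick an action index at step t (0-indexed) under explore-then-commit.

    q_values: list of floats, one estimate per action (assumed layout).
    k:        number of initial purely random steps.
    """
    # Exploration phase: the first k steps are uniformly random.
    if t < k:
        return rng.randrange(len(q_values))
    # Commit phase: act greedily for every step after that.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Unlike epsilon-greedy, there is no randomness at all once `t >= k`, so the quality of the committed action depends entirely on what was learned in the first `k` steps.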