Forward Search

Ingredients:

\(\mathcal{P}\) problem (states, transitions, etc.)
\(d\) depth (how many next states to look into)—more is more accurate but slower
\(U\) value function estimate at depth \(d\)

We essentially roll forward into all possible next states up to depth \(d\), and tabulate our value function.

Define subroutine forward_search(depth_remaining, value_function_estimate_at_d, state).

if depth_remaining=0; return (action=None, utility=value_function_estimate_at_d(state))
otherwise,
1. let best = (action = None, utility = -infinity)
2. for each possible action at our state
  1. get an action-value for our current state where the utility of each next state is the utility given by forward_search(depth_remaining-1, value_function_estimate_at_d, next_state)
  2. if the action-value is higher than what we have, then we set best=(a, action-value)
3. return best

What this essentially does is to Dijkstra an optimal path towards the highest final utility \(U(s)\) om your current state, by trying all states.