Ingredients:
- \(\mathcal{P}\) problem (states, transitions, etc.)
- \(d\) depth (how many next states to look into)—more is more accurate but slower
- \(U\) value function estimate at depth \(d\)
We essentially roll forward into all possible next states up to depth \(d\), and tabulate our value function.
Define subroutine forward_search(depth_remaining, value_function_estimate_at_d, state)
.
- if
depth_remaining=0
; return(action=None, utility=value_function_estimate_at_d(state))
- otherwise,
- let
best = (action = None, utility = -infinity)
- for each possible action at our state
- get an action-value for our current state where the utility of each next state is the utility given by
forward_search(depth_remaining-1, value_function_estimate_at_d, next_state)
- if the action-value is higher than what we have, then we set
best=(a, action-value)
- get an action-value for our current state where the utility of each next state is the utility given by
- return
best
- let
What this essentially does is to Dijkstra an optimal path towards the highest final utility \(U(s)\) om your current state, by trying all states.