Policy optimization covers algorithms that optimize the parameters \(\theta\) of a policy \(\pi_{\theta}\) directly. This contrasts with value iteration, policy iteration, and online planning, which derive a policy from a surrogate quantity such as a value function or an estimate of future discounted reward.
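To make the idea of optimizing \(\theta\) directly concrete, here is a minimal sketch under simplifying assumptions that are not in the text above: a single-state problem (a three-armed bandit) with known, deterministic rewards, a softmax policy \(\pi_{\theta}\), and plain gradient ascent on the expected reward \(J(\theta) = \sum_a \pi_{\theta}(a)\, r(a)\). The names `rewards`, `softmax`, `expected_reward`, and `policy_gradient` are hypothetical and chosen only for illustration. No value function is learned; the update acts on \(\theta\) alone.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax: pi_theta(a) is proportional to exp(theta_a).
    z = np.exp(theta - theta.max())
    return z / z.sum()

def expected_reward(theta, rewards):
    # J(theta) = sum_a pi_theta(a) * r(a)
    return float(softmax(theta) @ rewards)

def policy_gradient(theta, rewards):
    # Exact gradient of J for a softmax policy:
    # dJ/dtheta_k = pi_k * (r_k - J)
    pi = softmax(theta)
    return pi * (rewards - pi @ rewards)

# Assumed toy problem: three actions with fixed rewards.
rewards = np.array([0.2, 1.0, 0.5])
theta = np.zeros(3)              # start from the uniform policy
initial_J = expected_reward(theta, rewards)

for _ in range(500):
    theta += 1.0 * policy_gradient(theta, rewards)  # gradient ascent, step size 1.0

pi = softmax(theta)
final_J = expected_reward(theta, rewards)
print("policy:", pi)
print("expected reward:", initial_J, "->", final_J)
```

In realistic settings the rewards and dynamics are unknown, so the gradient is typically estimated from sampled trajectories rather than computed exactly as it is here.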