Key Sequence
Notation
New Concepts
Restricted Gradient and Actor-Critic
Exploration and Exploitation with Binary Bandit
Undirected Exploration (Explore-then-commit)
Directed Exploration (Softmax Method, Quantile Exploration, UCB 1, Posterior Sampling)
Important Results / Claims
Bayesian Model Estimation and greedy action
epsilon-greedy exploration with decay
Questions
Interesting Factoids