Past Work
- self-play: this is a \(\text{coNP}\) vs \(\text{NP}\) problem: competitive self-play must defend against all strategies (\(\text{coNP}\)-like), whereas collaborative self-play only needs to find one useful strategy (\(\text{NP}\)-like); this doesn’t generalize well because a human partner does not play the strategy the agent converged on with itself
- behavior cloning:
- Population Based Training: computationally super expensive
Novelty
- instead, learn a generative model from simulated agents, human data, or both
- then, sample from this generative model
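
A toy sketch of the two steps above (fit a generative model, then sample from it). Everything here is an assumption for illustration, not the method from the notes: partners are represented as hypothetical 2-D "style" vectors, the generative model is a stand-in independent Gaussian per dimension, and the names `fit_diag_gaussian` and `sample_partners` are made up.

```python
import random
import statistics

def fit_diag_gaussian(vectors):
    """Fit an independent Gaussian per dimension (stand-in generative model)."""
    dims = list(zip(*vectors))  # transpose: one tuple of values per dimension
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return means, stds

def sample_partners(model, n, rng):
    """Draw n new partner style vectors from the fitted model."""
    means, stds = model
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]

# Hypothetical partner "style" vectors (made-up numbers).
simulated = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
human = [[0.6, 0.4], [0.7, 0.5]]

# Step 1: learn the generative model from the pooled data.
model = fit_diag_gaussian(simulated + human)

# Step 2: sample new partners from it.
rng = random.Random(0)
partners = sample_partners(model, 4, rng)
```

In a real setting the sampled vectors would presumably parameterize partner policies to train against; that part is left out here.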