Batch Reinforcement Learning from Crowds

This study investigates how to learn a reward function from possibly unreliable trajectory preferences collected from crowdworkers.

Overview

A three-minute overview of this study is provided below. You are also welcome to read the poster and the paper on arXiv.