Interpretable Preference-based Reinforcement Learning via Temporal Lasso

This study investigates how to identify the states that are critical in determining preferences between trajectories in preference-based reinforcement learning.