NeurIPS

Multi-Agent Reinforcement Learning is A Sequence Modeling Problem

Abstract: Large sequence models (SMs) such as the GPT series and BERT have displayed outstanding performance and generalization capabilities on vision, language, and, recently, reinforcement learning tasks. A natural follow-up question is how to abstract multi-agent decision making into an SM problem and benefit from the prosperous development of SMs. In this paper, we introduce …

Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning 

Abstract: Setting up a well-designed reward function has been challenging for many reinforcement learning applications. Preference-based reinforcement learning (PbRL) provides a new framework that avoids reward engineering by leveraging human preferences (e.g., preferring apples over oranges) as the reward signal. Therefore, improving the efficacy of data usage for preference data becomes critical. In this …

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning 

Abstract: Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures wherein the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptations. In this paper, we develop a unified framework that describes variations of GMRL algorithms and points out that existing stochastic meta-gradient estimators adopted by GMRL are actually biased. …

A Unified Diversity Measure for Multiagent Reinforcement Learning

Abstract: Promoting behavioural diversity is of critical importance in multi-agent reinforcement learning, since it helps the agent population maintain robust performance when encountering unfamiliar opponents at test time, or when the game is highly non-transitive in the strategy space (e.g., Rock-Paper-Scissors). While a myriad of diversity metrics have been proposed, there are no widely …

Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation

Abstract: Learning new task-specific skills from a few trials is a fundamental challenge for artificial intelligence. Meta reinforcement learning (meta-RL) tackles this problem by learning transferable policies that support few-shot adaptation to unseen tasks. Despite recent advances in meta-RL, most existing methods require access to the environmental reward function of new tasks to …

Constrained Update Projection Approach to Safe Policy Optimization

Abstract: Safe reinforcement learning (RL) studies problems where an intelligent agent must not only maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a novel policy optimization method based on the Constrained Update Projection framework that enjoys a rigorous safety guarantee. Central to our CUP development is the newly proposed …
