LaMo 2023

Can we unleash the power of pre-trained LMs to solve sequential decision-making problems?

Overview

Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection can be costly and risky, so offline RL becomes particularly challenging when in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning ability, this paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers that effectively uses pre-trained Language Models (LMs) for offline RL. Our framework has four crucial components: (1) initializing Decision Transformers with sequentially pre-trained LMs; (2) fine-tuning with LoRA instead of full-weight fine-tuning, to effectively combine the pre-trained knowledge from LMs with in-domain knowledge; (3) generating embeddings with non-linear MLP transformations instead of linear projections; and (4) adding an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original language abilities. Empirical results indicate that LaMo achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and Decision Transformers in dense-reward tasks. In particular, our method demonstrates superior performance in scenarios with limited data. Below we show the average performance over different data ratios (Medium for MuJoCo and Atari; Complete and Partial for Kitchen).
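Component (4) can be made concrete with a toy objective. The sketch below is a minimal illustration, not LaMo's implementation: it assumes the fine-tuning loss is a Decision Transformer action loss (mean squared error, as for continuous MuJoCo actions; discrete Atari actions would instead use a cross-entropy) plus a weighted next-token cross-entropy on text. The function names and the weight `lam` (default 0.1) are hypothetical placeholders, not values taken from the paper.

```python
import math
from typing import Sequence


def action_mse(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Mean squared error over all action dimensions (continuous-control decision loss)."""
    sq_errors = [(p - t) ** 2 for p_vec, t_vec in zip(pred, target) for p, t in zip(p_vec, t_vec)]
    return sum(sq_errors) / len(sq_errors)


def token_cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Average next-token cross-entropy using a numerically stable log-sum-exp."""
    total = 0.0
    for row, tgt in zip(logits, targets):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[tgt]  # equals -log(softmax(row)[tgt])
    return total / len(targets)


def combined_loss(pred_actions, true_actions, lm_logits, lm_targets, lam: float = 0.1) -> float:
    """Hypothetical fine-tuning objective: decision loss + lam * auxiliary language loss."""
    return action_mse(pred_actions, true_actions) + lam * token_cross_entropy(lm_logits, lm_targets)


if __name__ == "__main__":
    loss = combined_loss([[0.0, 1.0]], [[0.0, 0.0]], [[0.0, 0.0]], [0], lam=0.1)
    print(round(loss, 4))
```

For example, a predicted action [0.0, 1.0] against a target [0.0, 0.0] gives an action loss of 0.5, and uniform logits over two tokens give a language loss of ln 2 ≈ 0.693, so the combined loss with lam=0.1 is about 0.569. The intent of keeping the language term in the objective is that the shared transformer weights stay useful for text while they adapt to control.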

Highlights

We propose LaMo, an offline RL framework that leverages pre-trained Language Models (LMs) for low-level Motion control. On sparse-reward tasks, LaMo achieves strong results and surpasses the recent strong algorithms CQL, IQL, TD3+BC, and DT; on dense-reward tasks, LaMo significantly improves over the Decision Transformer and closes the gap between value-based methods and DT-based methods. Notably, in low-data scenarios, our method demonstrates strong few-shot learning ability, which we attribute to the inductive bias of pre-trained LMs.

We examine how the performance of various algorithms depends on the scale of data. As depicted in the figure, LaMo achieves excellent performance even with relatively small datasets. For example, in Hopper, LaMo surpasses CQL and DT when the data sample ratio is 0.5% and maintains this advantage consistently as the sample ratio increases.

Below, we visualize 8 tasks across 3 domains that we consider.

Hopper

Walker2d

Halfcheetah

Reacher

Breakout

Qbert

Pong

Kitchen

Method

LaMo encompasses several crucial designs:

1. We adopt a pre-trained LM (i.e., GPT-2) as the initialization of a Decision Transformer (DT);
2. We replace the linear embedding projections with MLPs to strengthen representation learning on complicated tasks;
3. When training the offline RL agent, we freeze the pre-trained parts and use the parameter-efficient fine-tuning technique LoRA, so that the trainable parameters account for only 0.7% of the entire model;
4. We add language prediction as an auxiliary objective during fine-tuning to stabilize performance and maintain the model's language ability.
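The LoRA step above can be illustrated with a dependency-free toy layer. This is a minimal sketch of the standard LoRA formulation, y = Wx + (alpha/r)·B(Ax) with W frozen and only A and B trainable, not the LaMo code; the class name `LoRALinear`, the helper `lora_param_count`, the initialization scale, and the 768×768 example are illustrative assumptions. Because B starts at zero, the adapted layer initially reproduces the pre-trained layer exactly.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Frozen pre-trained weight W plus a trainable low-rank update B @ A (toy, no autograd)."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        self.weight = weight  # pre-trained W (d_out x d_in); never modified
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A gets a small random init and B starts at zero, so B @ A = 0 at the start
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)               # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self) -> int:
        return len(self.weight) * len(self.weight[0])


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
    print(layer.forward([1.0, 1.0]))  # B is zero, so this equals W x
    full = 768 * 768
    lora = lora_param_count(768, 768, rank=8)
    print(lora, full, f"{lora / full:.2%}")
```

For a single 768×768 projection with rank 8, LoRA adds 8 × (768 + 768) = 12,288 trainable parameters against 589,824 frozen ones (about 2.1%). The whole-model fraction quoted in item 3 depends on which matrices are adapted, the chosen rank, and which other modules are trainable, none of which this sketch attempts to reproduce.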

Citation

@inproceedings{shi2024LaMo,
  title={Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning},
  author={Ruizhe Shi and Yuyao Liu and Yanjie Ze and Simon Shaolei Du and Huazhe Xu},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=AY6aM13gGF}
}

Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

Ruizhe Shi¹* Yuyao Liu¹* Yanjie Ze² Simon S. Du³ Huazhe Xu¹²⁴
¹Tsinghua University, IIIS ²Shanghai Qi Zhi Institute ³University of Washington ⁴Shanghai AI Lab
*Equal contribution. Order is decided by coin flip.

Accepted by ICLR 2024.
