Group-Relative RL Fine-Tuning for Realistic and Controllable Traffic Simulation

Keyu Chen¹, Wenchao Sun¹, Hao Cheng¹, Sifa Zheng¹
¹School of Vehicle and Mobility, Tsinghua University
arXiv 2025

We introduce a dual-stage AV-centric simulation framework: imitation learning (IL) pre-training in a data-driven simulator captures trajectory-level realism and multimodality, and reinforcement learning (RL) fine-tuning in a physics-based simulator then enhances controllability and mitigates covariate shift.

Abstract

Achieving both realism and controllability in closed-loop traffic simulation remains a key challenge in autonomous driving. Dataset-based methods reproduce realistic trajectories but suffer from covariate shift in closed-loop deployment, compounded by simplified dynamics models that further reduce reliability. Conversely, physics-based simulation methods enable reliable and controllable closed-loop interactions but often lack expert demonstrations, compromising realism. To address these challenges, we introduce a dual-stage AV-centric simulation framework that conducts imitation learning pre-training in a data-driven simulator to capture trajectory-level realism and route-level controllability, followed by reinforcement learning fine-tuning in a physics-based simulator to enhance style-level controllability and mitigate covariate shift. In the fine-tuning stage, we propose RIFT, a novel group-relative RL fine-tuning strategy that evaluates all candidate modalities through a group-relative formulation and employs a surrogate objective for stable optimization, enhancing style-level controllability and mitigating covariate shift while preserving the trajectory-level realism and route-level controllability inherited from IL pre-training. Extensive experiments demonstrate that RIFT improves realism and controllability in traffic simulation while simultaneously exposing the limitations of modern AV systems in closed-loop evaluation.

Model Overview

Overview of RIFT: Building on the IL pre-trained model, RIFT first performs route-level interaction analysis to identify critical background vehicles and their associated reference lines, enabling the generation of realistic and multimodal trajectories. To isolate style-level controllability from the trajectory-level realism and route-level controllability established during pre-training, only the scoring head is fine-tuned via RIFT, while the remaining components are kept frozen. Specifically, RIFT computes group-relative advantages over all candidate rollouts, promoting alignment with user-preferred styles and mitigating covariate shift through RL fine-tuning.
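To make the group-relative idea concrete, the sketch below standardizes rewards within one group of candidate rollouts and plugs the resulting advantages into a clipped surrogate. This page does not give RIFT's exact formulas, so both pieces are assumptions: the advantage uses a GRPO-style mean/std normalization, and the surrogate uses PPO-style ratio clipping. The function names, `eps`, `clip_eps`, and the reward values are hypothetical and chosen only for illustration.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of candidate rollouts.

    Assumption: GRPO-style normalization (subtract the group mean,
    divide by the group standard deviation). Not taken from the paper.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(new_logps, old_logps, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate, averaged over the group (to be maximized).

    Assumption: a stand-in for the surrogate objective mentioned above.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio new / old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Hypothetical rewards for four candidate trajectories of one CBV.
rewards = [1.0, 0.5, -0.2, 0.7]
advantages = group_relative_advantages(rewards)
```

Because the advantages are centered within the group, candidates that score above the group average are reinforced and those below it are suppressed, without a separately learned value function. In a setup like the one described above, the log-probabilities would come from the scoring head, which is the only component receiving gradients.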

Scenarios Demo

The AV-centric traffic simulation consists of the autonomous vehicle (AV, implemented as PDM-Lite), critical background vehicles (CBVs), and background vehicles (BVs), where the AV follows a predefined global route and the CBVs may interact with it at the route level.

Demos by scenario type:

Curved-lane Following (2 demos)

Intersection Navigation (11 demos)

Straight-lane Following (2 demos)

Lane Merging (3 demos)

End-to-End AV Evaluation (RIFT as CBVs)

SparseDrive: 3 demos, each in AV view and third-person view

UniAD: 2 demos, each in AV view and third-person view

VAD: 2 demos, each in AV view and third-person view

Realism and Controllability Quantitative Results

RIFT outperforms all baselines in both realism and controllability across most settings. Although supervised learning methods achieve slightly lower CPK and ORR, this advantage stems primarily from their inherently conservative behavior, inherited from the expert PDM-Lite, which prioritizes safety by avoiding risky maneuvers.

AV Evaluation Results

RIFT produces realistic and well-structured scenarios that are effective at exposing the limitations of modern AV systems.

Speed and Acceleration Realism Results

RIFT demonstrates higher average speed and acceleration, indicating more interactive behavior, while maintaining realistic motion profiles.
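As a rough illustration of the quantities being compared, the sketch below estimates average speed and average absolute acceleration from a trajectory by finite differences. This page does not state how these statistics are measured, so the fixed sampling interval `dt`, the 2D position format, and the function name are assumptions made only for this example.

```python
import math


def mean_speed_and_acceleration(positions, dt):
    """Estimate mean speed and mean absolute acceleration by finite differences.

    positions: list of (x, y) points sampled every `dt` seconds (assumed format).
    """
    if len(positions) < 3:
        raise ValueError("need at least three positions")
    # Speed over each interval: distance travelled divided by dt.
    speeds = [
        math.hypot(x1 - x0, y1 - y0) / dt
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]
    # Acceleration between consecutive intervals: change in speed divided by dt.
    accels = [(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]
    mean_speed = sum(speeds) / len(speeds)
    mean_abs_accel = sum(abs(a) for a in accels) / len(accels)
    return mean_speed, mean_abs_accel


# Hypothetical trajectory: a vehicle speeding up along the x-axis.
trajectory = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
mean_speed, mean_abs_accel = mean_speed_and_acceleration(trajectory, dt=1.0)
```

Averages alone can hide unrealistic spikes, so comparing the full speed and acceleration distributions against reference driving data gives a stronger check on realism.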

BibTeX

If you find the project helpful for your research, please consider citing our paper:
@article{chen2025riftclosedlooprlfinetuning,
  title={RIFT: Closed-Loop RL Fine-Tuning for Realistic and Controllable Traffic Simulation},
  author={Keyu Chen and Wenchao Sun and Hao Cheng and Sifa Zheng},
  journal={arXiv preprint arXiv:2505.03344},
  year={2025}
}