Closed-Loop RL Fine-Tuning for Realistic and Controllable Traffic Simulation

Keyu Chen¹, Wenchao Sun¹, Hao Cheng¹, Sifa Zheng¹
¹School of Vehicle and Mobility, Tsinghua University
arXiv 2025

We introduce a dual-stage, AV-centered simulation framework that performs open-loop imitation learning pre-training in a data-driven simulator to capture trajectory-level realism and multimodality, followed by closed-loop reinforcement learning fine-tuning in a physics-based simulator to enhance controllability and mitigate covariate shift.

Abstract

Achieving both realism and controllability in interactive closed-loop traffic simulation remains a key challenge in autonomous driving. Data-driven simulation methods reproduce realistic trajectories but suffer from covariate shift in closed-loop deployment, compounded by simplified dynamics models that further reduce reliability. Conversely, physics-based simulation methods support reliable and controllable closed-loop interactions but often lack expert demonstrations, compromising realism. To address these challenges, we introduce a dual-stage, AV-centered simulation framework that performs open-loop imitation learning pre-training in a data-driven simulator to capture trajectory-level realism and multimodality, followed by closed-loop reinforcement learning fine-tuning in a physics-based simulator to enhance controllability and mitigate covariate shift. In the fine-tuning stage, we propose RIFT, a simple yet effective closed-loop RL fine-tuning strategy that preserves trajectory-level multimodality through a GRPO-style group-relative advantage formulation, while enhancing controllability and training stability by replacing KL regularization with the dual-clip mechanism. Extensive experiments demonstrate that RIFT significantly improves the realism and controllability of generated traffic scenarios, providing a robust platform for evaluating autonomous vehicle performance in diverse and interactive scenarios.
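The two RL ingredients named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: rewards within one group are standardized GRPO-style, and each sample's surrogate uses the dual-clip PPO form, which adds a lower bound c·A when the advantage A is negative. There is deliberately no KL penalty term, mirroring the statement that dual-clip replaces KL regularization. The function names and the defaults clip_eps=0.2 and dual_clip_c=3.0 are illustrative choices, not values from the paper.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward against its own group."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dual_clip_objective(ratio, advantage, clip_eps=0.2, dual_clip_c=3.0):
    """Per-sample dual-clip PPO surrogate (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s). For negative advantages the usual
    clipped term is additionally bounded below by c * A, so a very large
    ratio cannot drive the objective arbitrarily negative.
    clip_eps and dual_clip_c are illustrative defaults, not from the paper.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    if advantage < 0:
        surrogate = max(surrogate, dual_clip_c * advantage)
    return surrogate


advs = group_relative_advantages([1.0, 0.0, 0.5])   # zero-mean within the group
print(dual_clip_objective(10.0, -1.0))               # bounded below by 3 * (-1)
```

In this form a group whose rewards are all equal yields zero advantages, so it contributes no gradient; only relative differences inside a group drive the update.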

Model Overview

Overview of RIFT. The upper section illustrates the overall architecture of RIFT. To enhance controllability, only the trajectory scoring head is fine-tuned, while the rest of the pre-trained network is kept frozen to preserve trajectory-level realism. (a) The CBV identification mechanism introduces route-level interactions between the AV and CBVs. (b) Closed-loop fine-tuning improves user-aligned controllability and mitigates covariate shift.
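To illustrate what fine-tuning only the scoring head means mechanically, here is a toy sketch. It assumes, hypothetically, that the frozen network emits a fixed feature vector per candidate trajectory and that the scoring head is a linear layer followed by a softmax over candidates. The helper names are invented for illustration, and the update is a plain advantage-weighted policy-gradient step, not RIFT's full objective.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score(head_w, feats):
    """Hypothetical linear scoring head: logit_k = w . f_k."""
    return [sum(w * f for w, f in zip(head_w, fk)) for fk in feats]


def head_only_step(head_w, frozen_feats, advantages, lr=0.1):
    """One advantage-weighted policy-gradient step that updates only head_w.

    frozen_feats stand in for the frozen backbone's per-candidate features;
    they are read but never modified, so the trajectory proposals stay fixed.
    """
    probs = softmax(score(head_w, frozen_feats))
    dim = len(head_w)
    # Expected feature under the current scoring distribution.
    mean_f = [sum(p * fk[d] for p, fk in zip(probs, frozen_feats)) for d in range(dim)]
    grad = [0.0] * dim
    for a, fk in zip(advantages, frozen_feats):
        # Gradient of log softmax_k w.r.t. w is f_k - E_p[f].
        for d in range(dim):
            grad[d] += a * (fk[d] - mean_f[d])
    n = len(frozen_feats)
    return [w + lr * g / n for w, g in zip(head_w, grad)]


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three candidate trajectories
new_w = head_only_step([0.0, 0.0], feats, advantages=[1.0, -1.0, 0.0])
print(softmax(score(new_w, feats)))  # candidate 0 gains probability mass
```

Because the candidate features never change, the set of proposed trajectories (and their realism) stays fixed; fine-tuning only reshapes which candidate the head prefers.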

Scenarios Demo

The AV-centered traffic simulation environment consists of the autonomous vehicle (AV, implemented as PDM-Lite), critical background vehicles (CBVs), and background vehicles (BVs). The AV follows a predefined global route, and the CBVs may interact with it at the route level.
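The page does not spell out how CBVs are identified. As a purely hypothetical illustration of a route-level criterion, the sketch below flags a background vehicle as a CBV when it is within a search radius of the AV and its goal point lies close to the AV's route. The function name, thresholds, and data layout are assumptions, not RIFT's actual rule.

```python
import math


def select_cbvs(av_xy, av_route, bvs, search_radius=30.0, route_margin=3.0):
    """Hypothetical CBV filter (illustrative thresholds, in meters).

    av_xy: (x, y) of the AV; av_route: list of (x, y) waypoints ahead of it;
    bvs: dict mapping vehicle id -> {"pos": (x, y), "goal": (x, y)}.
    A BV is flagged as a CBV if it is near the AV and its goal lies within
    route_margin of some waypoint on the AV's route.
    """
    cbvs = []
    for vid, bv in bvs.items():
        if math.dist(av_xy, bv["pos"]) > search_radius:
            continue  # too far away to interact with the AV
        gap = min(math.dist(bv["goal"], wp) for wp in av_route)
        if gap <= route_margin:
            cbvs.append(vid)  # its path converges onto the AV's route
    return sorted(cbvs)


route = [(0, 0), (10, 0), (20, 0), (30, 0)]
bvs = {
    "a": {"pos": (5, 8), "goal": (20, 1)},     # nearby, heads onto the route
    "b": {"pos": (5, -5), "goal": (15, 40)},   # nearby, heads away
    "c": {"pos": (100, 0), "goal": (10, 0)},   # on-route goal but too far away
}
print(select_cbvs((0, 0), route, bvs))
```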

Curved-lane Following

Curved-lane Following

Intersection Navigation

Intersection Navigation

Intersection Navigation

Intersection Navigation

Intersection Navigation

Intersection Navigation

Intersection Navigation

Intersection Navigation

Straight-lane Following

Straight-lane Following

Lane Merging

Lane Merging

Intersection Navigation

Intersection Navigation

Intersection Navigation

Lane Merging

End-to-End AV Evaluation (RIFT as CBVs)

UniAD (AV view)

UniAD (third-person view)

UniAD (AV view)

UniAD (third-person view)

VAD (AV view)

VAD (third-person view)

VAD (AV view)

VAD (third-person view)

Realism and Controllability Quantitative Results

RIFT outperforms all baselines in both realism and controllability across most settings. Supervised learning methods achieve slightly lower infraction rates, but mainly because of the conservative behavior they inherit from the PDM-Lite expert, which prioritizes safety by avoiding risky maneuvers.

AV Evaluation Results

Under RIFT-generated scenarios, AVs achieve near-optimal Driving Score (DS), Route Completion (RC), and Infraction Score (IS), along with the lowest blocking rate (BR). These results show that RIFT minimizes blocking while maintaining realism and interactivity, making it well suited to closed-loop AV evaluation.
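For reference, DS, RC, and IS are usually computed as in the CARLA Leaderboard, which this sketch assumes: per route, DS = RC × IS, where IS starts at 1.0 and is multiplied by a penalty coefficient for each infraction, and the reported DS is the mean over routes. The coefficients below are the ones listed for the common infraction types; newer leaderboard versions add further types, and the paper's exact setup may differ. BR is not modeled here, and the dictionary keys are illustrative names.

```python
from math import prod
from statistics import fmean

# Penalty coefficients for common CARLA Leaderboard infraction types
# (key names are illustrative; other infraction types are omitted).
PENALTIES = {
    "collision_pedestrian": 0.50,
    "collision_vehicle": 0.60,
    "collision_static": 0.65,
    "red_light": 0.70,
    "stop_sign": 0.80,
}


def infraction_score(infractions):
    """IS starts at 1.0 and is multiplied by one coefficient per infraction."""
    return prod(PENALTIES[kind] ** count for kind, count in infractions.items())


def driving_score(routes):
    """Per-route DS = RC * IS (RC in percent), averaged over all routes."""
    return fmean(r["rc"] * infraction_score(r["infractions"]) for r in routes)


routes = [
    {"rc": 100.0, "infractions": {}},                       # clean, completed route
    {"rc": 90.0, "infractions": {"collision_vehicle": 1}},  # one vehicle collision
]
print(driving_score(routes))
```

Because DS is averaged per route, the mean DS is generally not equal to mean RC times mean IS.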

Speed and Acceleration Realism Results

RIFT demonstrates higher average speed and acceleration, indicating more interactive behavior, while maintaining realistic motion profiles.
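Speed and acceleration profiles can be derived from sampled positions by finite differences. How the paper compares profiles is not stated on this page; as one assumed option, the sketch compares a generated sample against a reference sample with the 1D Wasserstein-1 distance, which for equal-size samples reduces to the mean absolute difference of the sorted values. The function names and the sampling interval are illustrative.

```python
import math


def speeds_and_accels(positions, dt):
    """Finite-difference speed (m/s) and rate of change of speed (m/s^2)
    from (x, y) positions sampled every dt seconds."""
    speeds = [math.dist(p0, p1) / dt for p0, p1 in zip(positions, positions[1:])]
    accels = [(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]
    return speeds, accels


def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size empirical 1D samples."""
    if len(xs) != len(ys):
        raise ValueError("this simple form assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


speeds, accels = speeds_and_accels([(0, 0), (1, 0), (3, 0), (6, 0)], dt=1.0)
print(speeds, accels)                      # speeds grow by 1 m/s each step
print(wasserstein_1d(speeds, [2.0, 3.0, 4.0]))
```

A lower distance to a reference distribution (e.g. logged or expert trajectories) would indicate more realistic motion, while the mean of the generated speeds and accelerations reflects how assertive the behavior is.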

BibTeX

If you find the project helpful for your research, please consider citing our paper:
@article{chen2025riftclosedlooprlfinetuning,
  title={RIFT: Closed-Loop RL Fine-Tuning for Realistic and Controllable Traffic Simulation},
  author={Keyu Chen and Wenchao Sun and Hao Cheng and Sifa Zheng},
  journal={arXiv preprint arXiv:2505.03344},
  year={2025}
}