Hi, I am Xu Wan (万旭). I received my Ph.D. from the College of Control Science and Engineering at Zhejiang University in June 2026. I was previously a visiting student at the IDEAL Lab, Peking University, advised by Prof. Mingyang Sun. I am currently a researcher in the ByteDance Seed Team. During my Ph.D., I interned at Tencent Hunyuan, ByteDance Seed, Alibaba DAMO Academy, and NetEase Fuxi AI Lab, and had the pleasure of collaborating with Prof. Wotao Yin, Dr. Yansheng Wang, Dr. Yujing Hu, and many other outstanding researchers.

My research interests include large language models (LLMs), reinforcement learning (RL), and large-scale AI applications, with a special focus on LLM post-training. I have published several first-author papers at international AI conferences such as NeurIPS, ICML, and ICLR, as well as in journals such as IEEE Transactions on Power Systems.

Beyond research, I am passionate about fitness and enjoy running and strength training. You can follow my training journey on my Strava profile. I am also enthusiastic about trail running and hiking.

🔥 News

  • 2026.05:  🎉🎉 Three papers about LLM Token Allocation / LLM for Optimization / T2I RL post-training got accepted at ICML 2026!
  • 2026.03:  🎉🎉 One paper about Length Penalty of LLM got accepted at ACL 2026!
  • 2026.01:  🎉🎉 One paper about Off-policy LLM-RL post-training got accepted at ICLR 2026 (first author)!
  • 2025.09:  🎉🎉 One paper about robust safe RL got accepted at NeurIPS 2025 (first author)!
  • 2025.07:  🎉🎉 I was supported by the CIE-Tencent Doctoral Research Incentive Project (with only 23 recipients nationwide and a research fund of 100,000 RMB)!
  • 2025.05:  🎉🎉 One paper about elastic cloud service got accepted at SIGKDD 2025 (co-first author)!
  • 2025.05:  🎉🎉 One paper about LLM and RL collaboration got accepted at ICML 2025 (first author)!
  • 2024.12:  🎉🎉 One paper about multi-agent RL got accepted as an oral presentation at AAAI 2025 (first author)!

📝 Publications

🍾 Spotlight Publications

ICLR 2026

Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning [Code]

Xu Wan, Yansheng Wang, Wenqi Huang, Mingyang Sun

  • BAPO is an off-policy RLVR framework that improves data efficiency in large language model post-training.
ICML 2026

The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs [Code]

Xu Wan, SpeedZhu, Jiawei Cai, Guang Chen, Ximing Huang, Wiggin Zhou, Mingyang Sun

  • CLEAR implements a Lambert W policy to execute strategic abandonment, sacrificing insolvent tasks to redistribute critical computational resources to solvable complex queries.
ACL 2026

AdapThink: Adaptive Thinking Preferences for Reasoning Language Model [Code]

Wenyue Xu*, Xu Wan*(co-first author), Wei Wang, Wotao Yin, Wenqi Huang, Shengjie Zhao, Mingyang Sun

  • AdapThink is an adaptive length penalty method for efficient thinking of reasoning language models.
ICML 2026

ProOPF: Benchmarking and Improving LLMs for Professional-Grade Power Systems Optimization Modeling [Code]

Chao Shen, Zihan Guo, Xu Wan*(co-first author), Zhenghao Yang, Yifan Zhang, Wenqi Huang, Jie Song, Zongyan Zhang, Mingyang Sun

  • ProOPF introduces a 12K-instance dataset and a 121-case expert benchmark for evaluating and improving LLMs on professional-grade optimal power flow modeling from natural language.
ICML 2025

Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making

Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun

  • Agents Co-Evolution (ACE) is a synergistic framework between LLMs and RL agents for large-scale decision-making scenarios.
NeurIPS 2025

Fuz-RL: A Fuzzy-Guided Robust Framework for Safe Reinforcement Learning under Uncertainty [Code]

Xu Wan, Chao Yang, Cheng Yang, Jie Song, Mingyang Sun

  • Fuz-RL is a novel fuzzy-guided robust framework for safe RL.
AAAI 2025

SrSv: Integrating Sequential Rollouts with Sequential Value Estimation for Multi-agent Reinforcement Learning [Code]

Xu Wan, Chao Yang, Cheng Yang, Jie Song, Mingyang Sun

  • SrSv aims to capture agent interdependence and provide a scalable solution for cooperative MARL.

📖 Full Publications

* denotes co-first authors, # denotes corresponding author.

🎖 Honors and Awards

  • 2022.11: First Prize in the 4th China Graduate Student Artificial Intelligence Innovation Competition (Huawei Cup), Top 6 Nationally
  • 2022.08: First Prize in the 3rd National College Student Mathematical Modeling Competition (Huashu Cup), Top 5% Nationally
  • 2020.04: First Prize in American Mathematical Contest in Modeling (MCM), Top 7.4% Globally
  • 2022.10: Second Prize in Baidu PaddlePaddle China University Computer Competition, Top 8 Nationally
  • 2022.05: Second Prize in MathorCup College Student Mathematical Modeling Challenge, Top 15% Nationally
  • 2021.11: Second Prize in the 19th China Graduate Student Mathematical Modeling Competition, Top 15% Nationally

👨‍💼 Services

  • Reviewer for ICML 2026

  • Reviewer for ICLR 2026

  • Reviewer for NeurIPS 2025

  • Reviewer for TPWRS (IEEE Transactions on Power Systems)

  • Program Committee member for AAAI 2026 (Main Track and AIA Track)