Hi, I am Xu Wan (万旭), a third-year PhD student at the College of Control Science and Engineering, Zhejiang University, and currently a visiting student at the IDEAL Lab of Peking University under the supervision of Prof. Mingyang Sun. During my graduate studies, I have gained valuable research experience as a research intern at NetEase Fuxi AI Lab, Alibaba DAMO Academy, and the ByteDance Seed-Robotics Team, collaborating with Dr. Yujing Hu, Prof. Wotao Yin, and Dr. Yansheng Wang. I am currently a research intern with the Tencent Hunyuan team.

My research interests include large language models (LLMs), reinforcement learning (RL), and large-scale AI applications, with a special focus on LLM post-training. I’ve published several first-author papers at international AI conferences such as NeurIPS, ICML, KDD, and AAAI, as well as in journals such as IEEE Transactions on Power Systems.

Beyond research, I am passionate about fitness and enjoy running and strength training. You can follow my training journey on my Strava profile. I am also enthusiastic about trail running and hiking.

🔥 News

  • 2025.09: 🎉🎉 One paper about robust safe RL got accepted at NeurIPS 2025 (first author)!
  • 2025.07: 🎉🎉 I was supported by the CIE-Tencent Doctoral Research Incentive Project (with only 23 recipients nationwide and a research fund of 100,000 RMB)!
  • 2025.05: 🎉🎉 One paper about elastic cloud service got accepted at SIGKDD 2025 (co-first author)!
  • 2025.05: 🎉🎉 One paper about LLM and RL collaboration got accepted at ICML 2025 (first author)!
  • 2024.12: 🎉🎉 One paper about multi-agent RL got accepted as an oral presentation at AAAI 2025 (first author)!

📝 Publications

🍾 Spotlight Publications

NeurIPS 2025

Fuz-RL: A Fuzzy-Guided Robust Framework for Safe Reinforcement Learning under Uncertainty

Xu Wan, Chao Yang, Cheng Yang, Jie Song, Mingyang Sun

  • We propose Fuz-RL, a novel fuzzy-guided robust framework for safe RL.
ICML 2025

Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making

Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun

  • We propose Agents Co-Evolution (ACE), a synergistic framework between LLMs and RL agents for large-scale decision-making scenarios.
arxiv

AdapThink: Adaptive Thinking Preferences for Reasoning Language Model

Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, Mingyang Sun

  • We propose AdapThink, an adaptive post-training framework designed to induce more efficient thinking while maintaining the performance of reasoning language models.
AAAI 2025

SrSv: Integrating Sequential Rollouts with Sequential Value Estimation for Multi-agent Reinforcement Learning

Xu Wan, Chao Yang, Cheng Yang, Jie Song, Mingyang Sun

  • We propose a novel framework: Sequential rollout with Sequential value estimation (SrSv). This framework aims to capture agent interdependence and provide a scalable solution for cooperative MARL.

📖 Full Publications

* denotes co-first authors, # denotes corresponding author.

Under Review

2025

2024

2023

2022 and Prior

🎖 Honors and Awards

  • 2022.11: First Prize in the 4th China Graduate Student Artificial Intelligence Innovation Competition (Huawei Cup), Top 6 Nationally
  • 2022.08: First Prize in the 3rd National College Student Mathematical Modeling Competition (Huashu Cup), Top 5% Nationally
  • 2020.04: First Prize in American Mathematical Contest in Modeling (MCM), Top 7.4% Globally
  • 2022.10: Second Prize in Baidu PaddlePaddle China University Computer Competition, Top 8 Nationally
  • 2022.05: Second Prize in MathorCup College Student Mathematical Modeling Challenge, Top 15% Nationally
  • 2021.11: Second Prize in the 19th China Graduate Student Mathematical Modeling Competition, Top 15% Nationally

👨‍💼 Services

  • Reviewer for ICLR 2026

  • Reviewer for NeurIPS 2025

  • Reviewer for TPWRS (IEEE Transactions on Power Systems)

  • Program Committee member for AAAI 2026 (Main Track and AIA Track)