I am Xu Wan (万旭), a second-year PhD student at the College of Control Science and Engineering, Zhejiang University, and currently a visiting student at the IDEAL Lab of Peking University under the supervision of Prof. Mingyang Sun. During my graduate studies, I have gained valuable research experience as a research intern at NetEase Fuxi AI Lab, Ele.me, and Alibaba DAMO Academy, collaborating with Dr. Yujing Hu, Prof. Wotao Yin, and Cheng Yang. I am currently a research intern with ByteDance's Seed-Robotic team.

My research interests encompass reinforcement learning, large language models, and large-scale AI applications, with a particular focus on power grid scheduling. I have published several first-author papers at top international AI conferences including ICML, AAAI, and IJCAI, as well as in top journals such as IEEE Transactions on Power Systems.

Beyond research, I am passionate about fitness and enjoy running and strength training. You can follow my training journey on my Strava profile. I am also enthusiastic about trail running and hiking.

🔥 News

  • 2025.05: 🎉🎉 One paper on elastic cloud services was accepted at SIGKDD 2025 (co-first author)!
  • 2025.05: 🎉🎉 One paper on LLM and RL collaboration was accepted at ICML 2025 (first author)!
  • 2024.12: 🎉🎉 One paper on multi-agent RL was accepted as an oral presentation at AAAI 2025 (first author)!

📝 Publications

Spotlight Publications

ICML 2025

Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making

Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun*

  • We propose Agents Co-Evolution (ACE), a synergistic framework between LLMs and RL agents for large-scale decision-making scenarios.

arXiv

AdapThink: Adaptive Thinking Preferences for Reasoning Language Model

Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, Mingyang Sun*

  • We propose AdapThink, an adaptive post-training framework designed to induce more efficient thinking while maintaining the performance of reasoning language models.

AAAI 2025

SrSv: Integrating Sequential Rollouts with Sequential Value Estimation for Multi-agent Reinforcement Learning

Xu Wan, Chao Yang, Cheng Yang, Jie Song, Mingyang Sun*

  • We propose a novel framework: Sequential rollout with Sequential value estimation (SrSv). This framework aims to capture agent interdependence and provide a scalable solution for cooperative MARL.

Full Publications

Under Review

2025

2024

2023

2022 and Prior

🎖 Honors and Awards

  • 2022.11: First Prize in the 4th China Graduate Student Artificial Intelligence Innovation Competition (Huawei Cup), Top 6 Nationally
  • 2022.08: First Prize in the 3rd National College Student Mathematical Modeling Competition (Huashu Cup), Top 5% Nationally
  • 2020.04: First Prize in American Mathematical Contest in Modeling (MCM), Top 7.4% Globally
  • 2022.10: Second Prize in Baidu PaddlePaddle China University Computer Competition, Top 8 Nationally
  • 2022.05: Second Prize in MathorCup College Student Mathematical Modeling Challenge, Top 15% Nationally
  • 2021.11: Second Prize in the 19th China Graduate Student Mathematical Modeling Competition, Top 15% Nationally

📖 Education

  • 2024.03 - Present, Ph.D. Student in Control Science and Engineering, Zhejiang University, Hangzhou, China.
  • 2021.09 - 2024.03, M.S. in Control Science and Engineering, Zhejiang University, Hangzhou, China.
  • 2017.09 - 2021.06, B.S. in Automation, China University of Geosciences (Wuhan), Wuhan, China.
    • Intelligent Systems Research Institute, supervised by Prof. Changhe Li
    • GPA ranking: 2/182, National Scholarship, Outstanding Graduate