I am Xu Wan, a second-year PhD student at the College of Control Science and Engineering, Zhejiang University, and currently a visiting student at the IDEAL Lab of Peking University under the supervision of Prof. Mingyang Sun. During my graduate studies, I gained research experience as a research intern at NetEase Fuxi AI Lab, Ele.me, and Alibaba DAMO Academy, collaborating with Dr. Yujing Hu, Prof. Wotao Yin, and Cheng Yang. I am currently a research intern with ByteDance's Seed-Robotic team.
My research interests encompass reinforcement learning, large language models, and large-scale AI applications, with a particular focus on power grid scheduling. I have published several first-author papers at top international AI conferences including ICML, AAAI, and IJCAI, as well as in top journals such as IEEE Transactions on Power Systems.
Beyond research, I am passionate about fitness and enjoy running and strength training. You can follow my training journey on my Strava profile. I am also enthusiastic about trail running and hiking.
News
- 2025.05: One paper about elastic cloud service got accepted at SIGKDD 2025 (co-first author)!
- 2025.05: One paper about LLM and RL collaboration got accepted at ICML 2025 (first author)!
- 2024.12: One paper about multi-agent RL got accepted as an oral presentation at AAAI 2025 (first author)!
Publications
Spotlight Publications

Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making
Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun*
- We propose Agents Co-Evolution (ACE), a synergistic framework between LLMs and RL agents for large-scale decision-making scenarios.

AdapThink: Adaptive Thinking Preferences for Reasoning Language Model
Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, Mingyang Sun*
- We propose AdapThink, an adaptive post-training framework designed to induce more efficient thinking while maintaining the performance of reasoning language models.

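SrSv: Integrating Sequential Rollouts with Sequential Value Estimation for Multi-agent Reinforcement Learning
Xu Wan, Chao Yang, Cheng Yang, Jie Song, Mingyang Sun*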
- We propose Sequential rollout with Sequential value estimation (SrSv), a framework that captures agent interdependence and provides a scalable solution for cooperative MARL.
Full Publications
Under Review
- AdapThink: Adaptive Thinking Preferences for Reasoning Language Model, Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, Mingyang Sun, Under Review
- Fuz-RL: A Fuzzy-Guided Robust Framework for Safe Reinforcement Learning under Uncertainty, Xu Wan, Chao Yang, Cheng Yang, Jie Song, Mingyang Sun, Under Review
- SAMG: Offline-to-Online Reinforcement Learning via State-Action-Conditional Offline Model Guidance, Liyu Zhang, Haochi Wu, Xu Wan, Quan Kong, Ruilong Deng, Mingyang Sun, Under Review
2025
- IVMR suite: An Industrial-scale Virtual Machine Rescheduling Dataset and Benchmark for Elastic Cloud Service, Yupeng Zhang*, Xu Wan*, Xiangyun Kong*, Chao Yang, Binda Ma, Wotao Yin, Jian Zhou, SIGKDD 2025
- Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making, Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun, ICML 2025
2024
- SrSv: Integrating Sequential Rollouts with Sequential Value Estimation for Multi-agent Reinforcement Learning, Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun, AAAI 2025 (Oral)
- AdapSafe2: Prior-Free Safe-Certified Reinforcement Learning for Multi-Area Frequency Control, Xu Wan, Mingyang Sun, IEEE Transactions on Power Systems
2023
- Highly Transferable Adversarial Attack Against Deep-Reinforcement-Learning-Based Frequency Control, Zhongwei Li, Yang Liu, Peng Qiu, Hongyan Yin, Xu Wan (Corresponding Author), Mingyang Sun, Energy Convers. Econ.
- AdapSafe: Adaptive and Safe-Certified Deep Reinforcement Learning-Based Frequency Control for Carbon-neutral Power Systems, Xu Wan, Mingyang Sun, Boli Chen, Zhongda Chu, Fei Teng, AAAI 2023
2022 and Prior
- Physics-Constrained Vulnerability Assessment of Deep Reinforcement Learning-Based SCOPF, Lanting Zen, Mingyang Sun, Xu Wan, Zhenyong Zhang, Ruilong Deng, Yan Xu, IEEE Transactions on Power Systems
- Exploring the Vulnerability of Deep Reinforcement Learning-based Emergency Control for Low Carbon Power Systems, Xu Wan, Lanting Zen, Mingyang Sun, IJCAI 2022
Honors and Awards
- 2022.11: First Prize in the 4th China Graduate Student Artificial Intelligence Innovation Competition (Huawei Cup), Top 6 Nationally
- 2022.08: First Prize in the 3rd National College Student Mathematical Modeling Competition (Huashu Cup), Top 5% Nationally
- 2020.04: First Prize in American Mathematical Contest in Modeling (MCM), Top 7.4% Globally
- 2022.10: Second Prize in Baidu PaddlePaddle China University Computer Competition, Top 8 Nationally
- 2022.05: Second Prize in MathorCup College Student Mathematical Modeling Challenge, Top 15% Nationally
- 2021.11: Second Prize in the 19th China Graduate Student Mathematical Modeling Competition, Top 15% Nationally
Education
- 2024.03 - Present, Ph.D. Student in Control Science and Engineering, Zhejiang University, Hangzhou, China.
  - IDEAL Lab, supervised by Prof. Mingyang Sun
  - Visiting student at Peking University (2024 - Present)
- 2021.09 - 2024.03, M.S. in Control Science and Engineering, Zhejiang University, Hangzhou, China.
  - NeSC Lab, supervised by Prof. Jiming Chen
  - GPA rank: 1/60, National Scholarship, Outstanding Graduate Student
- 2017.09 - 2021.06, B.S. in Automation, China University of Geosciences (Wuhan), Wuhan, China.
  - Intelligent Systems Research Institute, supervised by Prof. Changhe Li
  - GPA rank: 2/182, National Scholarship, Outstanding Graduate