Qinwei Ma

Hi! I'm Qinwei Ma (马钦伟). You can call me Martin, or use my nicknames Martini/Aquapony. I'm a senior undergraduate student majoring in Computer Science and Technology at IIIS, Tsinghua University (a.k.a. Yao Class, directed by the Turing Award Laureate Andrew Chi-Chih Yao). I'm honored to start my PhD study in Fall 2026, advised by Prof. Alex Lamb at the College of AI, Tsinghua University.

Previously, I had the honor of being mentored by many great professors working in various directions, including Prof. Hang Zhao, Prof. Chuang Gan, Prof. Tong Zhang, and Dr. Lei Li.

In addition, I have an important long-term collaborator, my high school and undergraduate classmate Jingzhe Shi.

Before college, I took part in the 36th National High School Physics Olympiad, winning a gold medal that directly granted me admission to Yao Class. I also took part in the Mathematics and Informatics Olympiads. I studied at Shanghai Foreign Language School, where German was my first foreign language.

Most importantly, I wish to thank my girlfriend Wanfei Li, who keeps supporting me in both my research and my daily life.

Feel free to contact me for potential collaboration!

📬 Email | 📑 CV | 🎓 Google Scholar | 💻 Github

News

2025-09: ✨ Our paper RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text was accepted by ICCV 2025!

2024-09: ✨ Our paper Scaling Law for Time Series Forecasting was accepted by NeurIPS 2024!

2024-07: Our paper CHOPS was accepted by COLM 2024!

Research Interest

My research interests span a broad range of fields, all with the common goal of building more intelligent, helpful, yet safe AI systems.

More specifically, these are the major topics I am currently interested in:

Self-evolving agents, especially how we can leverage RL post-training to make agents explore effectively, self-improve, and adapt to new environments.

World Models, especially how we can learn a compact, efficient, yet effective world model to help agents plan and act better in complex environments.

Theoretical analysis of language models and learning methods. I wish to find the most fundamental laws that drive the success of current models and learning algorithms. This may help us better explain, improve, and control future models.

Besides these directions, I am also interested in many others that I have not yet had a chance to try. Generally speaking, I wish to find a proper way to regulate future agents, align them with human interests, and make them more controllable, secure, and helpful even with superhuman power.

Personal Interest

Besides research, I put a lot of effort into enriching my life. This mainly includes:

Musical Theater

I am a great fan of musical theater, and I have had the opportunity to perform as an actor in multiple productions. An incomplete list of my roles:

Aaron Burr in 'Hamilton' (English)

Schikaneder in 'Mozart!'

Favell in 'Rebecca'

Marius in 'Les Misérables'

Besides these, I have also played many important roles in abridged cuts of musicals, including Gabe in 'Next to Normal', Raoul in 'Phantom of the Opera', etc. I also directed a ten-minute mixed cut of 'Next to Normal' for the tenth anniversary of the Tsinghua Musical Club.

Sports

Although I suffered a major injury in my sophomore year of college, I stay active in various sports, including soccer, badminton, etc.

Bridge Card

Bridge is an important hobby that I have kept up since junior high school. I am currently a member of the Tsinghua Bridge Team. I have taken part in the National College Student Bridge Tournament twice with the team, ranking eighth and sixth among the finalists respectively.

Werewolf Game

I am also a great fan of werewolf-like games. I was once invited to the most popular werewolf variety show in China (京城大师赛), but could not attend due to a scheduling conflict.

Selected Publications

(* for equal contribution, † for second author)

	PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning. Wanjia Zhao, Qinwei Ma, Jingzhe Shi, Shirley Wu, Jiaqi Han, Yijia Xiao, Si-Yuan Chen, Xiao Luo, Ludwig Schmidt, James Zou. Preprint, 2025. We introduce a process-level physics reasoning benchmark based on Directed Acyclic Graphs (DAGs) that explicitly encode causal dependencies among solution steps. We provide theoretical proofs of the optimality of both the DAG representation and the corresponding scoring policy, ensuring principled and interpretable evaluation. We develop a fully rule-based symbolic formula matching system for robust, heuristic-free validation across diverse solution forms. Our framework achieves stronger alignment with human expert grading and reveals persistent reasoning gaps in state-of-the-art LLMs, offering diagnostic insights and signals for future training.
	ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning. Hanyang Chen, Mark Zhao, Rui Yang, Qinwei Ma†, Ke Yang, Kangrui Wang, Hao Bai, Zhenhailong Wang, Jiarui Yao, Rui Pan, Mengchao Zhang, Jose Barreiros, Aykut Onol, ChengXiang Zhai, Heng Ji, Manling Li, Huan Zhang, Tong Zhang. Preprint, 2025. We introduce ERA, a unified two-stage framework that integrates knowledge distillation and online reinforcement learning for scalable embodied vision-language agents. We propose three novel types of priors (Trajectory-Augmented, Environment-Anchored, and External Knowledge Priors), enabling smaller agents to inherit rich world knowledge and reasoning ability. We design an online RL pipeline with self-summarization, dense reward shaping, and turn-level optimization to tackle long-horizon, sparse-reward challenges in embodied learning. We achieve significant performance and generalization improvements over large prompting-based models (e.g., GPT-4o) on both high-level planning and low-level manipulation benchmarks.
	Scaling Law for Time Series Forecasting. Jingzhe Shi, Qinwei Ma, Huan Ma, Lei Li. NeurIPS 2024 (poster, main track). Code / arXiv / OpenReview. We proposed a theoretical framework for scaling laws in time series forecasting that takes into account the look-back horizon as well as dataset size and model size. We conducted experiments to validate the proposed theory and its assumptions. Our key theoretical and experimental finding is that an optimal look-back horizon exists and increases with dataset size, which calls for fairer comparisons when proposing new time series forecasting models.
	CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs. Jingzhe Shi, Jialuo Li, Qinwei Ma, Zaiwen Yang, Huan Ma, Lei Li. COLM 2024 (poster). Code / arXiv / OpenReview. We proposed CHOPS, an LLM agent designed to efficiently access user information, interact with existing systems, and provide accurate, safe responses by leveraging a combination of small and large LLMs. Validated on the CPHOS-dataset that we proposed in the same work, CHOPS demonstrated its potential to enhance or replace human customer service.
	Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm. Kaisen Yang, Lixuan He, Rushi Shah, Kaicheng Yang, Qinwei Ma, Dianbo Liu, Alex Lamb. Preprint, 2025. Code. We propose the Explore-Execute Chain (E²C), a reasoning framework that separates exploration from execution to achieve state-of-the-art efficiency, generalization, and interpretability in LLM reasoning with drastically fewer decoding tokens.
	RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text. Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan. ICCV 2025. Project Page. We introduce a framework for generating 3D body motions and singing vocals from textual lyrics using a multimodal transformer. Our approach, based on the RapVerse dataset, unifies language, audio, and motion through quantized models, achieving realistic joint generation of vocals and motions.

Education Experience

	University of Illinois Urbana-Champaign 2024.09 - 2025.02 Visiting researcher Research Advisor: Prof. Tong Zhang.
	MIT 2023.11 - 2024.06 Visiting researcher Research Advisor: Prof. Chuang Gan.
	Tsinghua University 2021.09 - Present Undergraduate Student

Representative Honors and Awards

(complete list can be found in my CV)

2022-2024: Three different school-level scholarships, in Scientific and Technological Achievements, Artistic and Cultural Performance, and Social Work respectively.

2024: First prize in Tsinghua Challenge Cup.

2019: Gold Medalist 🏅 in the 36th National High School Physics Olympiad (CPhO), ranking 26th in the final round.

Language and Skills

Language: Chinese (Native), English (Very Proficient), German (B2).

Programming languages: Python, C/C++, etc.

Mathematics: Calculus (Very Proficient), Linear Algebra (Proficient), Probability Theory (Proficient), Statistics, Abstract Algebra, Game Theory, etc.

Service

ICLR 2025 Reviewer.

Part-time teacher in AI and Physics.

Fun facts

My favorite musical is 'Hamilton'. My favorite movie is 'Pulp Fiction'.

I have long wished to improve the legal system (Chinese and global) in two fields: 1) Technological Safety and Privacy, 2) Gender Equality and Women's Rights.

This homepage is designed based on Jon Barron's homepage and deployed on GitHub Pages. Last updated: Jan, 2025.
© 2025 Qinwei Ma