Undergraduate Student · Tsinghua University

Building language systems that reason, plan, and act with people.

I'm Zixuan (Alex) Wang, a Physics undergraduate at Tsinghua University working on AI agents and human-centered AI, with special interests in alignment and agentic reinforcement learning. I am currently a research assistant at CMU, supervised by Andrea Zanette.


I am Zixuan (Alex) Wang, a Physics undergraduate at Tsinghua University (entering class of 2023). My research aims at human-centered AI and autonomous agentic systems: LLMs that reason, plan, and act with people.

I am a remote research assistant at Carnegie Mellon University (ECE), advised by Prof. Andrea Zanette. During my Fall 2025 exchange at UC San Diego, I was an undergraduate researcher at the UCSD MixLab (HDSI) with Dr. Zhen Wang. Earlier, I was an AI intern at MiroMind AI, mentored by Dr. Yuntao Chen.

Research Interests

I work on LLM algorithms that enable more reliable, adaptive, and autonomous interaction with humans — leveraging synthetic data, structured supervision, and preference-based training to improve models' understanding of human intent, while ensuring privacy and safety.

Agentic LLM Systems

Long-context reasoning, planning, tool use, and self-refinement in real-world grounded environments.

LLM Alignment

Synthetic data, structured supervision, and preference-based training for human intent — with privacy and safety built in.

RL for LLMs

Training agents for sustained multi-turn interaction and long-horizon task completion.

2023 — present

B.S. in Mathematical and Physical Basic Science (Physics)

Tsinghua University

Minor in Artificial Intelligence

Fall 2025

Exchange Student in Computer Science

UC San Diego

Intern · Internship

Looki

Apr 2026 – Present · 1 mo · Beijing, China · On-site

Agentic AI · Human-Centered AI · Large Language Models

Research Assistant

Carnegie Mellon University — Department of Electrical and Computer Engineering

Feb 2026 – Present · 3 mos · Remote

Agentic AI · Large Language Models

AI Intern · Internship

MiroMind.ai

Jun 2025 – Jan 2026 · 8 mos · Beijing, China · Hybrid

Agentic AI · Large Language Models

Research Assistant

Swartz Center for Computational Neuroscience — UC San Diego

Sep 2025 – Dec 2025 · 4 mos · San Diego, California, United States · On-site

Human-Centered AI · Human-Computer Interface · Eye Tracking

Research Assistant

Halıcıoğlu Data Science Institute — UC San Diego (HDSI) — MixLab, advised by Dr. Zhen Wang

Sep 2025 – Dec 2025 · 4 mos · San Diego, California, United States · On-site

Interactive AI · Large Language Models

Papers

Submission

Harness RL is Meta-Learning: Training to Self-Improve at Test Time

Alvin Zhang, Zixuan Wang, Xuecheng Liu, Fahim Tajwar, Ruslan Salakhutdinov, Daniel Khashabi, Yuda Song, Andrea Zanette

In Submission

Submission

Mind2Dialogue: Shared-Mind Simulation and Privileged Distillation for Personalization and Theory of Mind

Zixuan Wang*, Yufan Zhou*, Jinzhou Tang*, Chengjun Wu, Adyasha Patra, Lyumanshan Ye, Zhaoxiang Feng, Letian Peng, Enze Ma, Xinle Yu, Fan Bai, Zhengding Hu, Jianyang Gu, Zhao Wang, Yufei Ding, Jingbo Shang, Tianmin Shu, Zhiting Hu, Zhen Wang

In Submission

(*Equal Contribution)

Submission

Lightweight Neural Refinement for Drift Calibration in Eye Tracking Systems

Liu Jiaqi*, Zixuan Wang*, Yuhong Zhang, Dingkang Liang, Jane Hanqi Li, Tzyy-Ping Jung, Gert Cauwenberghs†

In Submission

(*Equal Contribution)

Submission

MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery

Enze Ma, Yufan Zhou, Wei-Chieh Huang, Jie Yang, Huanhuan Ma, Zixuan Wang, Chengze Li, Chunyu Miao, Philip S. Yu, Zhen Wang

In Submission

Technical Reports

Technical Report

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents

Preprint, arXiv:2511.11793

Code arXiv Project Contributor

Selected Writing

Efficient Large-scale RL for Deep Research Agent: Rollout Chunking with Context Compression

Technical Blog, February 2026

Technical Blog · Core Contributor

Technical Blogs

7/2026

Recovering Summarized CoT from Claude

Chain-of-Thought

2/2026

Multi-hop QA Synthesis for Large-Scale Deep Research Agent

Multi-hop QA · Data Synthesis · Deep Research

1/2026

Stabilizing Large-scale MoE Agentic Reinforcement Learning Training

MoE · Reinforcement Learning · Agentic Training

Course Notes

5/2025

3D Visual Computing Course Notes

Notes on 3D visual computing, including geometry processing, rendering, and 3D reconstruction techniques.

Computer Graphics · 3D Vision · Course Notes

1/2025

Machine Learning Course Notes - Learning Theory

Comprehensive notes on learning theory, covering PAC learning, VC dimension, and statistical learning foundations.

Machine Learning · Learning Theory · Course Notes

Course

An Optimization View of DP LLM Fine-tuning: When Does Bias Correction Help, and Can the Optimizer Be Improved?

Optimization Course Project

PDF · Code · Independent

Course

Ego-embodied Reasoner: Egocentric Embodied Reasoning and Planning with MLLM via Reinforcement Learning

Deep Reinforcement Learning Course Project, instructed by Professor Huazhe Xu

PDF · Code · Video · Project Leader

Course

EgoHOI: Prior-Guided 3D Hand-Object Interaction Reconstruction from Monocular Egocentric RGB Video

Computer Vision Course Project

PDF · Code · Project Leader

Course

Unconditional and Image-conditioned 3D Generation

3D Visual Computing Course Project, instructed by Professor Li Yi

PDF · Code · Video · Independent

Course

Human Skeleton and Skin Generation

Fundamentals of Computer Graphics Course Project, 5th Jittor AI Competition, Track 2

PDF · Code · Project Leader

Course

Project Reading Report

Object-Oriented Programming Course Project

PDF · Code · Independent

I build with agents, and agents run on tokens. Every chart below is live telemetry from my own machines — every token my coding agents have read, cached, and written.

[Live dashboard panels: tokens (all time) · cache hit rate · API-equivalent value · tokens written back]