Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

portfolio

Flappy-Bird-AI

An RL agent trained with Proximal Policy Optimization (PPO) to play Flappy Bird — a sandbox for reward shaping, observation design, and stable policy-gradient training under sparse, repetitive feedback.

gradescope-mcp

An open-source MCP server with 34 tools that lets AI assistants drive Gradescope workflows — course, assignment, rubric, submission, roster, extension, and regrade management — under a human-approved ‘preview-first’ write protocol.

StudyGround

A Claude Code plugin for AI-assisted self-study: rendered lessons in a local web reader, in-browser Python via Pyodide, a course-aware tutor, and one-click hand-off into VSCode for exercises.

publications

TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing

Under review

Multi-task PPO suffers from critic-side gradient ill-conditioning, in which tail tasks stall while easy tasks dominate value-function updates. We introduce Critic Balancing: per-task PopArt value normalization, pre-activation LayerNorm in the critic body, and per-side gradient combiners (PCGrad, CAGrad, or FairGrad, chosen independently for the actor and the critic). On Meta-World+ MT50, it surpasses published SAC- and ARS-family baselines on both mean and worst-k tail-task success while using up to 22.7× fewer parameters and substantially fewer environment steps.

Yuanpeng Li, Gefei Lin, Annie Qu, Rui Miao. Under review. arXiv:2605.11473

talks

teaching

STATS 205P — Bayesian Data Analysis

Graduate course, UC Irvine, Master of Data Science Program, 2024

Graduate teaching assistant. Bayesian data analysis for the Master of Data Science program: prior specification and Bayesian linear and generalized linear regression.

STATS 7 — Basic Statistics

Undergraduate course, UC Irvine, Department of Statistics, 2024

Graduate teaching assistant. Introductory inferential statistics: confidence intervals, hypothesis testing, and regression.