Generative Video for Humanoid Control (GVHC)
Video diffusion models as motion priors for language-conditioned humanoid locomotion.
I recently graduated from UC Berkeley with a Bachelor's + Master's in EECS. My thesis was supervised by Koushil Sreenath and investigated the use of video diffusion models for humanoid robot control. I was Head TA for CS188 (Intro to AI) under Pieter Abbeel and Igor Mordatch, and previously did research at RBC and Toronto General Hospital / UHN. I'm mainly interested in 3D perception, spatial reasoning, and world models.
Controlled study of how RGB, depth, structured text, and combined inputs affect VLM spatial reasoning.
VLM-guided replay augmentation and counterfactual relabeling for sparse-reward robotic manipulation.
Autonomous indoor position hold on a 250 g quadrotor using only a Raspberry Pi, a single camera, and a printed chessboard — no LiDAR, motion capture, or GPU.