CV
Education
- Ph.D. in Computer Science, Hong Kong University of Science and Technology, 2028 (expected)
- B.S. in Computer Science, ShanghaiTech University, 2021
Work experience
- November 2023 - Present: Research Intern at Alibaba Group, Hangzhou, China
  - Designed and implemented a straggler detection and mitigation framework for hybrid-parallel large model training.
  - Analyzed production traces from the Alibaba HPAI cluster.
  - Mentor: Yinghao Yu
- May 2022 - July 2023: Research Intern at Microsoft Research Asia, Beijing, China
  - Developed Catur, a NUMA-aware scheduler that uses reinforcement learning for VM allocation on Microsoft Azure.
  - Implemented system optimizations for reinforcement learning frameworks.
  - Mentors: Qi Chen, Hui Xue
- August 2021 - April 2022: Teaching Assistant at NYU Shanghai, Shanghai, China
Skills
- Programming Languages (in order of familiarity)
  - Python
  - C/C++
  - CUDA
  - Rust
  - Shell
  - JavaScript
- HPC/OS-related Software (in order of familiarity)
  - Linux (Ubuntu, Debian, CentOS, Arch Linux, Manjaro)
  - Docker
  - Kubernetes
  - Slurm
  - MPI, OpenMP
- DL-related Frameworks (in order of familiarity)
  - PyTorch
  - Megatron-LM
  - vLLM
  - DeepSpeed
  - Ray RLlib
- Misc
Publications
Teaching