Software Engineer
Meta · Superintelligence Labs · Menlo Park, CA
Building Kubernetes-based developer infrastructure for AI training and maintaining PyTorch's test infrastructure.
- Contributing to and maintaining pytorch/test-infra, keeping CI healthy for PyTorch contributors.
- Building resumable GPU environments for ML researchers to address GPU scarcity and reduce idle time.
- Building containerized training, dev, and CI environments on Kubernetes across cloud and on-prem GPU fleets.