I am an Associate Professor of CSE at the University of Michigan, Ann Arbor, where I lead the SymbioticLab.

CV | Bio | Students | Google Scholar | GitHub

Interests: I am interested in large-scale systems for emerging AI/ML and Big Data workloads. Our recent projects include large model training systems for the cloud, understanding and optimizing the energy consumption of AI/ML workloads, and memory disaggregation over CXL and RDMA.

Impact: All SymbioticLab work is open source. We have developed the first memory disaggregation software (Infiniswap), the first software-only GPU sharing system for deep learning (Salus), the largest federated learning benchmark platform and runtime (FedScale), and the first GPU energy optimizer for DNN training (Zeus). Our research has received several paper awards from top systems venues such as NSDI, OSDI, ATC, and MICRO.

In the past, I was one of the original co-creators of Apache Spark. My seminal work on coflow and virtual network embedding spawned two research directions that many continue to pursue.

Teaching

  • CSE 585: Advanced Scalable Systems, aka “Systems for X” [F24]
  • EECS 598: Systems for X [GenAI – W24] [AI – W21, W20] [Big Data – W19, F17]
  • EECS 489: Computer Networks [W24, F21, F20, F19, F18, W17]
  • EECS 582: Advanced Operating Systems [F16, W16]

Service