I am currently part of LDOS at UT-Austin. I graduated from UW-Madison, where I was advised by Dimitris Papailiopoulos and Shivaram Venkataraman. My research interests are primarily in Systems for Machine Learning, especially distributed training and inference of ML workloads. During my PhD, I was very fortunate to intern with Bilge Acun at FAIR, Amar Phanishayee at Microsoft Research, and Yucheng Low at Apple.
During my time in Madison, when I was not being a grad student, I was most likely racing keelboats on Lake Mendota or alpine skiing in the winter. I also doubled as a sailing instructor at UW-Madison's Hoofers Sailing Club. Since moving to Austin, I have been racing keelboats on Lake Travis and teaching sailing with the Austin Yacht Club, while my skis languish.
Teaching
CS 395T, Principles of Learned Systems
Awards
- Decoding Speculative Decoding won the NAACL'25 SAC Award (awarded to 9 of 638 accepted papers)
- Selected as a 2024 Machine Learning and Systems Rising Star
Publications
- Nalar: An Agent Serving Framework.
  Marco Laju, Donghyun Son, Saurabh Agarwal, Nitin Kedia, Myungjin Lee, Jayanth Srinivasa, Aditya Akella. [preprint]
- SYMPHONY: Improving Memory Management for LLM Inference Workloads.
  Saurabh Agarwal, Bodun Hu, Anyong Mao, Aditya Akella, Shivaram Venkataraman. NSDI'26.
- UNUM: A New Framework for Network Control.
  Jiayi Chen, Nihal Sharma, Debajit Chakraborty, Saurabh Agarwal, Jeffrey Zhou, Aditya Akella, Sanjay Shakkottai. NSDI'26.
- Patchwork: A Unified Framework for RAG Serving.
  Bodun Hu, Luis Pabon, Saurabh Agarwal, Myungjin Lee, Jayanth Srinivasa, Aditya Akella. [preprint]
- Decoding Speculative Decoding.
  M Yan, S Agarwal, S Venkataraman. NAACL'25 SAC Award, Oral. [paper]
- Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition.
  Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, Zack Sifakis, Angeliki Giannou, Ziqian Lin, Liu Yang, Saurabh Agarwal, Grigorios Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos. ICML'25 Spotlight.
- StitchLLM: Serving LLMs, One Block at a Time.
  Bodun Hu, Shuozhe Li, Saurabh Agarwal, Myungjin Lee, Akshay Jajoo, Jiamin Li, Le Xu, Geon-Woo Kim, Donghyun Kim, Hong Xu, Amy Zhang, Aditya Akella. ACL'25. [paper]
- CHAI: Clustered Head Attention for Efficient LLM Inference.
  S Agarwal, B Acun, B Hosmer, M Elhoushi, Y Lee, S Venkataraman, D Papailiopoulos, C Wu. ICML'24. [paper]
- LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding.
  M Elhoushi, A Shrivastava, D Liskovich, B Hosmer, B Wasti, L Lai, A Mahmoud, B Acun, S Agarwal, A Roman, A Aly, B Chen, C Wu. ACL'24. [paper]
- Blox: A Modular Toolkit for Deep Learning Schedulers.
  S Agarwal, A Phanishayee, S Venkataraman. EuroSys'24. [paper] [source]
- Bagpipe: Accelerating Deep Recommendation Model Training.
  S Agarwal, C Yan, Z Zhang, S Venkataraman. SOSP'23. [paper] [source]
- Cuttlefish: Low-Rank Model Training without All the Tuning.
  H Wang, S Agarwal, Y Tanaka, E Xing, D Papailiopoulos. MLSys'23. [paper]
- Pufferfish: Communication-Efficient Models at No Extra Cost.
  H Wang, S Agarwal, D Papailiopoulos. MLSys'21. [paper]
- On the Utility of Gradient Compression.
  S Agarwal, H Wang, S Venkataraman, D Papailiopoulos. MLSys'22. [paper] [source]
- Adaptive Gradient Communication via Critical Learning Regime Identification.
  S Agarwal, H Wang, K Lee, S Venkataraman, D Papailiopoulos. MLSys'21. [paper] [source]
- AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning.
  Y Liu, S Agarwal, S Venkataraman. [paper]
- Attack of the Tails: Yes, You Really Can Backdoor Federated Learning.
  H Wang, K Sreenivasan, S Rajput, H Vishwakarma, S Agarwal, J Sohn, K Lee, D Papailiopoulos. NeurIPS'20. [paper]
Service
Reviewer: ICML'23, ICLR'23, NeurIPS'22, NeurIPS'21
ERC: MLSys'22, USENIX ATC'23