📢 I am on the academic job market this year (2025-2026 cycle).

About Me: I am interested in data processing, one of computer science’s most universal and impactful applications, found in nearly every industry. I build open-source LLM-powered data systems. My approach to research is to study where the obvious ideas break in long-range, real-world deployments. The resulting insights drive advances in system internals and deepen our understanding of user interaction in data and AI systems.

Impact: In my PhD, I built the DocETL stack (3K GitHub stars) for unstructured text analysis at scale. DocETL has been deployed in applications across journalism, law, medicine, policy, finance, and urban planning. It was recently used by public defenders in California to analyze court documents for two criminal trials, marking one of the first real-world uses of LLM systems in courtroom proceedings. DocETL's techniques such as agentic query optimization (VLDB 2025, SIGMOD 2026) are being integrated into industrial databases like Snowflake and Google BigQuery, while ideas for AI evaluation and interface design, published at UIST 2024 and UIST 2025 🏆, have been adopted by LangChain, ChromaDB, and OpenAI.

Education and Outreach: I co-founded the AI Evals for Engineers and PMs course and authored its companion book. The course has been taken by 3,000+ professionals from 500+ companies (including 50+ each from Google, Microsoft, OpenAI, Meta, Amazon, Intuit, and First American), and the course-reader mailing list now exceeds 25,000 people. I co-teach the course with my long-time friend and collaborator Hamel Husain, an industry leader and former ML engineer at Airbnb and GitHub, whose perspective is central to its impact. Interestingly, developing this course required inventing new research techniques, for example methods for error analysis of LLM agents and for trustworthy, scalable LLM-as-judge evaluation. O'Reilly will publish the book in Spring 2026.

Other Facts: My research is supported by an NDSEG Fellowship, Bridgewater AI Labs, copious compute credits from Modal Labs, and sponsors of the EPIC Lab. Before my PhD, I worked as a machine learning and data engineer at a couple of startups. Before that, I did my undergrad in computer science at Stanford University.

Publications

People

Advisor: Aditya Parameswaran

Other Frequent Collaborators: Bhavya Chopra, Björn Hartmann, Joe Hellerstein, Madelon Hulsebos, Yiming Lin, Eugene Wu, J.D. Zamfirescu-Pereira, Sepanta Zeighami

Mentees:
Current: Ruiqi Chen (UW MS), Lindsey Wei (UW UG), Nikhil and Vinay Rao (High School)
Past: Parth Asawa (now PhD @ Berkeley), Quentin Romero Lauro (Pitt UG), Rachel Lin (Berkeley MS), Aditi Mahajan (now Google), Reya Vir (now PhD @ Columbia), Yujie Wang (now Google)