📢 I am on the academic job market this year (2025-2026 cycle).

I am a final-year PhD student in the Data Systems and Foundations group at UC Berkeley, advised by Aditya Parameswaran. My research is supported by an NDSEG Fellowship, copious compute credits from Modal Labs, and sponsors of the EPIC Lab. Before my PhD, I worked as a machine learning and data engineer at a couple of startups. Before that, I did my undergrad in computer science at Stanford University.

Research

I build reliable and efficient AI-powered data systems. I develop query optimization algorithms that make AI-powered data analysis both accurate and affordable, as well as interfaces that help users specify their intent and trust the results. I also care deeply about impact in data‑intensive, non‑CS domains.

Selected projects from my PhD include:

  • DocETL (VLDB 2025) is a declarative, open‑source system for LLM‑powered data processing. DocETL introduces agentic query optimization — using LLM agents to logically rewrite and automatically validate query plans, top‑down. DocETL is deployed in multiple real‑world settings, including several public defenders' offices in California and the Scottish Climate Intelligence Service.
  • DocWrangler (UIST 2025) is an IDE for specifying and debugging DocETL pipelines. As of August 2025, our online deployment has been used to create over 2,750 pipelines across healthcare, finance, customer support, and government agencies in the US and Asia.
  • SPADE + EvalGen (VLDB 2024; UIST 2024): SPADE automatically synthesizes validation criteria for LLM pipelines. EvalGen enables interactive labeling while validators are synthesized in the background. These systems have widely influenced evaluation tools, including those built by LangChain, Arize, and Chroma, and served as building blocks for the DocETL stack.

Teaching

I co‑teach AI Evals for Engineers with Hamel Husain, the most popular online course for LLM evaluation and testing for engineers and product leaders. I wrote the curriculum and course reader — see a table of contents preview. The course has enrolled 1,500+ participants from 500+ organizations and garnered over 300 testimonials. An O’Reilly book based on the course is slated for Summer 2026.

In the university setting, I have served as a TA at UC Berkeley and Stanford. At UC Berkeley, I TA’d Data Engineering (~500 students across offerings). At Stanford, I TA’d every quarter as an undergraduate; in my final year, I was Head TA for CS106B (Programming Abstractions).

Publications