🚀 Exciting News (September 2024): We've released 📜 DocETL, a system for LLM-powered document processing! We're looking for collaborators to join the project (e.g., undergraduate and graduate students at Berkeley); reach out if you're interested!

About Me

Shreya Shankar

My dog Papaya 🐕 and me on a hike 🥾

I'm Shreya Shankar, a fourth-year PhD student at UC Berkeley in the EECS department. I'm advised by Dr. Aditya Parameswaran and supported by the NDSEG Fellowship. Go Bears! 🐻

Prior to my PhD, I worked as an ML engineer in industry. I completed my BS and MS in computer science at Stanford. Go Trees! 🌲

🔬 Research Interests

I build systems that help people use AI to work with data effectively. I am also interested in no-code and low-code data tools, enabling people of all technical backgrounds to use AI for complex tasks. I am fortunate that several of my research projects have been deployed in production at major tech companies and startups.

📊 LLM-Powered Data Processing

Building reliable systems and interfaces for LLM applications. Recent work: DocETL for document processing, declarative prompt engineering, and reactive LLM pipelines.

👥 Human-AI Interaction

Creating intuitive interfaces for AI systems. Recent work: aligning LLM evaluators with human preferences and studying how people interact with AI systems.

🛡️ Data Quality & Validation

Developing tools for reliable AI pipelines. Latest work: SPADE for LLM data quality, automatic data validation for ML, and SCIPE for debugging LLM chains.

MLOps & Monitoring

Designing systems for machine learning in production. Key papers on operationalizing ML and ML observability.

📝 Bio (for speaking engagements, etc.)

Shreya Shankar is a PhD student in computer science at UC Berkeley, working on data management for machine learning and AI. Her research creates practical tools and frameworks that help people build reliable ML systems, with recent work on declarative interfaces and optimization for complex unstructured data analysis.

Shreya is advised by Dr. Aditya Parameswaran. Her work appears in top data management and HCI venues like SIGMOD, VLDB, CIDR, CSCW and UIST, and she co-organizes the DEEM workshop at SIGMOD. She is supported by the NDSEG Fellowship. Prior to Berkeley, she worked as an ML engineer after completing her B.S. in computer science at Stanford University. In her free time, she enjoys roasting coffee and is actively trying to reduce her Twitter usage.

📰 News and Industry Impact

Recent News

  • [Jan 2025] We released DocWrangler, an IDE for writing DocETL pipelines! Read more about it in our blog post and access DocWrangler here.
  • [Oct 2024] New preprint out for DocETL, on our agentic query optimizer! Read it here.

Companies That Like Our Work 👍

👨‍🏫 Mentorship

I am fortunate to work with many talented students at UC Berkeley. Below is a list of students I am currently mentoring or have mentored for a year or more.

Current Students

  • Reya Vir (UC Berkeley undergraduate) - Working on a benchmark for synthesizing data quality constraints for LLM applications. The paper received a score of 4 in ACL Rolling Review (ARR) and will be submitted to NAACL or ACL 2025.
  • Quentin Romero Lauro (University of Pittsburgh undergraduate, REU at UC Berkeley) - Developing interfaces for iterating on retrieval-augmented generation (RAG) architectures for LLM applications. Paper in progress.
  • Ankush Garg (UC Berkeley master's student) - Building SCIPE, a debugging tool for complex chains and graphs of LLM calls. Deployed SCIPE with LangChain.
  • Rachel Lin (UC Berkeley master's student) - Developing interfaces for iterative dataset search with LLMs; co-mentored with Madelon Hulsebos (former postdoc at UC Berkeley, now faculty at CWI).

Past Students

  • Parth Asawa (former UC Berkeley undergraduate) - Worked on data quality constraints for LLM applications and declarative LLM workflows. Now pursuing a PhD at UC Berkeley.
  • Yujie Wang (former UC Berkeley undergraduate) - Worked on monitoring ML performance metrics without ground-truth labels. Joined Google after graduation.
  • Aditi Mahajan (former UC Berkeley undergraduate) - Worked on unit tests for end-to-end ML pipelines. Joined Google after graduation.

🗣️ Selected Invited Talks

DocETL and Agentic Data Systems

  • [Feb '25] Scottish Climate Intelligence Service
  • [Jan '25] Cloudera
  • [Jan '25] Columbia University
  • [Dec '24] Microsoft: Gray Systems Lab
  • [Nov '24] Snowflake
  • [Nov '24] ByteDance (TikTok)
  • [Nov '24] Google: Systems Research Group
  • [Nov '24] WInE Lab at CMU
  • [Nov '24] Solventum
  • [Oct '24] US Army Research Laboratory

Some Past Recordings

📬 Contact

Email: shreyashankar@berkeley.edu
Twitter | GitHub

Download CV (PDF)