🚀 Exciting News (September 2024): We've released DocETL, a system for LLM-powered document processing! Check it out at docetl.org. Looking for others to join the project (e.g., undergrads and grad students at Berkeley); reach out if interested!

About Me

I'm Shreya Shankar, a computer scientist in the Bay Area. I am completing my PhD at UC Berkeley on reliable AI-powered unstructured data processing, advised by Dr. Aditya Parameswaran. This work is a nice blend of my interests in MLOps, data management, and human-computer interaction. I am grateful to be supported by the NDSEG Fellowship. Go Bears! 🐻

I also consult on ML engineering and production AI strategy for enterprises. Prior to my PhD, I was the first ML engineer at a startup, did research engineering at Google Brain, and worked in engineering at Facebook. Before all of that, I did my BS and MS in computer science at Stanford. Go Trees! 🌲

📝 Bio (for speaking engagements, etc.)

Shreya Shankar is a PhD student in computer science at UC Berkeley, advised by Dr. Aditya Parameswaran. Her research addresses data challenges in production ML pipelines through a human-centered lens, focusing on data quality, observability, and more recently, leveraging large language models for data preprocessing. Shreya's work has appeared in top data management and HCI venues, including SIGMOD, VLDB, CIDR, CSCW, and UIST. She is a recipient of the NDSEG Fellowship and co-organizes the DEEM workshop at SIGMOD, which focuses on data management in end-to-end machine learning. Prior to her PhD, Shreya worked as an ML engineer and completed her undergraduate degree in computer science at Stanford University. In her free time, she enjoys roasting coffee and is actively trying to reduce her Twitter usage.

📰 News and Industry Impact

Recent News

  • [Nov 2024] New user interface out for DocETL! Try our interactive playground or watch our tutorial video!
  • [Oct 2024] New preprint out for DocETL, on our agentic query optimizer! Read it here.
  • [Apr 2024] Our paper, "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning, was accepted to CSCW 2024! Presenting in November. 🇨🇷 (paper)

👨‍🏫 Mentorship

I am fortunate to work with many talented students at UC Berkeley. Below is a list of students I am currently mentoring or have mentored for a year or more.

Current Students

  • Reya Vir (UC Berkeley undergraduate) - Working on a benchmark for synthesizing data quality constraints for LLM applications.
  • Quentin Romero Lauro (University of Pittsburgh undergraduate, REU at UC Berkeley) - Developing interfaces for iterating on retrieval-augmented generation (RAG) architectures for LLM applications.
  • Rachel Lin (UC Berkeley master's student) - Developing interfaces for iterative dataset search with LLMs; co-mentored with Madelon Hulsebos.

Past Students

  • Parth Asawa (former UC Berkeley undergraduate) - Worked on data quality constraints for LLM applications and declarative LLM workflows. Now pursuing a PhD at UC Berkeley.
  • Yujie Wang (former UC Berkeley undergraduate) - Worked on monitoring ML performance metrics without ground-truth labels. Now at Google.
  • Aditi Mahajan (former UC Berkeley undergraduate) - Worked on unit tests for end-to-end ML pipelines. Now at Google.

🗣️ Selected Invited Talks

Upcoming

  • [Nov 2024] Snowflake
  • [Nov 2024] ByteDance (TikTok)
  • [Nov 2024] Google
  • [Nov 2024] WInE Lab at CMU
  • [Nov 2024] Solventum

Past (Recordings)