Publications

TWIX: Automatically Reconstructing Structured Data from Templatized Documents.

Yiming Lin, Mawil Hasan, Rohan Kosalge, Alvin Cheung, Aditya G. Parameswaran

arXiv

Flo: A Semantic Foundation for Progressive Stream Processing.

Shadaj Laddad, Alvin Cheung, Joseph M. Hellerstein, Mae Milano

POPL 2025

Flo: a Semantic Foundation for Progressive Stream Processing.

Shadaj Laddad, Alvin Cheung, Joseph M. Hellerstein, Mae Milano

arXiv

We Have No Idea How Models will Behave in Production until Production: How Engineers Operationalize Machine Learning.

Shreya Shankar, Rolando Garcia, Joseph M. Hellerstein, Aditya G. Parameswaran

CSCW 2024

Inferring Visualization Intent from Conversation.

Haotian Li, Nithin Chalapathi, Huamin Qu, Alvin Cheung, Aditya G. Parameswaran

CIKM 2024

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing.

Shreya Shankar, Aditya G. Parameswaran, Eugene Wu

arXiv

Quilt: Custom UIs for Linking Unstructured Documents to Structured Datasets.

Pragya Kallanagoudar, Chithra Anand, Rolando Garcia, Rebecca M. M. Hicke, Aditya G. Parameswaran, Eunice Jun, Sarah E. Chasins

UIST 2024

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences.

Shreya Shankar, J. D. Zamfirescu-Pereira, Björn Hartmann, Aditya G. Parameswaran, Ian Arawjo

UIST 2024

NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval.

Sepanta Zeighami, Zac Wellmer, Aditya G. Parameswaran

arXiv

SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines.

Shreya Shankar, Haotian Li, Parth Asawa, Madelon Hulsebos, Yiming Lin, J. D. Zamfirescu-Pereira, Harrison Chase, Will Fu-Hinthorn, Aditya G. Parameswaran, Eugene Wu

VLDB 2024

Dealing with Acronyms, Abbreviations, and Typos in Real-World Entity Matching.

Joshua Wu, Dixin Tang, Nithin V. Chalapathi, Tristan Chambers, Julie Ciccolini, Cheryl Phillips, Lisa Pickoff-White, Aditya G. Parameswaran

VLDB 2024

Flow with FlorDB: Incremental Context Maintenance for the Machine Learning Lifecycle.

Rolando Garcia, Pragya Kallanagoudar, Chithra Anand, Sarah E. Chasins, Joseph M. Hellerstein, Aditya G. Parameswaran

arXiv

Suki: Choreographed Distributed Dataflow in Rust.

Shadaj Laddad, Alvin Cheung, Joseph M. Hellerstein

arXiv

Syntactic Code Search with Sequence-to-Tree Matching: Supporting Syntactic Search with Incomplete Code Fragments.

Gabriel Matute, Wode Ni, Titus Barik, Alvin Cheung, Sarah E. Chasins

PLDI 2024

Equivalence by Canonicalization for Synthesis-Backed Refactoring.

Justin Lubin, Jeremy Ferguson, Kevin Ye, Jacob Yim, Sarah E. Chasins

PLDI 2024

It Took Longer than I was Expecting: Why is Dataset Search Still so Hard?

Madelon Hulsebos, Wenjing Lin, Shreya Shankar, Aditya G. Parameswaran

SIGMOD 2024

Building Reactive Large Language Model Pipelines with Motion.

Shreya Shankar, Aditya G. Parameswaran

SIGMOD 2024

Low-resourced Languages and Online Knowledge Repositories: A Need-Finding Study.

Hellina Hailu Nigatu, John F. Canny, Sarah E. Chasins

arXiv

Learning to Restructure Tables Automatically.

Joseph M. Hellerstein

SIGMOD 2024

Low-Resourced Languages and Online Knowledge Repositories: A Need-Finding Study.

Hellina Hailu Nigatu, John F. Canny, Sarah E. Chasins

CHI 2024

Towards Accurate and Efficient Document Analytics with Large Language Models.

Yiming Lin, Madelon Hulsebos, Ruiying Ma, Shreya Shankar, Sepanta Zeighami, Aditya G. Parameswaran, Eugene Wu

arXiv

Bigger, not Badder: Safely Scaling BFT Protocols.

David C. Y. Chu, Chris Liu, Natacha Crooks, Joseph M. Hellerstein, Heidi Howard

EuroSys 2024

Wrapping Rings in Lattices: An Algebraic Symbiosis of Incremental View Maintenance and Eventual Consistency.

Conor Power, Saikrishna Achalla, Ryan Cottone, Nathaniel Macasaet, Joseph M. Hellerstein

EuroSys 2024

Optimizing Distributed Protocols with Query Rewrites.

David C. Y. Chu, Rithvik Panchapakesan, Shadaj Laddad, Lucky E. Katahanas, Chris Liu, Kaushik Shivakumar, Natacha Crooks, Joseph M. Hellerstein, Heidi Howard

SIGMOD 2024

Revisiting Prompt Engineering via Declarative Crowdsourcing.

Aditya G. Parameswaran, Shreya Shankar, Parth Asawa, Naman Jain, Yujie Wang

CIDR 2024

SPADE: Synthesizing Assertions for Large Language Model Pipelines.

Shreya Shankar, Haotian Li, Parth Asawa, Madelon Hulsebos, Yiming Lin, J. D. Zamfirescu-Pereira, Harrison Chase, Will Fu-Hinthorn, Aditya G. Parameswaran, Eugene Wu

arXiv

Optimizing Distributed Protocols with Query Rewrites [Technical Report].

David Chu, Rithvik Panchapakesan, Shadaj Laddad, Lucky Katahanas, Chris Liu, Kaushik Shivakumar, Natacha Crooks, Joseph M. Hellerstein, Heidi Howard

arXiv

A Need Finding Study with Low-Resource Language Content Creators.

Hellina Hailu Nigatu, John F. Canny, Sarah E. Chasins

AfriCHI 2023

Automatic and Precise Data Validation for Machine Learning.

Shreya Shankar, Labib Fawaz, Karl Gyllstrom, Aditya G. Parameswaran

CIKM 2023

How Domain Experts Use an Embedded DSL.

Lisa Rennels, Sarah E. Chasins

SPLASH 2023

Multiversion Hindsight Logging for Continuous Training.

Rolando Garcia, Anusha Dandamudi, Gabriel Matute, Lehan Wan, Joseph Gonzalez, Joseph M. Hellerstein, Koushik Sen

arXiv

Visualizing Spreadsheet Formula Graphs Compactly.

Fanchao Chen, Dixin Tang, Haotian Li, Aditya G. Parameswaran

VLDB 2023

Transactional Panorama: A Conceptual Framework for User Perception in Analytical Visual Interfaces.

Dixin Tang, Alan D. Fekete, Indranil Gupta, Aditya G. Parameswaran

VLDB 2023

Bolt-on, Compact, and Rapid Program Slicing for Notebooks

Shreya Shankar, Stephen Macke, Sarah E. Chasins, Andrew Head, and Aditya G. Parameswaran

VLDB 2023

Towards Observability for Production Machine Learning Pipelines

Shreya Shankar and Aditya G. Parameswaran

VLDB 2023

Invited Paper: Initial Steps Toward a Compiler for Distributed Programs.

Joseph M. Hellerstein, Shadaj Laddad, Mae Milano, Conor Power, Mingwei Samuel

PODC 2023

Take Out the TraChe: Maximizing (Tra)nsactional Ca(che) Hit Rate.

Audrey Cheng, David C. Y. Chu, Terrance Li, Jason Chan, Natacha Crooks, Joseph M. Hellerstein, Ion Stoica, Xiangyao Yu

OSDI 2023

Optimizing Stateful Dataflow with Local Rewrites.

Shadaj Laddad, Conor Power, Tyler Hou, Alvin Cheung, Joseph M. Hellerstein

arXiv

Co-Designing for Transparency: Lessons from Building a Document Organization Tool in the Criminal Justice Domain.

Hellina Hailu Nigatu, Lisa Pickoff-White, John F. Canny, Sarah E. Chasins

FAccT 2023

A Need-Finding Study with Users of Geospatial Data

Parker Ziegler and Sarah E. Chasins

CHI 2023

Understanding Version Control as Material Interaction with Quickpose

Eric Rawn, Jingyi Li, Eric Paulos, and Sarah Chasins

CHI 2023

Efficient and Compact Spreadsheet Formula Graphs.

Dixin Tang, Fanchao Chen, Christopher De Leon, Tana Wattanawaroon, Jeaseok Yun, Srinivasan Seshadri, Aditya G. Parameswaran

ICDE 2023

Moving Fast With Broken Data.

Shreya Shankar, Labib Fawaz, Karl Gyllstrom, Aditya G. Parameswaran

arXiv

Exploring the Learnability of Program Synthesizers by Novice Programmers

Dhanya Jayagopal, Justin Lubin, and Sarah E. Chasins

UIST 2022

Informing Housing Policy through Web Automation: Lessons for Designing Programming Tools for Domain Experts

Chris Hess and Sarah E. Chasins

CHI 2022

VizSmith: Automated Visualization Synthesis by Mining Data-Science Notebooks

Rohan Bavishi, Shadaj Laddad, Hiroaki Yoshida, Mukul R. Prasad, Koushik Sen

ASE 2021

Gauss: Program Synthesis by Reasoning over Graphs

Rohan Bavishi, Caroline Lemieux, Koushik Sen, Ion Stoica

OOPSLA 2021

Keep It Simple: Unsupervised Simplification of Multi-Paragraph Text

Philippe Laban, Tobias Schnabel, Paul Bennett, Marti A. Hearst

ACL 2021

Lux: always-on visualization recommendations for exploratory dataframe workflows

Doris Jung-Lin Lee, Dixin Tang, Kunal Agarwal, Thyne Boonmark, Caitlyn Chen, Jake Kang, Ujjaini Mukhopadhyay, Jerry Song, Micah Yong, Marti A. Hearst, Aditya G. Parameswaran

VLDB 2022

Whither AutoML? Understanding the Role of Automation in Machine Learning Workflows

Doris Xin, Eva Yiwei Wu, Doris Jung-Lin Lee, Niloufar Salehi, Aditya Parameswaran

CHI 2021

B2: Bridging Code and Interactive Visualization in Computational Notebooks

Yifan Wu, Joseph M. Hellerstein, Arvind Satyanarayan

UIST 2020

ShapeSearch: A Flexible and Efficient System for Shape-based Exploration of Trendlines

Tarique Siddiqui, Paul Luh, Zesheng Wang, Karrie Karahalios, Aditya Parameswaran

SIGMOD 2020

SCRAM: Simple Checks for Real time Analysis of Model Training for Non-Expert ML Programmers

Eldon Schoop, Forrest Huang, Björn Hartmann

CHI 2020

NBDT: Neural-Backed Decision Tree

Alvin Wan, Lisa Dunlap, Daniel Ho, Jihan Yin, Scott Lee, Suzanne Petryk, Sarah Adel Bargal, Joseph E. Gonzalez

ICLR 2021

AutoPandas: Neural-Backed Generators for Program Synthesis

Rohan Bavishi, Caroline Lemieux, Roy Fox, Koushik Sen, Ion Stoica

OOPSLA 2019

Rousillon: Scraping Distributed Hierarchical Web Data

Sarah E. Chasins, Maria Mueller, Rastislav Bodik

UIST 2018

Futzing and Moseying: Interviews with Professional Data Analysts on Exploration Practices

Sara Alspaugh, Nava Zokaei, Andrea Liu, Cindy Jin, Marti A. Hearst

TVCG 2019

Learning Syntactic Program Transformations from Examples

Reudismam Rolim, Gustavo Soares, Loris D'Antoni, Oleksandr Polozov, Sumit Gulwani, Rohit Gheyi, Ryo Suzuki, Björn Hartmann

ICSE 2017

Predictive Interaction for Data Transformation

Jeffrey Heer, Joseph M. Hellerstein, Sean Kandel

CIDR 2015