* Leveling up journalism with data science, Cheryl Phillips, Stanford
Abstract: Machine learning that identifies influence on the Supreme Court, programs that identify problem doctors who are still practicing, new methods for discovering patterns in police use-of-force cases. In this talk, Cheryl Phillips walks through some of the ways journalism with impact is built on sophisticated data science and lays out the hardest technical challenges accountability and investigative journalists face now, including how to use generative AI in a way that produces reliable results, doesn’t break the bank, and leads to news stories with impact.
Abstract: Examples are foundational in helping data journalists author interactive graphics, whether by demonstrating challenging techniques or serving as building blocks for new design exploration. However, a key element of an example’s usefulness is the availability of its source code. If a data journalist wants to work from an “in-the-wild” example for which no source code is available, they have to resort to manual reverse engineering to produce an approximation of the original visualization. This is a time-consuming and error-prone process, erasing much of the original benefit of working from an example. In this talk, I’ll present our work on reviz, a compiler and accompanying Chrome extension that automatically generates parameterized data visualization programs from input SVG subtrees. I’ll walk through the reviz architecture from an end user’s perspective before diving deep into the internals of our reverse engineering and compilation processes.
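To make the reverse-engineering idea concrete, here is a minimal sketch, not reviz itself (the real compiler emits parameterized visualization programs and handles many mark and attribute types): over an SVG subtree of repeated marks, attributes shared by every mark become constants of the inferred program, while attributes that vary become data-driven parameters.

```python
# A toy version of the inference step: split SVG mark attributes into
# constants (shared by all marks) and parameters (varying per mark).
import xml.etree.ElementTree as ET

SVG = """<g>
  <rect x="0"  y="60" width="18" height="40" fill="steelblue"/>
  <rect x="25" y="30" width="18" height="70" fill="steelblue"/>
  <rect x="50" y="80" width="18" height="20" fill="steelblue"/>
</g>"""

def infer_template(svg_text):
    marks = [dict(m.attrib) for m in ET.fromstring(svg_text).findall("rect")]
    keys = marks[0].keys()
    constants = {k: marks[0][k] for k in keys
                 if all(m[k] == marks[0][k] for m in marks)}
    parameters = {k: [m[k] for m in marks] for k in keys if k not in constants}
    return constants, parameters

constants, parameters = infer_template(SVG)
print("constants: ", constants)   # width and fill are shared by every mark
print("parameters:", parameters)  # x, y, height vary -> data-driven channels
```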
Abstract: Increasingly, applications leverage mixed-modality data and must jointly search over vector data as well as structured data. Recently proposed methods for this hybrid search setting either suffer from poor search performance or severely restrict the set of allowed search predicates. To address this, we present ACORN, an approach for performant hybrid search with arbitrary predicates. ACORN builds on HNSW, a state-of-the-art graph-based index for approximate nearest neighbor (ANN) search, and implements the idea of predicate subgraph traversal to emulate a theoretically ideal search strategy. We show how to efficiently construct ACORN's graph index using a predicate-agnostic algorithm for selecting each neighbor list in the index. We systematically evaluate ACORN on four datasets through a series of experiments and micro-benchmarks. Our evaluation uses both prior benchmark datasets that constrain the query predicate set to be low-cardinality and more complex text-retrieval and multi-modal retrieval datasets with high-cardinality query-predicate sets and a range of predicate operators. Results show that ACORN achieves state-of-the-art performance on all datasets, outperforming prior methods with 2-1,000x higher QPS at 0.9 recall. On the prior benchmark datasets, ACORN consistently outperforms recent methods that build specialized indices for these constrained query workloads. In addition, ACORN enables a broad set of new applications by serving a rich set of query semantics, while once again outperforming previous baselines.
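A minimal sketch of the predicate-subgraph-traversal idea, under the simplifying assumption of a flat (single-layer) proximity graph rather than a full HNSW index; the graph layout, the passes predicate, and the ef parameter below are illustrative stand-ins, not ACORN's actual interfaces.

```python
# Best-first search that only admits predicate-passing points to the result
# set, while still routing through non-matching points so the traversal
# emulates search restricted to the predicate subgraph.
import heapq
import numpy as np

def search(graph, vecs, passes, query, k=5, ef=32):
    dist = lambda i: float(np.linalg.norm(vecs[i] - query))
    start = 0
    visited = {start}
    frontier = [(dist(start), start)]          # min-heap of candidates
    results = []                               # max-heap (negated) of matches
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(results) == ef and d > -results[0][0]:
            break                              # no closer match is reachable
        if passes(node):
            heapq.heappush(results, (-d, node))
            if len(results) > ef:
                heapq.heappop(results)
        for nb in graph[node]:                 # ACORN expands these lists so
            if nb not in visited:              # the matching region stays connected
                visited.add(nb)
                heapq.heappush(frontier, (dist(nb), nb))
    return sorted((-nd, n) for nd, n in results)[:k]

# Toy usage: 100 random 2-D points, predicate keeps even-numbered ids.
rng = np.random.default_rng(0)
vecs = rng.normal(size=(100, 2))
graph = {i: list(np.argsort(np.linalg.norm(vecs - vecs[i], axis=1))[1:9])
         for i in range(100)}
print(search(graph, vecs, lambda i: i % 2 == 0, np.zeros(2)))
```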
Abstract: Reasoning about vast amounts of visual data is predominantly a human-centered task, making it the bottleneck of many data science and machine learning pipelines. In this work we explore the problem of automatically describing differences between sets of images in natural language. Our proposed method, ImDiff, uses descriptive captioning and large language models to propose concepts that are more present in one set of images than the other, then verifies that these hypotheses are grounded in the images using CLIP. We develop a suite of quantitative benchmarks to assess the correctness and relevance of the described differences, and show how ImDiff can assist in tasks such as bias discovery, summarizing model failures, analyzing trends across time, and evaluating generative models.
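The pipeline can be sketched as follows. This is a minimal sketch, not the authors' code: propose_differences stands in for an LLM call over per-image captions, and the CLIP checkpoint is an illustrative choice.

```python
# Propose-then-verify: an LLM proposes candidate differences between two
# image sets; CLIP keeps only hypotheses that score higher on set A.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def propose_differences(captions_a, captions_b):
    """Placeholder for the LLM proposer: given per-set captions, return
    candidate phrases hypothesized to be more present in set A."""
    return ["people riding bicycles", "snowy streets", "nighttime scenes"]

def clip_scores(images, texts):
    """Mean image-text similarity of each candidate phrase over a set."""
    inputs = processor(text=texts, images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # (n_images, n_texts)
    return logits.mean(dim=0)

def describe_differences(set_a, set_b, captions_a, captions_b):
    hypotheses = propose_differences(captions_a, captions_b)
    gap = clip_scores(set_a, hypotheses) - clip_scores(set_b, hypotheses)
    ranked = sorted(zip(hypotheses, gap.tolist()), key=lambda p: -p[1])
    return [(h, g) for h, g in ranked if g > 0]   # grounded differences
```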
Abstract: LLMs' impressive capabilities in synthesizing and explaining code offer opportunities—and challenges—for human interactions with programs. Programming languages critically offer an unambiguous, deterministic encoding of a computational process, but they generally require the development of a fully formed, complete expression of that process, a challenging task even for experts. This requirement is especially burdensome when the desired process is itself ambiguous or not fully understood. In these cases, LLMs can assist by filling in the gaps with “reasonable” defaults, enabling the realization and execution of underspecified programs, a form of “lazy binding” on programmer desire. This LLM-disambiguated programming process resembles “sketching” in the way it omits details and focuses on realizing preliminary ideas, engaging the programmer’s “assessment” brain in evaluating what the sketch reveals about a particular problem or approach. In this work we explore a programming “sketching” process comprising (1) ambiguous natural-language descriptions of intent, (2) LLM-based code synthesis, and (3) natural-language iteration on the synthesized code and output.
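A minimal sketch of such a sketching loop, assuming the OpenAI chat API as the synthesizer; the model name and system prompt are placeholders, not the system described in the talk.

```python
# Sketching loop: ambiguous intent -> LLM-synthesized program -> run it ->
# natural-language refinement. Executing LLM output directly is fine for a
# personal sketchpad; a real tool would sandbox it.
from openai import OpenAI

client = OpenAI()

def synthesize(history):
    resp = client.chat.completions.create(model="gpt-4o", messages=history)
    return resp.choices[0].message.content

def sketch_loop(intent):
    history = [{"role": "system",
                "content": "Fill gaps in underspecified requests with "
                           "reasonable defaults. Reply with Python code only."},
               {"role": "user", "content": intent}]
    while True:
        code = synthesize(history)
        print(code)
        exec(code, {})                     # run so the programmer can assess
        feedback = input("refine (blank to accept)> ")
        if not feedback:
            return code
        history += [{"role": "assistant", "content": code},
                    {"role": "user", "content": feedback}]

# e.g. sketch_loop("plot my CSV somehow, whichever columns look interesting")
```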
Abstract: Syntactic analysis tools like Semgrep and Comby leverage the structure in code, making them more expressive than traditional string and regex search. At the same time, they use a lightweight specification, similar to the target language's syntax, aiming to be simpler than traditional heavyweight language frameworks that manipulate syntax trees (e.g., ESLint). However, state-of-the-art matching techniques for these tools require queries to be complete, parsable programs, which makes partial or in-progress query specifications useless. We propose a new search architecture that relies only on tokenizing a query. We introduce a new language and develop a novel sequence-to-tree matching semantics and algorithm, building on tree automata, to support tree-aware wildcards in this architecture. In contrast to past work, our approach offers syntactic search even for incomplete code fragments. We implement these techniques in an open-source library, stsearch. Our tool supports 10x more partial (but tokenizable) queries than our baseline, Semgrep. Our work offers evidence that syntactic code search can be upgraded to accept in-progress, non-parsable specifications, potentially improving support for interactive settings.
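A toy illustration of the core idea (not the stsearch library's real API): the query is only tokenized, never parsed, so an incomplete fragment remains a usable query, with $_ acting as a tree-aware wildcard over one balanced token group.

```python
# Token-level syntactic matching: queries need not parse, only tokenize.
import re

TOKEN = re.compile(r"\$_|\w+|[^\w\s]")   # $_ is our wildcard token

def toks(src):
    """Tokenize code or query text; the query may be an incomplete,
    non-parsable fragment, since no parsing is required."""
    return TOKEN.findall(src)

def match(query, code, qi=0, ci=0):
    """Match query tokens as a prefix of code tokens; $_ consumes
    exactly one balanced token group (the tree-aware wildcard)."""
    if qi == len(query):
        return True
    if ci == len(code):
        return False
    if query[qi] == "$_":
        depth = 0
        for j in range(ci, len(code)):
            depth += code[j] in "([{"
            depth -= code[j] in ")]}"
            if depth < 0:
                return False
            if depth == 0 and match(query, code, qi + 1, j + 1):
                return True
        return False
    return query[qi] == code[ci] and match(query, code, qi + 1, ci + 1)

# The query "open($_," is not parsable code, yet it still matches:
print(match(toks("open($_,"), toks("open(path, mode)")))     # True
print(match(toks("open($_,"), toks("open(f(x, y), mode)")))  # True: balanced group
print(match(toks("open($_,"), toks("close(path, mode)")))    # False
```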
Abstract: Machine learning models fail in unpredictable ways, and many produce outputs that are difficult for users to verify, such as machine translation and code generation. Providing guidance on when to rely on a system is challenging because these models can generate a wide range of outputs (e.g., text), error boundaries are highly stochastic, and automated explanations may themselves be incorrect. I will discuss this problem in the healthcare context, where models trained on past data can be incredibly useful but also challenging to use reliably. For instance, healthcare providers increasingly use machine translation (MT) for patients who do not speak the dominant language. However, MT systems can produce inaccurate translations. My work develops approaches to improve the reliability of ML models by designing actionable strategies for a user to gauge reliability and recover from potential errors.
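As one concrete example of an actionable strategy, and deliberately simpler than the approaches developed in this work: round-trip the translation and flag outputs whose back-translation drifts far from the source. The model names and the crude lexical-overlap score below are illustrative assumptions.

```python
# Round-trip reliability check: translate, back-translate, compare.
from transformers import pipeline

to_es = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")

def round_trip_check(sentence, threshold=0.5):
    spanish = to_es(sentence)[0]["translation_text"]
    back = to_en(spanish)[0]["translation_text"]
    # Crude lexical overlap; real quality estimation uses learned metrics.
    a, b = set(sentence.lower().split()), set(back.lower().split())
    score = len(a & b) / max(len(a | b), 1)
    return spanish, score, score >= threshold   # flag low-score outputs

print(round_trip_check("Take two tablets twice daily with food."))
```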
Abstract: Computational notebooks are commonly used for iterative workflows, such as in exploratory data analysis. This process lends itself to the accumulation of old code and hidden state, making it hard for users to reason about the lineage of, e.g., plots depicting insights or trained machine learning models. One way to reason about the code used to generate a notebook data artifact is to compute a program slice, but traditional static approaches to slicing can be both inaccurate (failing to contain code relevant to an artifact) and conservative (containing code unnecessary for an artifact). We present nbslicer, a dynamic slicer optimized for the notebook setting whose instrumentation for resolving dynamic data dependencies is both bolt-on (and therefore portable) and switchable (allowing it to be selectively disabled to reduce instrumentation overhead). We demonstrate nbslicer’s ability to construct small and accurate backward slices (i.e., historical cell dependencies) and forward slices (i.e., cells affected by the “rerun” of an earlier cell), thereby improving reproducibility in notebooks and enabling faster reactive re-execution, respectively. Comparing nbslicer with a static slicer on 374 real notebook sessions, we found that nbslicer filters out far more superfluous program statements while maintaining slice correctness, giving slices that are, on average, 66% and 54% smaller for backward and forward slices, respectively.
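A minimal sketch of cell-level dynamic dependency tracking in the spirit of nbslicer (the real system is bolt-on, switchable, and far finer-grained): running each cell in an instrumented namespace records top-level reads and writes, and a backward slice is the transitive closure over the recorded edges.

```python
# Dynamic slicing sketch: an instrumented globals dict records which names
# each cell reads and writes; backward_slice walks the dependency edges.
class TracingNamespace(dict):
    def __init__(self):
        super().__init__()
        self.reads, self.writes = set(), set()
    def __getitem__(self, name):
        self.reads.add(name)
        return super().__getitem__(name)
    def __setitem__(self, name, value):
        self.writes.add(name)
        super().__setitem__(name, value)

ns, deps, writers = TracingNamespace(), {}, {}

def run_cell(cell_id, src):
    ns.reads, ns.writes = set(), set()
    exec(src, ns)                   # top-level loads/stores hit our hooks
    deps[cell_id] = {writers[n] for n in ns.reads if n in writers}
    for n in ns.writes:
        if not n.startswith("__"):  # ignore interpreter-managed names
            writers[n] = cell_id    # last writer wins, as in a live session

def backward_slice(cell_id):
    """All cells the given cell's outputs (transitively) depend on."""
    seen, stack = set(), [cell_id]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(deps.get(c, ()))
    return sorted(seen)

run_cell(1, "x = 1")
run_cell(2, "y = x + 1")
run_cell(3, "z = 'unrelated'")
run_cell(4, "print(y)")
print(backward_slice(4))   # [1, 2, 4]: cell 3 is correctly excluded
```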
Abstract: Programming tools are increasingly integral to research and analysis in myriad domains, including specialized areas with no formal relation to computer science. Embedded domain-specific languages (eDSLs) have the potential to serve these programmers while placing relatively light implementation burdens on language designers. However, barriers to eDSL use reduce their practical value and adoption. In this paper, we aim to deepen our understanding of how programmers use eDSLs and identify user needs to inform future eDSL designs. We performed a contextual inquiry (9 participants) with domain experts using Mimi, an eDSL for climate change economics modeling. A thematic analysis identified five key themes, including: the interaction between the eDSL and the host language has significant and sometimes unexpected impacts on eDSL user experience, and users preferentially engage with domain-specific communities and code templates rather than host language resources. The needs uncovered in our study offer design considerations for future eDSLs and suggest directions for future DSL usability research.
Abstract: Prior work shows that readers prefer charts that include annotations and explores how these text elements affect topic recall and conclusions. However, there is limited understanding of how text impacts other aspects of interpreting data visualizations. Our research explores the role of text in influencing both predictions of future outcomes and judgments of perceived bias in visualizations. Results indicate that while text only subtly alters how viewers predict data trends, it plays a profound role in shaping perceptions of author bias. Exploratory analyses support an interaction between a person's prediction and the degree of bias they perceive, demonstrating an effect similar to polarization. These findings underscore the need for careful text selection in visualizations to reduce unintended bias and suggest the importance of studying the various ways readers can interpret these visualizations.