Enabling API Upgrades
Upgrading packages with API changes can be prohibitively expensive. Existing techniques require manual, expert effort and have limited applicability. Software ecosystems present complex social challenges that tools must deal with, including scale, mismatched priorities, heterogeneous levels and areas of expertise, and limits on communication and collaboration.
We have two ongoing projects to extend program transformation tooling: (1) ensuring type preservation when applying source-to-source rewrite rules, to increase reliability for all potential users; and (2) semantics-based synthesis of rewrite rules, to make existing approaches easy for everyone to use.
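As a concrete, purely illustrative example of the kind of source-to-source rewrite rule these projects target, the sketch below uses Python's ast module to migrate calls from a hypothetical deprecated old_api(x) to new_api(x, strict=True); the function names and the added keyword argument are invented, and the type-preservation checks described above are not shown.

# A minimal sketch of a source-to-source rewrite rule, assuming a hypothetical
# API change where old_api(x) is replaced by new_api(x, strict=True).
# Real tooling would additionally check that the rewrite preserves types.
import ast

class UpgradeOldApi(ast.NodeTransformer):
    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "old_api":
            node.func.id = "new_api"
            node.keywords.append(ast.keyword(arg="strict", value=ast.Constant(value=True)))
        return node

source = "result = old_api(value)\n"
tree = ast.parse(source)
tree = UpgradeOldApi().visit(tree)
ast.fix_missing_locations(tree)
print(ast.unparse(tree))   # result = new_api(value, strict=True)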
Minimizing Machine Learning in Video Data Analytics with Geo-spatial Metadata
Video data exploration tasks involve querying over objects and their movements in videos. Many video data exploration tools use machine-learning-based algorithms to detect and track objects. However, these algorithms are slow, which can interrupt data scientists' and journalists' train of thought as they explore the video data. In prior work, MIRIS tracks objects in videos at a lower frame rate. If the system finds a candidate track in any section of a video, it tracks objects in that section at a higher frame rate until it can track the object with sufficient confidence. With this approach, the system does not have to run the object-tracking algorithms on all of the video frames.
In autonomous driving or surveillance camera video, we usually know the camera's location and information about the environment around it. In this project, we use this geo-spatial metadata to help the system filter out irrelevant frames, based on users' queries, before feeding the remaining frames to the object-tracking algorithms.
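A minimal sketch of this idea, using invented data structures: each frame carries the camera's geographic position and heading, and frames whose approximate field of view cannot contain the queried location are skipped before any object tracker runs. The distance-and-bearing test here is a crude stand-in, not the project's actual method.

# A minimal sketch of geo-spatial pre-filtering, assuming each frame is tagged
# with the camera's position (lat, lon) and heading. Only the frames that pass
# this cheap check would be passed to the expensive object-tracking algorithm.
import math
from dataclasses import dataclass

@dataclass
class FrameMeta:
    frame_id: int
    lat: float
    lon: float
    heading_deg: float        # direction the camera is facing

def bearing_deg(lat1, lon1, lat2, lon2):
    # Rough planar bearing; fine for the short distances in this sketch.
    return math.degrees(math.atan2(lon2 - lon1, lat2 - lat1)) % 360

def may_see(frame, query_lat, query_lon, max_range_deg=0.002, half_fov_deg=45.0):
    # Distance gate (in degrees, a crude proxy for metres).
    if math.hypot(query_lat - frame.lat, query_lon - frame.lon) > max_range_deg:
        return False
    # Angular gate: is the queried location roughly in front of the camera?
    diff = abs((bearing_deg(frame.lat, frame.lon, query_lat, query_lon)
                - frame.heading_deg + 180) % 360 - 180)
    return diff <= half_fov_deg

def relevant_frames(frames, query_lat, query_lon):
    return [f for f in frames if may_see(f, query_lat, query_lon)]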
A Conversational Interface for Automatic Visualization
Generating visualizations is a key step in exploratory data analysis but can be time-consuming and complicated in no-code environments. Visualizations are also not static; as more information is discovered through exploratory data analysis, new visualizations need to be built to answer new questions. We introduce a conversational natural language interface for creating visualizations from data. Our approach is the first to use large language models to generate visualizations in a conversational setting.
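One way such an interface can be structured is sketched below: the running conversation and the dataset's column names are assembled into a prompt that asks a language model for a chart specification. The complete(prompt) function stands in for whatever LLM API is used, and the prompt wording and the choice of Vega-Lite output are illustrative assumptions, not our system's design.

# A minimal sketch of conversational chart generation, assuming a generic
# complete(prompt) -> str function backed by some large language model.
import json

def build_prompt(columns, history, request):
    turns = "\n".join(f"User: {q}\nSpec: {s}" for q, s in history)
    return (
        "You translate analysis questions into Vega-Lite JSON specifications.\n"
        f"Dataset columns: {', '.join(columns)}\n"
        f"{turns}\n"
        f"User: {request}\n"
        "Spec:"
    )

def ask_for_chart(complete, columns, history, request):
    spec_text = complete(build_prompt(columns, history, request))
    spec = json.loads(spec_text)          # validate that the model returned JSON
    history.append((request, spec_text))  # keep context for follow-up questions
    return spec

# Example (with a hypothetical LLM client):
#   history = []
#   ask_for_chart(llm, ["year", "sales", "region"], history, "show sales over time")
#   ask_for_chart(llm, ["year", "sales", "region"], history, "now break it down by region")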
Iterative Design of Semantic Grouping Guidelines and Metrics for Mobile User Interfaces
While prior research on widget grouping in mobile user interface (UI) design has focused on visual grouping, little work has been devoted to the semantic coherence of such groupings, which affects user understanding of the interface. We propose five design guidelines that are generally applicable to semantic element grouping in mobile UIs. We generated the guidelines through an iterative process: they were first conceived through empirical observations of existing mobile UIs and a literature review, refined through multiple rounds of feedback from UI design experts, and finally evaluated with an expert review. The feedback from experts indicates a strong need for these guidelines, as the design and evaluation of semantic grouping is currently based on intuition. In addition to being a useful resource for UI design, these guidelines could lead to computational methods for evaluating interfaces. We experimented with computational metrics built from these guidelines that show promising results.
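One plausible shape for such a metric is sketched below, assuming an embed(text) function backed by some text-embedding model: score a candidate group of widget labels by the average pairwise similarity of their embeddings, so that semantically unrelated labels grouped together score low. This is only an illustration of the kind of metric the guidelines could inform, not the metrics we evaluated.

# A minimal sketch of one possible semantic-coherence score for a widget group,
# assuming a hypothetical embed(text) -> list[float] text-embedding function.
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def group_coherence(labels, embed):
    vectors = [embed(label) for label in labels]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0   # a single-widget group is trivially coherent
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Example: group_coherence(["Wi-Fi", "Bluetooth", "Airplane mode"], embed) should
# score higher than group_coherence(["Wi-Fi", "Ringtone", "Wallpaper"], embed).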
A Cross-Domain Need-Finding Study with Users of Geospatial Data
Geospatial data—such as multispectral satellite imagery, geographically enriched demographic data, and crowdsourced datasets like OpenStreetMap—is more available today than ever before. This data is playing an increasingly critical role in the work of Earth and climate scientists, social scientists, and data journalists exploring spatiotemporal change in our environment and societies. However, existing software and programming tools for geospatial analysis and visualization are challenging to learn and difficult to use. Many domain experts are unfamiliar with both the theory of geospatial data and the specialized Geographic Information System (GIS) software used to work with such data. While libraries for geospatial analysis and visualization are increasingly common in Python, R, and JavaScript, they still require proficiency with at least one of these programming languages in addition to geospatial data theory. In short, domain experts face steep challenges in gathering, transforming, analyzing, and visualizing geospatial data.
The aim of this research is to investigate the specific computing needs of the diversifying community of geospatial data users. This poster will present findings from a contextual inquiry study (n = 25) with Earth and climate scientists, social scientists, and data journalists using geospatial data in their current work. We will focus on key challenges identified in our thematic analysis, including (1) finding and transforming geospatial data to satisfy spatiotemporal constraints, (2) understanding the behavior of geospatial operators, (3) tracking geospatial data provenance, and (4) efficiently exploring the cartographic design space. We will also discuss the design opportunities these findings suggest for new geospatial analysis and visualization systems.
Striking a Balance: Reader Takeaways and Preferences when Integrating Text and Charts
Visualizations frequently use text to guide and inform readers. Prior work in visualization research indicates that text influences reader conclusions, but there is little empirical evidence about the best way to integrate text and charts. Designers lack guidance on how much text to show, what content to use, and where to position it. Furthermore, personal preferences for visual and textual representations vary.
In this study, we explored several research questions about the textual components of visualizations. 302 participants viewed univariate line charts with differing amounts of text, varying in content and position. Participants ranked charts according to preference, with stimuli ranging from charts with no text except axis labels to a full written paragraph. They also provided their conclusions from charts with only one or two pieces of text of varying content and position. From these responses, we found that participants prefer charts with a greater amount of text over charts with less text or text alone. We also found that the content of the text affects reader conclusions. For example, text that describes statistical or relational components of a chart leads to more takeaways referring to statistics or relational comparisons than text describing chart elements does. Additionally, the effect of certain content depended on the placement of the text on the chart: some content is best placed in the title, while other content should be placed close to the data. We compiled these results into four visualization design guidelines.
Data cleaning for acronyms, abbreviations, and typos derived from manual entry
In many no-code data tools, such as spreadsheets, users often manually fill values into cells. In this process, even values that refer to the same underlying concept can take on many forms, as users introduce acronyms, abbreviations, and typos. Collapsing these values down to a canonical set for the purpose of data cleaning is a challenge. For example, public defender units we work with took multiple weeks to manually collapse the values (for columns such as police title or command) to a smaller canonical set. There is a need for an automated way to deal with acronyms, abbreviations, and typos: specifically, a new metric that maps values referring to the same underlying concept to each other while accounting for acronyms, abbreviations, and typos. We also wanted an efficient way to use this metric to collapse the values down to a canonical set. We developed a new distance metric that preserves the “key” structures of a value, allowing values that refer to the same concept to be mapped together. For example, “School Resource Officer” would map to “Sc Rs Off”, “SRO”, and “Scres off”. We further developed a dynamic programming algorithm that efficiently computes the score for two values, along with ways to prune poor matches without complete evaluation. We embedded our approach into the popular open-source data cleaning tool OpenRefine and demonstrated substantial improvements relative to the state of the art.
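The sketch below illustrates the flavor of this matching as a boolean simplification (not the actual metric, scoring, or pruning strategy): a candidate value matches a canonical one if its characters can be split into chunks, where each chunk starts with the first letter of the corresponding canonical word and appears in order within that word.

# A minimal sketch of abbreviation-aware matching, as a boolean simplification
# of the distance metric described above.
def abbreviates(candidate: str, canonical: str) -> bool:
    cand = "".join(candidate.lower().split())   # "Sc Rs Off" -> "scrsoff"
    words = canonical.lower().split()           # ["school", "resource", "officer"]

    def match(i: int, w: int) -> bool:
        if i == len(cand):
            return w == len(words)              # all characters and all words consumed
        if w == len(words):
            return False
        word = words[w]
        if cand[i] != word[0]:
            return False                        # each chunk must start the word
        ci, wi = i, 0
        while ci < len(cand) and wi < len(word):
            if cand[ci] == word[wi]:
                ci += 1
                if match(ci, w + 1):            # hand the remainder to the next word
                    return True
            wi += 1
        return False

    return match(0, 0)

for value in ["SRO", "Sc Rs Off", "Scres off", "Police Title"]:
    print(value, abbreviates(value, "School Resource Officer"))
# -> True, True, True, False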
Transactional Panorama: A Conceptual Framework for User Perception in Analytical Visual Interfaces
Many tools empower analysts and data scientists to consume analysis results in a visual interface such as a dashboard. When the underlying data changes, these results need to be updated, but this update can take a long time—all while the user continues to explore the results. In this context, tools can either (i) hide away results that haven’t been updated, hindering exploration; (ii) make the updated results immediately available to the user (on the same screen as old results), leading to confusion and incorrect insights; or (iii) present old—and therefore stale—results to the user during the update. To help users reason about these options and others, and make appropriate trade-offs, we introduce Transactional Panorama, a formal framework that adopts transactions to jointly model the system refreshing the analysis results and the user interacting with them. We introduce three key properties that are important for user perception in this context: visibility (allowing users to continuously explore results), consistency (ensuring that results presented are from the same version of the data), and monotonicity (making sure that results don’t “go back in time”). Within Transactional Panorama, we characterize all feasible property combinations, design new mechanisms (that we call lenses) for presenting analysis results to the user while preserving a given property combination, formally prove their relative orderings for various performance criteria, and discuss their use cases. We propose novel algorithms to preserve each property combination and efficiently present fresh analysis results. We implement our framework in a popular, open-source Business Intelligence (BI) tool, illustrate the relative performance implications of different lenses, and demonstrate the benefits of the novel lenses and our optimizations.
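As an illustration of what a lens might look like (a simplified sketch, not one of the mechanisms from the paper), the cache below tags every result with the data version it was computed from and only swaps the displayed results once a complete new version has arrived, preserving consistency and monotonicity at the cost of delaying the visibility of fresh results.

# A minimal sketch of one possible "lens": results are tagged with the data
# version they were computed from, and the display only advances once every
# result from a newer version is available, so all visible results share one
# version (consistency) and the visible version never moves backwards
# (monotonicity). This is an illustrative simplification.
class ConsistentLens:
    def __init__(self, result_keys):
        self.result_keys = set(result_keys)     # e.g. chart ids on the dashboard
        self.visible = {}                       # key -> value currently shown
        self.visible_version = 0
        self.pending = {}                       # version -> {key: value}

    def on_result(self, key, value, version):
        """Called by the refresh system whenever one result finishes recomputing."""
        if version <= self.visible_version:
            return                              # stale work; ignore
        batch = self.pending.setdefault(version, {})
        batch[key] = value
        if set(batch) == self.result_keys:      # the new version is complete
            self.visible = batch
            self.visible_version = version
            self.pending = {v: b for v, b in self.pending.items() if v > version}

    def view(self):
        """What the user sees: always a single, internally consistent version."""
        return self.visible_version, dict(self.visible)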
Towards End-User Prompt Engineering: Lessons from an LLM-based Chatbot Design Tool
A large body of prior work has examined the capabilities of pre-trained large language models (“LLMs”) such as GPT-3; in contrast, relatively little work has explored how humans are able to make use of those capabilities. Using natural language to steer LLM outputs (“prompting”) is emerging as an important design technique—but prompt-based systems comply inconsistently, and users face challenges systematically understanding how a prompt change might impact subsequent LLM outputs. The apparent ease of instruction via prompts has led to an explosion of interest in tools that enable end-users to engage with computational systems using natural language prompts. To explore how these non-expert users approach “end-user prompt engineering,” we conduct a design probe with a prototype LLM-based chatbot design tool that encourages iterative development of prompting strategies, and report briefly on findings here.
Discovering Structure in Messy Data
Interactive Multi-Modal Data Wrangling
PACER Court Case Extraction