This shows you the differences between two versions of the page.
retreats:2022fall:abstracts [2022/11/03 15:10] kilov |
retreats:2022fall:abstracts [2022/11/03 15:37] (current) kilov |
||
---|---|---|---|
Line 1: | Line 1: | ||
=====Talk Abstracts: | =====Talk Abstracts: | ||
+ | \\ | ||
- | - **Trial by File Formats: Exploring Public Defenders’ Challenges Working with Novel Surveillance Data** | + | - **Trial by File Formats: Exploring Public Defenders’ Challenges Working with Novel Surveillance Data**\\ Abstract: In the United States, public defenders serve as an essential bulwark against wrongful arrest and incarceration for low-income and marginalized people. Public defenders have long been overworked and under-resourced. However, these issues have been compounded by increases in the volume and complexity of data in modern criminal cases. We explore the technology needs of public defenders through a series of semi-structured interviews with public defenders and those who work with them. We find that public defenders’ ability to reason about novel surveillance data is woefully inadequate not only due to a lack of resources and knowledge, but also due to the structure of the criminal justice system, which gives prosecutors and police more control than defense attorneys over the type of information used in criminal cases. We find that public defenders may be able to create fairer situations for their clients with better tools for data interpretation and access. Therefore, we call on technologists to attend to the needs of public defenders and the people they represent when designing systems that collect data about people. Our findings illuminate constraints that technologists and privacy advocates should consider as they pursue solutions.\\ \\ |
- | + | - **Document Organization Three Ways**\\ Despite advances in natural language processing, computer vision, and other techniques that simplify the processing of large, unstructured documents such as PDFs, present-day tools remain difficult to use. Many experts from non-technical domains continue to process large, messy document datasets manually, while others become self-taught programmers. For teams with limited time, budgets, and computing education, this is a heavy burden. Our study assesses the learnability of three categories of programming interaction for document processing: textual, visual, and programming-by-example. We conducted a counterbalanced within-subject study (n=12) in which participants used all three programming paradigms. Our qualitative analysis reveals patterns in their relative benefits, including how participants reported that visual programming paradigms gave them a broader understanding of their data. Our results suggest design opportunities for tools that aim to help domain experts complete programming tasks.\\ \\ | |
- | Abstract: In the United States, public defenders serve as an essential bulwark against wrongful arrest and incarceration for low-income and marginalized people. Public defenders have long been overworked and under-resourced. However, these issues have been compounded by increases in the volume and complexity of data in modern criminal cases. We explore the technology needs of public defenders through a series of semi-structured interviews with public defenders and those who work with them. We find that public defenders’ ability to reason about novel surveillance data is woefully inadequate not only due to a lack of resources and knowledge, but also due to the structure of the criminal justice system, which gives prosecutors and police more control than defense attorneys over the type of information used in criminal cases. We find that public defenders may be able to create fairer situations for their clients with better tools for data interpretation and access. Therefore, we call on technologists to attend to the needs of public defenders and the people they represent when designing systems that collect data about people. Our findings illuminate constraints that technologists and privacy advocates should consider as they pursue solutions. | + | - **Exploring the Learnability of Program Synthesizers by Novice Programmers**\\ Tools known as program synthesizers show promise to lighten the burden of programming by automatically writing code for users, but little research has addressed what contributes to and detracts from their learnability by novice programmers. For example: |
- | - How | |
- | - **Document Organization Three Ways** | + | |
- | + | | |
- | Despite advances in natural language processing, computer vision, and other techniques that simplify the processing of large, unstructured documents such as PDFs, present-day tools remain difficult to use. Many experts from non-technical domains continue to process large, messy document datasets manually, while others become self-taught programmers. For teams with limited time, budgets, and computing education, this is a heavy burden. Our study assesses the learnability of three categories of programming interaction for document processing: textual, visual, and programming-by-example. We conducted a counterbalanced within-subject study (n=12) in which participants used all three programming paradigms. Our qualitative analysis reveals patterns in their relative benefits, including how participants reported that visual programming paradigms gave them a broader understanding of their data. Our results suggest design opportunities for tools that aim to help domain experts complete programming tasks. | + | |
- | + | - **Always-on Visualization Recommendations**\\ Exploratory data science largely happens in computational notebooks with dataframe APIs, such as pandas, that support flexible means to transform, clean, and analyze data. Yet, visually exploring data in dataframes remains tedious, requiring substantial programming effort for visualization and mental effort to determine what analysis to perform next. We propose Lux, an always-on framework for accelerating visual insight discovery in dataframe workflows. When a dataframe is printed, Lux recommends visualizations to provide a quick overview of the patterns and trends and suggest promising analysis directions. Users can tailor recommendations via a lightweight intent language. Lux also leverages scalable data computation techniques to generate recommendations quickly. Lux has been embraced by data science practitioners -- and especially by novice data scientists -- with over 400K downloads and 4.2K stars on GitHub.\\ \\ | |
- | - **Exploring the Learnability of Program Synthesizers by Novice Programmers** | + | - **Human-Centered Tools for Reliable Use of Machine Translation**\\ Although machine translation (MT) technology has been rapidly improving, actual user needs for these systems remain relatively poorly understood and, as a result, unmet. For example, current MT systems do not help users understand when they can rely on translations, |
- | + | - **Operationalizing Machine Learning: An Interview Study**\\ The process of operationalizing machine learning, i.e., deploying and sustaining ML models in real data-centric applications, | |
- | Tools known as program synthesizers show promise to lighten the burden of programming by automatically writing code for users, but little research has addressed what contributes to and detracts from their learnability by novice programmers. For example: | + | |
- | * How do synthesizers' | + | |
- | | + | |
- | | + | |
- | | + | |
- | From our analysis, we provide a set of design opportunities to inform the design of future program synthesizers. Our findings have ramifications for the use of program synthesis in data work. | + | |
- | + | ||
- | + | ||
- | - **Always-on Visualization Recommendations** | + | |
- | + | ||
- | Exploratory data science largely happens in computational notebooks with dataframe APIs, such as pandas, that support flexible means to transform, clean, and analyze data. Yet, visually exploring data in dataframes remains tedious, requiring substantial programming effort for visualization and mental effort to determine what analysis to perform next. We propose Lux, an always-on framework for accelerating visual insight discovery in dataframe workflows. When a dataframe is printed, Lux recommends visualizations to provide a quick overview of the patterns and trends and suggest promising analysis directions. Users can tailor recommendations via a lightweight intent language. Lux also leverages scalable data computation techniques to generate recommendations quickly. Lux has been embraced by data science practitioners -- and especially by novice data scientists -- with over 400K downloads and 4.2K stars on GitHub. | + | |
- | + | ||
- | - **Human-Centered Tools for Reliable Use of Machine Translation** | + | |
- | + | ||
- | Although machine translation (MT) technology has been rapidly improving, actual user needs for these systems remain relatively poorly understood and, as a result, unmet. For example, current MT systems do not help users understand when they can rely on translations, | + | |
- | + | ||
- | - **Operationalizing Machine Learning: An Interview Study** | + | |
- | + | ||
- | The process of operationalizing machine learning, i.e., deploying and sustaining ML models in real data-centric applications, | + | |
=====Poster Abstracts: | =====Poster Abstracts: | ||
- | + | \\ | |
- | - **Enabling API Upgrades** | + | - **Enabling API Upgrades**\\ Upgrading packages with API changes can be prohibitively expensive. Existing techniques require manual, expert effort and have limited applicability. Software ecosystems present complex social challenges that tools must deal with, including scale, mismatched priorities, heterogeneous levels and areas of expertise, and limits on communication and collaboration. |
- | + | - **Minimizing Machine Learning in Video Data Analytics with Geo-spatial Metadata**\\ Video data exploration tasks involve querying over objects and their movements in the videos. Many video data exploration tools use machine-learning-based algorithms to detect and track objects. However, these algorithms are slow, which can interrupt the flow of the data scientists/ |
- | Upgrading packages with API changes can be prohibitively expensive. Existing techniques require manual, expert effort and have limited applicability. Software ecosystems present complex social challenges that tools must deal with, including scale, mismatched priorities, heterogeneous levels and areas of expertise, and limits on communication and collaboration. | + | - **A Conversational Interface for Automatic Visualization**\\ Generating visualizations is a key step in exploratory data analysis but can be time-consuming and complicated in no-code environments. Visualizations are also not static; as more information is discovered through exploratory data analysis, new visualizations need to be built to answer new questions. We introduce a conversational natural language interface for creating visualizations from data. Our approach is the first to use large language modeling for generating visualizations in a conversational setting.\\ \\ |
- | + | - **Iterative Design of Semantic Grouping Guidelines and Metrics for Mobile User Interfaces**\\ While prior research on widget grouping in mobile user interface (UI) design has focused on visual grouping, little work has been devoted to the semantic coherence of such groupings, which affects user understanding of the interface. We propose five design guidelines that are generally applicable for semantic element grouping in mobile UIs. We generated the guidelines through an iterative process: they were first conceived through empirical observations of existing mobile UIs and a literature review, refined through multiple rounds of feedback from UI design experts, and finally evaluated with an expert review. The feedback from experts indicates a strong need for these guidelines, as the design and evaluation of semantic grouping are currently conducted based on intuition. In addition to being a useful resource for UI design, these guidelines could lead to computational methods to evaluate interfaces. We experimented with computational metrics built from these guidelines that show promising results.\\ \\ | |
- | We have two ongoing projects to extend program transformation tooling. 1. Ensuring type preservation when applying source-to-source rewrite rules, to increase reliability for all potential users. 2. Semantic-based synthesis of rewrite rules, to extend existing approaches to be easily used by everyone. | + | - **A Cross-Domain Need-Finding Study with Users of Geospatial Data**\\ Geospatial data—such as multispectral satellite imagery, geographically-enriched demographic data, and crowdsourced datasets like OpenStreetMap—is more available today than ever before. This data is playing an increasingly critical role in the work of Earth and climate scientists, social scientists, and data journalists exploring spatiotemporal change in our environment and societies. However, existing software and programming tools for geospatial analysis and visualization are challenging to learn and difficult to use. Many domain experts are unfamiliar with both the theory of geospatial data and the specialized Geographic Information System (GIS) software used to work with such data. While libraries for geospatial analysis and visualization are increasingly common in Python, R, and JavaScript, they still require proficiency with at least one of these programming languages in addition to geospatial data theory. In short, domain experts face steep challenges in gathering, transforming, |
- | + | - **Striking a Balance: Reader Takeaways and Preferences when Integrating Text and Charts**\\ Visualizations frequently use text to guide and inform readers. Prior work in visualization research indicates that text has an influence on reader conclusions, | |
- | - **Minimizing Machine Learning in Video Data Analytics with Geo-spatial Metadata** | + | - **Data cleaning for acronyms, abbreviations, |
- | + | - **Transactional Panorama: A Conceptual Framework for User Perception in Analytical Visual Interfaces**\\ Many tools empower analysts and data scientists to consume analysis results in a visual interface such as a dashboard. When the underlying data changes, these results need to be updated, but this update can take a long time—all while the user continues to explore the results. In this context, tools can either (i) hide away results that haven’t been updated, hindering exploration; | |
- | Video data exploration tasks involve querying over objects and their movements in the videos. Many video data exploration tools use machine-learning-based algorithms to detect and track objects. However, these algorithms are slow, which can interrupt the flow of the data scientists/ | + | - **Towards End-User Prompt Engineering: |
- | + | - **Discovering structure in Messy Data**\\ \\ | |
- | For autonomous-driving or surveillance-camera video, we usually know the location of the camera and have information about the environment around it. In this project, we use this geo-spatial metadata to help systems filter out frames that are irrelevant to users’ queries before feeding the remaining frames into object-tracking algorithms. | + | - **Interactive Multi-Modal Data Wrangling**\\ \\ |
- | + | ||
- | + | ||
- | - **A Conversational Interface for Automatic Visualization** | + | |
- | + | ||
- | Generating visualizations is a key step in exploratory data analysis but can be time-consuming and complicated in no-code environments. Visualizations are also not static; as more information is discovered through exploratory data analysis, new visualizations need to be built to answer new questions. We introduce a conversational natural language interface for creating visualizations from data. Our approach is the first to use large language modeling for generating visualizations in a conversational setting. | + | |
- | + | ||
- | - **Iterative Design of Semantic Grouping Guidelines and Metrics for Mobile User Interfaces** | + | |
- | + | ||
- | While prior research on widget grouping in mobile user interface (UI) design has focused on visual grouping, little work has been devoted to the semantic coherence of such groupings, which affects user understanding of the interface. We propose five design guidelines that are generally applicable for semantic element grouping in mobile UIs. We generated the guidelines through an iterative process: they were first conceived through empirical observations of existing mobile UIs and a literature review, refined through multiple rounds of feedback from UI design experts, and finally evaluated with an expert review. The feedback from experts indicates a strong need for these guidelines, as the design and evaluation of semantic grouping are currently conducted based on intuition. In addition to being a useful resource for UI design, these guidelines could lead to computational methods to evaluate interfaces. We experimented with computational metrics built from these guidelines that show promising results. | + | |
- | + | ||
- | - **A Cross-Domain Need-Finding Study with Users of Geospatial Data** | + | |
- | + | ||
- | Geospatial data—such as multispectral satellite imagery, geographically-enriched demographic data, and crowdsourced datasets like OpenStreetMap—is more available today than ever before. This data is playing an increasingly critical role in the work of Earth and climate scientists, social scientists, and data journalists exploring spatiotemporal change in our environment and societies. However, existing software and programming tools for geospatial analysis and visualization are challenging to learn and difficult to use. Many domain experts are unfamiliar with both the theory of geospatial data and the specialized Geographic Information System (GIS) software used to work with such data. While libraries for geospatial analysis and visualization are increasingly common in Python, R, and JavaScript, they still require proficiency with at least one of these programming languages in addition to geospatial data theory. In short, domain experts face steep challenges in gathering, transforming, | + | |
- | The aim of this research is to investigate the specific computing needs of the diversifying community of geospatial data users. This poster will present findings from a contextual inquiry study (n = 25) with Earth and climate scientists, social scientists, and data journalists using geospatial data in their current work. We will focus on key challenges identified in our thematic analysis, including (1) finding and transforming geospatial data to satisfy spatiotemporal constraints, | + | |
- | + | ||
- | + | ||
- | - **Striking a Balance: Reader Takeaways and Preferences when Integrating Text and Charts** | + | |
- | + | ||
- | Visualizations frequently use text to guide and inform readers. Prior work in visualization research indicates that text has an influence on reader conclusions, | + | |
- | + | ||
- | In this study, we explored several research questions about the textual components of visualizations. A total of 302 participants viewed univariate line charts with differing amounts of text. This text varied in content and position. Participants ranked charts according to preference, with stimuli ranging from charts with no text except axis labels to a full written paragraph. They also provided their conclusions from charts with only one or two pieces of text with varying content and position. From these responses, we found that participants prefer charts with a greater amount of text in comparison to charts with fewer pieces of text or text alone. We also found that the content of the text affects reader conclusions. For example, text that describes statistical or relational components of a chart leads to more takeaways referring to statistics or relational comparisons than text describing chart elements. Additionally, | + | |
- | + | ||
- | - **Data cleaning for acronyms, abbreviations, | + | |
- | + | ||
- | In many no-code data tools, such as spreadsheets, | + | |
- | + | ||
- | + | ||
- | - **Transactional Panorama: A Conceptual Framework for User Perception in Analytical Visual Interfaces** | + | |
- | + | ||
- | Many tools empower analysts and data scientists to consume analysis results in a visual interface such as a dashboard. When the underlying data changes, these results need to be updated, but this update can take a long time—all while the user continues to explore the results. In this context, tools can either (i) hide away results that haven’t been updated, hindering exploration; | + | |
- | + | ||
- | + | ||
- | - **Towards End-User Prompt Engineering: | + | |
- | + | ||
- | A large body of prior work has examined the capabilities of pre-trained language models (“LLMs”) such as GPT-3; in contrast, relatively little work has explored how humans are able to make use of those capabilities. Using natural language to steer LLM outputs (“prompting”) is emerging as an important design technique—but prompt-based systems comply inconsistently, | + | |
- | + | ||
- | - **Discovering structure in Messy Data** | + | |
- | - **Interactive Multi-Modal Data Wrangling** | + | |
- **PACER Court Case Extraction** | - **PACER Court Case Extraction** | ||