The recent rise of large, general-purpose, and commoditized AI models presents a rich opportunity for a new wave of innovation in data-centric applications. Suddenly, we have within our grasp the ability to make sense of unstructured data formats, such as images, videos, PDFs, and text, alongside structured data, as well as to author complex data transformation code via AI-based program synthesis.
However, in both cases (unstructured data understanding and data transform synthesis), this potential is far from realized, especially across application domains. AI models solve only a small fraction of the data-centric task, with the remainder consuming the vast majority of human time. Humans “in-the-loop” end up having to write custom code to deal with the inevitable idiosyncrasies and inaccuracies of AI models when applied to the data-centric task, including lack of control, trust, and transparency. And in most cases these efforts end up unsuccessful.
Overall, we are not delivering on the promise of next-generation predictive software technologies based on machine learning and program synthesis. Despite radical improvement in the quality of models, the persistent challenges lie in developing technologies and experiences that successfully integrate human intelligence with predictive programming methods. In fact, we believe integrating human intelligence with predictive programming is one of the key challenges in modern computing, but computing research so far has neglected it due to our traditional focus on developing software for software engineers. Data work, including analytics and data science, with its rich tradition of declarative DSLs and visualization, is a particularly promising domain in which to explore these challenges.
The mission of the EPIC Data lab is to democratize data work via no-code and low-code interfaces, leveraging scalable AI-powered program synthesis techniques, targeting collaboration across a range of personas and teams.
- Data work: we are developing human-centered techniques that generalize across all steps in the data science and ML/AI lifecycle. This ranges from extraction and cleaning, to exploration and insight discovery, and also to ML model building and interpretation. Our focus is especially on techniques that bring non-tabular data such as text, images, video and speech into the purview of traditional exploratory data analysis tools, and also push the boundaries of current approaches for traditional tabular data.
- Interfaces: we are exploring how best to use various interface modalities for data work, including natural language, examples, demonstration, sketches, and direct manipulation.
- Techniques: we are leveraging state-of-the-art AI-assisted coding and program synthesis techniques that efficiently map fuzzy user interactions to desired data transformations.
- Collaboration: our tools and techniques target teams whose members have heterogeneous programming abilities, allowing high-code, low-code, and no-code users to work with each other and build on each other’s work.
The EPIC Data lab continues in the long tradition of 5-year multidisciplinary labs at Berkeley. Past labs include AMP and RISE, both of which led to multiple commercial ventures, open-source software artifacts, and impactful research papers. These labs are built on strong collaborations with industry partners. Industry partners work closely with faculty and students to identify research problems of mutual interest. In addition to industry applications, the EPIC Data lab has a special emphasis on applications in social justice.
In many social justice movements, we are seeing a massive increase in the availability of valuable data — data that teams could use to work towards a fair and equitable society for all. At the same time, we observe a severe dearth of tools that enable domain experts, advocates, and volunteers to make sense of this data to effect change. In developing the low-code and no-code tools for data work, we are partnering closely with organizations working towards social justice, both to directly support their work and to ensure that our tools can support many more social justice oriented organizations going forward.
One of our current collaborations centers on criminal justice and police misconduct data. This space exhibits an extreme version of the ‘massive data, minimal tools’ pattern described above. Massive new sources of police misconduct data have become available in the last two years, thanks to the work of advocates and reformers. Our public defense collaborators at the National Association of Criminal Defense Lawyers (NACDL) and journalist collaborators at KQED/Big Local News have been collecting such information from sources like news articles, lawsuits, and FOIA requests, as well as from legally mandated public disclosures by police departments.
Despite the dramatic increase in raw data access, the defense attorneys and journalists who need this information are struggling to use it. Because the data is messy, unstructured, and comes in hundreds of different formats, extracting insights requires the same level of programming skill that software engineers gain from a four-year computer science education. For example, public defenders, who are already stretched thin with high caseloads and low resources, find it impossible to make sense of this data to help build their cases; doing so often requires the computing expertise of a junior software engineer.
Similar challenges (access to raw data but a dearth of appropriate data tooling) arise in many other examples of under-resourced teams working towards social justice. Other applications we’re exploring include public health, as well as countering hate speech and addressing housing inequality.
We are developing no-code and low-code data work tools in close partnership with social justice-oriented teams to ensure we serve these and other similarly under-resourced social justice efforts. Our intent is that these tools will give domain experts, advocates, and volunteers the ability to make sense of data without formal computing training.
Statement of Values
Our goal with the EPIC Data lab is to create a safe, accepting, and inclusive environment where everyone, independent of background, can not just thrive, but flourish. We believe that there is strength in diversity, and that everyone is on a journey of continual “un”-learning and self-reflection to identify and address implicit biases. We care deeply about social justice, and encourage lab members to work towards social justice causes, through research or other means.