Welcome to our Workshop on Responsible Data!
The development of large-scale datasets has been essential to the progress of machine learning and artificial intelligence. However, many of these datasets, particularly in computer vision, are not inclusive or diverse, which can lead to biased models and algorithms. This workshop will bring together practitioners and researchers to discuss the challenges and opportunities of building more responsible datasets.
The workshop will cover a range of topics, including:
- Moving beyond pragmatism: implementing context- and consent-driven procedures in dataset development
- What are the main themes in responsible dataset development? Are there specific benchmarks currently in use?
- Challenges, risks, and benefits of collecting gender, race, skin tone, physical attributes, accessibility data, and other person attributes
- What are the best practices for training data collectors and annotators? To what extent does the diversity of data collectors and annotators matter? How do the organizational structures of these businesses and the ecosystem of stakeholders contribute to the responsible dimension of the datasets?
- What are the new considerations in a world of pretrained models and synthetic data?
- How should we build responsible datasets for generative AI models and applications?
- How do we quantitatively measure how responsible a dataset is?
- What does Transparency translate to in the context of dataset development?
- How do notions of Data Privacy, like those articulated in proposals such as the Blueprint for an AI Bill of Rights, translate into building responsible datasets?
- How do we build a framework for Dataset Accountability?
- How should we best engage the open source community when building, updating, and maintaining datasets?
- State of Affairs: a summary of progress to date and how responsible datasets have evolved. What best practices can be leveraged more broadly?
After the workshop, we plan to write a white paper summarizing the round table discussions and expert opinions (with the necessary permissions). We will also create a community space on Discord (or a similar platform) to continue community building and collaboration post-workshop.
Important Dates
Event | Date |
---|---|
Submission Deadline | April 12, 2024 |
Final Decisions | April 30, 2024 |
Workshop Date | June 18, 2024 |
Schedule
The following schedule is tentative and will be confirmed closer to the workshop:
Time | Topic | Speaker(s)/Presenter(s) |
---|---|---|
8:30-8:45 | Opening Remarks | Dr. Candice Schumann |
8:45-9:15 | Keynote: Benchmarking models in a changing world: geospatial and temporal distribution shifts, diverse end users with conflicting priorities, and heterogeneously sampled data across modalities | Dr. Sara Beery |
9:15-9:40 | Rapid Fire Talks 1 | See Extended Abstracts Session 1 |
9:45-10:15 | Poster Session 1 | See Extended Abstracts Session 1 |
10:15-10:45 | Coffee Break | |
10:45-11:45 | Round Table Discussion 1 | |
11:45-13:00 | Lunch Break | |
13:00-13:30 | Keynote: Mapping the Computer Vision Surveillance and Weapons Pipeline | Dr. William Agnew |
13:30-14:15 | Round Table Discussion 2 | |
14:15-14:40 | Rapid Fire Talks 2 | See Extended Abstracts Session 2 |
14:45-15:15 | Poster Session 2 | See Extended Abstracts Session 2 |
15:15-15:45 | Coffee Break | |
15:45-16:45 | Panel Discussion | Moderator: Susanna Ricco; Panelists: Noa Franko-Ohana, Dr. Sven Cattell, Dr. Morgan Klaus Scheuerman, Emily McReynolds |
16:45-17:15 | Closing Remarks | Dr. Caner Hazirbas |
Extended Abstracts
Session 1
- *Is ImageNet Pre-training Fair in Image Recognition?* Ryosuke Yamada, Ryo Takahashi, Go Ohtani, Erika Mori, Hirokatsu Kataoka, Yoshimitsu Aoki
- *The role of image anonymization in balancing fairness and privacy in data collection: analysis of ethical and technical challenges.* Luca Piano, Pietro Basci, Fabrizio Lamberti, Lia Morra
- *Ensuring AI Data Access Control in RDBMS: A Comprehensive Review.* William Kandolo
Session 2
- *Data Sharing Policies and Considerations Must Influence Machine Learning Research Directions in Ecological Applications.* Neha Hulkund, Millie Chapman, Ruth Oliver, Sara Beery
- *DETER: Detecting Edited Regions for Deterring Generative Manipulations.* Sai Wang*, Ye Zhu*, Ruoyu Wang, Amaya Dharmasiri, Olga Russakovsky, Yu Wu
- *AI-EDI-SPACE: A Co-designed Dataset for Evaluating the Quality of Public Spaces.* Hugo Berard, Shreeyash Gowaikar
- *Machines are Learning, African Communities are Training.* Wilhelmina Ndapewa Onyothi Nekoto, Sanjana Paul, Olanrewaju Samuel, Kasia Chmielinski, Camille Minns
Keynote Speakers
Dr. William Agnew is a Carnegie Bosch Institute (CBI) postdoctoral fellow at CMU. He received his Ph.D. from the University of Washington, advised by Sidd Srinivasa, where he worked on AI ethics, critical AI, and robotics; he also helped found Queer in AI. William is interested in developing and sharing tools and ideas that go beyond participatory design and allow marginalized individuals and communities to own and meaningfully control their data and the models derived from that data. Building on ideas from usable security and privacy, usage licenses, and indigenous data sovereignty, he wants to contribute to data and AI futures in which individuals and communities know where their data is and can remove, add, or change their data across datasets.
Panelists
Sven Cattell founded the AI Village in 2018 and has run it ever since. He was the principal organizer of AIV's Generative Red Team at DEF CON 31. Sven is also the founder of nbhd.ai, a startup focused on the security and integrity of datasets and the AI built on them. He was previously a senior data scientist at Elastic, where he built the malware model training pipeline. He holds a PhD in algebraic topology and completed a postdoc in geometric machine learning, where he focused on anomaly and novelty detection.
Morgan Klaus Scheuerman is a research scientist on Sony AI's AI Ethics team and a visiting scholar in Information Science at the University of Colorado Boulder. He received his PhD from the University of Colorado Boulder, where he was a Microsoft PhD Research Fellow. Morgan broadly focuses on mitigating technical harms, particularly in the context of AI development and deployment. Much of his work has examined how computer vision systems embed specific values that disempower historically marginalized groups. He publishes at top-tier research venues such as CSCW, FAccT, CHI, and Big Data & Society. His work has received multiple best paper awards, honorable mentions, and diversity and inclusion awards.
Emily McReynolds (she/her) has worked in data protection, machine learning, and AI across academia, civil society, and the tech industry. In previous roles, she led civil society partnerships and industry engagement on responsible AI at Meta and created end-to-end data strategy for ML development at Microsoft. With a passion for translating complex technical concepts into understandable sound bites, she has spearheaded a number of tech explanation projects, including AI System Cards, a resource for understanding how AI works in different contexts. She was the founding program director of the University of Washington's Tech Policy Lab, an interdisciplinary collaboration across the Computer Science, Information, and Law schools. She started coding in the time of HTML and taught people to use computers back when we used floppy disks.
Organizers
Contact
Contact the organizers at responsibledata@googlegroups.com
Call for Papers
Authors are invited to submit relevant research (including work in progress, novel perspectives, etc.) as extended abstracts for the poster session and workshop discussion. Please see the relevant topics above. Accepted abstracts will be presented at the poster session and will not be included in the printed proceedings of the workshop.
Extended abstracts may be at most 4 pages long in CVPR format, not including references. Authors may supply supplementary material; however, reviewers will not be required to read it. Reviews will be double-blind. The submission deadline is April 12, 2024 (see Important Dates above).
Submit your extended abstracts through OpenReview.