CVPR 2024 Workshop on Responsible Data

Welcome to our Workshop on Responsible Data!

The development of large-scale datasets has been essential to the progress of machine learning and artificial intelligence. However, many of these datasets, particularly computer vision datasets, are not inclusive or diverse, which can lead to biased models and algorithms. This workshop will bring together practitioners and researchers to discuss the challenges and opportunities of building more responsible datasets.

The workshop will cover a range of topics, including:

Moving beyond pragmatism and implementation of context- and consent-driven procedures in dataset development
What are the main themes when it comes to responsible datasets? Are there specific benchmarks currently utilized?
Challenges, risks and benefits of collecting gender, race, skin tone, physical attributes, accessibility data, and other person attributes.
What are the best practices when training individuals for data collection and annotation? To what extent does diversity matter when it comes to data collectors and annotators? How do the organizational structures of these businesses and the ecosystem of stakeholders contribute to the responsible dimension of the datasets?
What are the new considerations in a world of pretrained models and synthetic data?
How should we build responsible datasets for generative AI models and applications?
How do we quantitatively measure how responsible a dataset is?
What does Transparency translate to in the context of dataset development?
How do notions of Data Privacy like those articulated in proposals such as the Blueprint for an AI Bill of Rights translate to building towards responsible datasets?
How do we build a framework for Dataset Accountability?
How should we best engage the open source community when building, updating, and maintaining datasets?
State of Affairs: a summary of progress to date - how responsible datasets have evolved. What best practices can be leveraged more broadly?

After the workshop, we plan to write a white paper summarizing the round table discussions and opinions from experts in the field (with necessary permissions). We will also create a community space on Discord (or a similar platform) to continue community building and collaboration after the workshop.

Important Dates

Submission Deadline	~~March 31, 2024~~	April 12, 2024
Final Decisions	~~April 22, 2024~~	April 30, 2024
Workshop Date	June 18, 2024

Schedule

The following schedule is tentative and will be confirmed closer to the workshop:

Time	Topic	Speaker(s)/Presenter(s)
8:30-8:45	Opening Remarks	Dr. Candice Schumann
8:45-9:15	Keynote: Benchmarking models in a changing world: geospatial and temporal distribution shifts, diverse end users with conflicting priorities, and heterogeneously sampled data across modalities	Dr. Sara Beery
9:15-9:40	Rapid Fire Talks 1	See Extended Abstracts Session 1
9:45-10:15	Poster Session 1	See Extended Abstracts Session 1
10:15-10:45	Coffee Break
10:45-11:45	Round Table Discussion 1
11:45-13:00	Lunch Break
13:00-13:30	Keynote: Mapping the Computer Vision Surveillance and Weapons Pipeline	Dr. William Agnew
13:30-14:15	Round Table Discussion 2
14:15-14:40	Rapid Fire Talks 2	See Extended Abstracts Session 2
14:45-15:15	Poster Session 2	See Extended Abstracts Session 2
15:15-15:45	Coffee Break
15:45-16:45	Panel Discussion	Moderator: Susanna Ricco; Panelists: Noa Franko-Ohana, Dr. Sven Cattell, Dr. Morgan Klaus Scheuerman, Emily McReynolds
16:45-17:15	Closing Remarks	Dr. Caner Hazirbas

Extended Abstracts

Session 1

Is ImageNet Pre-training Fair in Image Recognition? Ryosuke Yamada, Ryo Takahashi, Go Ohtani, Erika Mori, Hirokatsu Kataoka, Yoshimitsu Aoki
The role of image anonymization in balancing fairness and privacy in data collection: analysis of ethical and technical challenges Luca Piano, Pietro Basci, Fabrizio Lamberti, Lia Morra
Ensuring AI Data Access Control in RDBMD: A Comprehensive Review William Kandolo

Session 2

Data Sharing Policies and Considerations Must Influence Machine Learning Research Directions in Ecological Applications Neha Hulkund, Millie Chapman, Ruth Oliver, Sara Beery
DETER: Detecting Edited Regions for Deterring Generative Manipulations Sai Wang*, Ye Zhu*, Ruoyu Wang, Amaya Dharmasiri, Olga Russakovsky, Yu Wu
AI-EDI-SPACE: A Co-designed Dataset for Evaluating the Quality of Public Spaces Hugo Berard, Shreeyash Gowaikar
Machines are Learning, African Communities are Training Wilhelmina Ndapewa Onyothi Nekoto, Sanjana Paul, Olanrewaju Samuel, Kasia Chmielinski, Camille Minns

Keynote Speakers

Dr. Sara Beery is the Homer A. Burnell Career Development Professor in the MIT Faculty of Artificial Intelligence and Decision-Making. She was previously a visiting researcher at Google, working on large-scale urban forest monitoring as part of the Auto Arborist project. She received her PhD in Computing and Mathematical Sciences at Caltech in 2022, where she was advised by Pietro Perona and awarded the Amori Doctoral Prize for her thesis. Her research focuses on building computer vision methods that enable global-scale environmental and biodiversity monitoring across data modalities, tackling real-world challenges including geospatial and temporal domain shift, learning from imperfect data, fine-grained categories, and long-tailed distributions. She partners with industry, nongovernmental organizations, and government agencies to deploy her methods in the wild worldwide. She works toward increasing the diversity and accessibility of academic research in artificial intelligence through interdisciplinary capacity building and education. She founded the AI for Conservation Slack community, serves as the Biodiversity Community Lead for Climate Change AI, and founded and directs the Summer Workshop on Computer Vision Methods for Ecology.

Dr. William Agnew is a CBI postdoctoral fellow at CMU. William received his Ph.D. from the University of Washington with Sidd Srinivasa, where he worked on AI ethics, critical AI, and robotics. William also helped found Queer in AI. William is interested in developing and sharing tools and ideas that go beyond participatory design and allow marginalized individuals and communities to own and meaningfully control their data and models derived from that data. Building on ideas from usable security/privacy, usage licenses, and indigenous data sovereignty, William wants to contribute to data and AI futures where individuals and communities know where their data is and can remove, add, or change their data in different datasets.

Panelists

Noa Franko-Ohana is a seasoned technology professional with a background in R&D, product management, and innovation, and two decades of leadership in startups and enterprises such as IBM, Microsoft, Seagate and more. In her current role as VP Partnerships at Tasq.ai, Noa has dedicated her efforts to bridging the gap between artificial intelligence and human intuition. At Tasq.ai, Noa leads the development of generative AI evaluation solutions with a focus on ethics, utilizing global, diverse human guidance for evaluating and training ML models. Noa's experience at Seagate, IBM and Microsoft further solidified her commitment to responsible AI; there she led programs that supported startup engagement, ethical technology advocacy, and the development of AI solutions that benefit society.

Sven Cattell founded the AI Village in 2018 and has been running it ever since. He was the principal organizer of AIV’s Generative Red Team at DEFCON 31. Sven is also the founder of nbhd.ai, a startup focused on the security and integrity of datasets and the AI they build. He was previously a senior data scientist at Elastic where he built the malware model training pipeline. He has a PhD in Algebraic Topology, and a postdoc in geometric machine learning where he focused on anomaly and novelty detection.

Morgan Klaus Scheuerman is a research scientist on Sony AI's AI Ethics team and a visiting scholar in Information Science at University of Colorado Boulder. He received his PhD from University of Colorado Boulder, where he was a Microsoft PhD Research Fellow. Morgan broadly focuses on mitigating technical harms, particularly in the context of AI development and deployment. Much of his work has examined how computer vision systems embed specific values that disempower historically marginalized groups. He publishes at top-tier research venues like CSCW, FAccT, CHI, and Big Data & Society. His work has received multiple best paper awards, honorable mentions, and diversity and inclusion awards.

Emily McReynolds (She/Her) has worked in data protection, machine learning & AI, across academia, civil society, and in the tech industry. In previous roles, she led partnerships with civil society & industry engagement on responsible AI at Meta, and created end-to-end data strategy for ML development at Microsoft. With a passion for translating complex technical concepts into understandable sound bites, she has spearheaded a number of tech explanation projects including AI System Cards, a resource for understanding how AI works in different contexts. She was the founding program director for the University of Washington’s Tech Policy Lab, an interdisciplinary collaboration across the Computer Science, Information, and Law schools. She started coding in the time of HTML and taught people to use computers back when we used floppy disks.

Organizers

Contact

Contact the organizers at responsibledata@googlegroups.com

Call for Papers

Authors are invited to submit relevant research (including work in progress, novel perspectives, etc.) as extended abstracts for the poster session and workshop discussion. Please see the relevant topics above. Accepted abstracts will be presented at the poster session but will not be included in the printed proceedings of the workshop.

The extended abstract can be at most 4 pages long in CVPR format, not including references. Authors may supply supplementary material; however, reviewers will not be required to read this material. Reviews will be double-blind. The submission deadline is April 12, 2024 (extended from March 31, 2024).

Submit your extended abstracts through OpenReview.