Welcome to our Workshop on Responsible Data!
The development of large-scale datasets has been essential to the progress of machine learning and artificial intelligence. However, many of these datasets, particularly in computer vision, are not inclusive or diverse, which can lead to biased models and algorithms. This workshop will bring together practitioners and researchers to discuss the challenges and opportunities of building more responsible datasets.
The workshop will cover a range of topics, including:
- Moving beyond pragmatism and implementing context- and consent-driven procedures in dataset development
- What are the main themes when it comes to responsible datasets? Are there specific benchmarks currently utilized?
- Challenges, risks and benefits of collecting gender, race, skin tone, physical attributes, accessibility data, and other person attributes.
- What are the best practices for training data collectors and annotators? To what extent does diversity matter among data collectors and annotators? How do the organizational structures of these businesses and the ecosystem of stakeholders contribute to the responsible dimension of the datasets?
- What are the new considerations in a world of pretrained models and synthetic data?
- How should we build responsible datasets for generative AI models and applications?
- How do we quantitatively measure how responsible a dataset is?
- What does Transparency translate to in the context of dataset development?
- How do notions of Data Privacy like those articulated in proposals such as the Blueprint for an AI Bill of Rights translate to building towards responsible datasets?
- How do we build a framework for Dataset Accountability?
- How should we best engage the open source community when building, updating, and maintaining datasets?
- State of Affairs: a summary of progress to date - how responsible datasets have evolved. What best practices can be leveraged more broadly?
After the workshop, we plan to write a white paper summarizing the round table discussions and opinions from experts in the field (with the necessary permissions). We will also create a community space on Discord (or a similar platform) to continue community building and collaboration.
Important Dates
Submission Deadline | April 12, 2024 |
Final Decisions | April 30, 2024 |
Workshop Date | June 18, 2024 |
Schedule
The following schedule is tentative and will be confirmed closer to the workshop:
Time | Topic | Speaker(s)/Presenter(s) |
---|---|---|
8:30-8:45 | Opening Remarks | Dr. Candice Schumann |
8:45-9:15 | Keynote | Dr. Sara Beery |
9:15-9:40 | Rapid Fire Talks 1 | TBD |
9:45-10:15 | Poster Session 1 | TBD |
10:15-10:45 | Coffee Break | |
10:45-11:45 | Round Table Discussion 1 | |
11:45-13:00 | Lunch Break | |
13:00-13:30 | Keynote | Dr. William Agnew |
13:30-14:15 | Round Table Discussion 2 | |
14:15-14:40 | Rapid Fire Talks 2 | TBD |
14:45-15:15 | Poster Session 2 | TBD |
15:15-15:45 | Coffee Break | |
15:45-16:45 | Panel Discussion | Moderator: TBD; Panelists: Nati Catalan, Dr. Sven Cattell, Dr. Morgan Klaus Scheuerman, Emily McReynolds |
16:45-17:15 | Closing Remarks | Dr. Caner Hazirbas |
Keynote Speakers
Dr. William Agnew: bio coming soon.
Panelists
Sven Cattell: bio coming soon.
Morgan Klaus Scheuerman: bio coming soon.
Emily McReynolds (She/Her) has worked in data protection, machine learning & AI, across academia, civil society, and in the tech industry. In previous roles, she led partnerships with civil society & industry engagement on responsible AI at Meta, and created end-to-end data strategy for ML development at Microsoft. With a passion for translating complex technical concepts into understandable sound bites, she has spearheaded a number of tech explanation projects including AI System Cards, a resource for understanding how AI works in different contexts. She was the founding program director for the University of Washington’s Tech Policy Lab, an interdisciplinary collaboration across the Computer Science, Information, and Law schools. She started coding in the time of HTML and taught people to use computers back when we used floppy disks.
Organizers
Contact
Contact the organizers at responsibledata@googlegroups.com
Call for Papers
Authors are invited to submit relevant research (including work in progress, novel perspectives, etc.) as extended abstracts for the poster session and workshop discussion. Please see the relevant topics above. Accepted abstracts will be presented at the poster session but will not be included in the printed proceedings of the workshop.
The extended abstract can be at most 4 pages long in CVPR format, not including references. Authors may supply supplementary material; however, reviewers will not be required to read it. Reviews will be double-blind. The submission deadline is March 31, 2024.
Submit your extended abstracts through OpenReview.