Endangering Data Interview with Thomas Padilla

Thomas Padilla is Interim Head, Knowledge Production at the University of Nevada, Las Vegas. He consults, publishes, presents, and teaches widely on digital strategy, cultural heritage collections, data literacy, digital scholarship, and data curation. He is Principal Investigator of the Andrew W. Mellon Foundation-supported Collections as Data: Part to Whole and past Principal Investigator of the Institute of Museum and Library Services-supported Always Already Computational: Collections as Data. He is the author of the library community research agenda Responsible Operations: Data Science, Machine Learning, and AI in Libraries.


Tell us a bit about your projects and how you became interested in cultural heritage data and algorithmic and AI approaches to curation and research?

I am interested in cultivating GLAM community capacity around responsible, ethically grounded computational engagement with data. Some of that interest has to do with positionality – me being a mixed race, first generation college student, from a working class background. I’m constantly trying to find ways for my labor to address historic and contemporary marginalization.  

Always Already Computational: Collections as Data was an Institute of Museum and Library Services-supported effort that iteratively developed a range of deliverables meant to spark capacity around principles-driven creation of computationally amenable collections. In that work I was very lucky to be joined by Laurie Allen, Stewart Varner, Hannah Frost, Elizabeth Russey Roke, and Sarah Potvin. With a better sense of community need, I later embarked on Collections as Data: Part to Whole – an effort supported by the Andrew W. Mellon Foundation. Part to Whole is essentially a regranting and cohort development program. Hannah Scates Kettler, Stewart Varner, Yasmeen Shorish, and I are currently working with 12 institutions (large R1s, historical societies, museums, state-based digital libraries, and more) to develop models that guide collections as data production and models that help organizations develop sustainable services around collections as data.

Over the course of 2019 I worked as a Practitioner Researcher in Residence at OCLC Research, interviewing and holding convenings for professionals within and outside of libraries in the United States. This work culminated in the community research agenda Responsible Operations: Data Science, Machine Learning, and AI in Libraries. I felt a lot of pressure to get this work right. I did not want to write some breathless utopian endorsement of AI. Any success I have in that regard is due to the wisdom of the community; any failures are mine. The library community in the United States feels like it has reached a certain level of awareness regarding the pitfalls of AI, helped considerably by the work of Safiya Noble, practitioners like Jason Clark, and an understanding that library community practices have long held the potential to systematically impact communities in a discriminatory manner.

Rumman Chowdhury introduced me to the concept of responsible operations, which was a perfect way to encapsulate where it feels like we are as a community. A number of us want to use AI to strengthen library services but only if it doesn’t compromise commitments to cultivating a more equitable society. Of course, no community is uniform in its beliefs, and libraries are no exception. Some at junior and senior levels have quietly – and not so quietly – expressed the view that preoccupation with responsibility or ethics is orthogonal to progress and allows the library community to be beaten in some imagined race with the private sector. These are dangerous views and the stakes are real. We must act accordingly.

 

For years, many in the library and cultural heritage world have critiqued digitization efforts as replicating (or even accelerating) long-standing biases that center on white, male, and US/Eurocentric collection patterns, viewpoints, and catalog descriptions. In both the Santa Barbara Statement on Collections as Data and the Always Already Computational: Collections as Data final report, you and your partners have pointed to a crucial need for critical engagement with biases and shortcomings and an intention to address the needs of vulnerable communities represented in the materials. What are some examples of these approaches that you’ve found to be successful?

Collections as Data: Part to Whole requires that regrantees demonstrate capacity to serve underrepresented communities – a consideration that spans thematic coverage of the collection in question, community buy-in, and a demonstrated commitment to ethical principles that work against the potential for harm. Examples of Part to Whole work addressing your question include, but are not limited to, Kim Pham’s effort at the University of Denver to develop a terms of use for collections as data and Amanda Henley and Maria Estorino’s effort at the University of North Carolina at Chapel Hill to discover and increase access to Jim Crow laws and other racially-based legislation in North Carolina between Reconstruction and the Civil Rights Movement.

More broadly, there is so much good work being done. I am super inspired by Dorothy Berry’s advocacy at Harvard, resulting in a 2020-2021 exclusive focus on the digitization of Black American History. I am inspired by the Global Indigenous Data Alliance’s CARE Principles, co-led by Stephanie Russo Carroll and Maui Hudson. A response to the FAIR Principles, CARE problematizes FAIR’s “focus on characteristics of data that will facilitate increased data sharing among entities while ignoring power differentials and historical contexts.” A CARE principle like Indigenous “Authority to Control” presents a difficult and needed challenge to the cultural heritage community. What could it look like for more institutions to relinquish control of collections to their rightful owners? It is not often the case that capital – stolen or not – is returned, and I imagine even the most well-meaning libraries will struggle mightily within their own hierarchies to make this happen. I appreciate Eun Seo Jo and Timnit Gebru’s effort to bridge the archives community and machine learning community. Attempts to thread the needle on cross-domain work are always tough, but they are definitely needed. T-Kay Sangwand’s Preservation is Political: Enacting Contributive Justice and Decolonizing Transnational Archival Collaborations is a must read. Michelle Caswell’s work – as a whole – is fundamental to improving efforts in these spaces.

 

In your Responsible Operations: Data Science, Machine Learning, and AI in Libraries report, you cite Nicole Coleman’s suggestion that, in regard to machine learning, libraries might be better served to “manage bias” rather than attempt (or claim) to eliminate it. Can you talk a little bit more about that framing and why you feel it’s productive in the library world?

I think people heard enough from me about it in Responsible Operations. I encourage folks to read Nicole’s subsequently published article, Managing Bias When Library Collections Become Data.

 

What do you think the role for library and information professionals is in larger conversations about “endangering data” and algorithmic and data justice?

I think there are many of us doing this work. While former Illinois University Librarian Paula Kaufman’s testimony before Congress (pg. 77) against a Federal surveillance program gives me chills every time I read it, I often end up thinking about what combination of colleagues, mentors, institutional culture, and personal and professional ethics were in place to make that act of bravery possible. That naturally leads to thinking about what it would take to cultivate similarly principled acts, large and small, among my colleagues. That seems like a promising road to head down. 

 

Is there anything else you want to add, or any work or other projects you want readers to know about?

I appreciate the opportunity to share thoughts during Endangered Data Week. In addition to the people and projects mentioned above, I encourage folks to check out the Indigenous Protocol and Artificial Intelligence Position Paper, Mozilla’s recent work on Data for Empowerment, and Ruha Benjamin’s incredibly powerful Data4BlackLives keynote.