This post was written by Steve Lapommeray, who received a DLF Students and New Professionals Fellowship to attend the 2018 Forum.
Steve Lapommeray is a Programmer Analyst in the Digital Initiatives team at the McGill University Library, working on deployment automation, websites, and supporting their ILS.
At his first DLF Forum, he looks forward to learning from and collaborating with other professionals in the digital library field. He is excited to have the opportunity to learn more about digital library projects, gain skills that he can apply at his library, and break out of the silos that separate librarianship from application development.
As a recipient of a DLF Students & New Professionals Fellowship, I got the opportunity to go to the 2018 DLF Forum for the first time. I went with the idea of trying to balance attending sessions that dealt with my chosen field (programming) with sessions that dealt with other aspects of digital libraries that I was less familiar with (everything else). With so much to see and learn, the conference could get overwhelming at times. The quiet room and meditation sessions were much appreciated!
One session (out of many!) that piqued my interest was #m4e: Topic Modeling and Machine Learning. The introduction to the concept by Bret Davidson and Kevin Beswick of NCSU Libraries showed how they were able to use machine learning to train a self-driving kart in Mario Kart 64. For libraries, avenues worth exploring include an automated first pass at metadata creation and improvements in video/image processing and OCR.
They also mentioned that the initial data is often the source of algorithmic bias in deep learning. The data sets that feed a machine learning algorithm can very easily come from a narrow range of sources, and there is a need to create more representative data sets. Ways to mitigate this bias include disclosing to the user that this technology is being used, giving the user the option to provide feedback, and offering the option to turn the technology off altogether. Awareness of how the results are generated can demystify some of the machine learning process and allow the user to make more informed decisions rather than accepting the algorithm as the absolute source of truth.
Another way to correct issues in the algorithm is “transfer learning”: retraining the parts of the model that are not giving optimal results. Those parts are taken out of the whole and retrained on smaller data sets, improving their decision-making without having to involve the entire system. Once the retraining is completed, the retrained parts are put back into the whole.
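To make that idea a little more concrete, here is a minimal sketch of transfer learning in Python with PyTorch and torchvision. The framework, the image-classification task, and the ten-class output are my own assumptions for illustration, not anything the presenters showed: a model pretrained on a large, general data set keeps most of its layers frozen while a new final layer is retrained on a smaller, domain-specific collection.

```python
# Minimal transfer-learning sketch (hypothetical example, not the presenters' code).
# Assumes PyTorch/torchvision and an image-classification task with 10 classes.
import torch
import torch.nn as nn
from torchvision import models

# Start from a model pretrained on a large, general data set.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the existing layers so their learned weights are left alone.
for param in model.parameters():
    param.requires_grad = False

# Swap in a new final layer and retrain only that part on the smaller,
# domain-specific data set (e.g., digitized library images).
model.fc = nn.Linear(model.fc.in_features, 10)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Training loop over the smaller data set (data loader not shown here).
# for images, labels in small_dataset_loader:
#     optimizer.zero_grad()
#     loss = criterion(model(images), labels)
#     loss.backward()
#     optimizer.step()
```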
One advantage for users in the library and cultural heritage field is that the service providers are not in the business of making money, so they can focus on providing the best possible user experience.
The “Future/Death of the Library: A Collaborative, Experimental, Introspective Dive into Digital Humanities” talk by Rebekah Cummings, Anna Neatrour, and Elizabeth Callaway of the University of Utah also had very interesting observations. Using topic modeling in R, they found mentions of the death of and the future of the library in a set of texts. They then looked at which words were used in relation to each other, generated word clouds of the most common terms, and analyzed which terms surfaced most often. This approach does have limitations: a term such as “electronic book” counts as two separate words rather than as one concept, so it would not be correctly represented in a word cloud. Sadly, this approach was not able to predict the ultimate fate of libraries.
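To give a rough sense of that workflow, here is a small sketch of the same kind of analysis. The presenters worked in R; this version uses Python and scikit-learn with a tiny invented stand-in corpus, and it shows one way to keep a two-word term like “electronic book” together by counting bigrams alongside single words.

```python
# Rough sketch of a topic-modeling workflow like the one described above.
# The presenters used R; this is a Python/scikit-learn stand-in with invented text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "the death of the library has been predicted for decades",
    "the future of the library includes the electronic book and digital services",
    "an electronic book is not the death of print collections",
]

# ngram_range=(1, 2) keeps two-word phrases such as "electronic book" together,
# addressing the limitation noted above where they would be split into two words.
vectorizer = CountVectorizer(stop_words="english", ngram_range=(1, 2))
counts = vectorizer.fit_transform(corpus)

# Fit a small topic model and print the top terms per topic --
# the kind of term list a word cloud would be built from.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[::-1][:5]]
    print(f"Topic {i}: {', '.join(top)}")
```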
Erin Wolfe of the University of Kansas spoke about the Black Book Interactive Project, continuing the theme of topic modeling and data mining, this time applied to African American literary texts. The project addresses the creation of metadata for African American literature.
Lastly, Darnelle Melvin of UNLV gave the talk “Using Machine Learning & Text-mining for Scholarly Output.” His work, which makes use of machine learning and text mining, is currently in the data collection phase.
Apart from that session, I attended others dealing with labour inequities affecting library staff, 3D and virtual reality collections, linked data, and institutional repository migrations. It was a lot of information to take in, and I’m glad that the shared notes and slides are available online. Thank you to DLF, my fellow fellows, and all of the speakers, panelists, presenters, and attendees. This was an amazing opportunity to explore areas of the library world that I normally would not be exposed to and a chance to meet some great people.
Want to know more about the DLF Forum Fellowship Program? Check out last year’s call for applications.
If you’d like to get involved with the scholarship committee for the 2019 Forum (October 13-16, 2019 in Tampa, FL), look for the Planning Committee sign-up form later this year. More information about 2019 fellowships will be posted in late spring.