Using Cloud Storage for Access to Digital Archives

This post was written by the members of DLF’s Born Digital Access Working Group’s Cloud Storage for Accessing Born-Digital Materials subgroup: Adriane Hanson (University of Georgia), Caterina Reed (Stony Brook University), Lara Friedman-Shedlov (University of Minnesota), Matthew McEnriy (Texas Tech University), Nicole Scalessa (Vassar College), and Steven Gentry (University of Michigan).

Introduction and Background

Cloud computing (a.k.a. “the cloud”) is commonly used by gallery, library, archive, and museum (GLAM) professionals to provide access to born-digital material. Due to the cloud’s overall importance and prevalence, a subgroup of the Digital Library Federation’s Born Digital Access Working Group (DLF BDAWG) formed in March 2021 to explore more closely the intersection of cloud computing and the provision of access to born-digital material. This subgroup, originally entitled “Using Google Drive for Access,” drew inspiration from Eric C. Weig’s “Leveraging Google Drive for Digital Library Object Storage” for both its title and its original focus on how GLAM professionals could use Google Drive to provide access to this material. The initial focus was on Google Drive because so many institutions already use it as a mechanism to provide access. Not only is Google Drive widely and cheaply available, but it can also host nearly any type of file and permits a great deal of granular control over who can access the material and for what period of time. Until recently, storage quotas were also extremely generous, if not unlimited, which permitted repositories to store collections consisting of many gigabytes of data.

However, at the first group meeting in May 2021, members discussed not only Weig’s article, but also Google’s forthcoming 2022(1) changes to its Workspace for Education (formerly known as G Suite for Education; see Peters 2021 article). In particular, the change in Google’s data storage policy, from “unlimited storage to qualifying institutions…[to] a baseline of 100TB of pooled [free] storage shared among an institution,” prompted considerable discussion (see Peters 2021 article). Google claims that this is enough storage for “approximately over 100 million documents, 8 million presentations, or 400,000 hours of video”; the reality, however, is that many organizations already exceed that quota. For example, the Bentley Historical Library, a single unit within the University of Michigan, was already stewarding 136 terabytes of digital material by July 2020 (see Bentley’s Diversity, Equity and Inclusion Strategic Plan Five-Year Strategic Objectives, Measures, and FY21 Actions). While Google will permit organizations to purchase additional storage, many GLAM professionals who have been using Google Drive to store collection-related files are understandably concerned that they will not have the leverage to negotiate the recurring budgetary costs necessary for sustainability.

Ultimately, this change led group members to rename the subgroup “Cloud Storage Solutions for Accessing Born-Digital Materials” and to pivot toward general low-cost and no-cost cloud storage solutions for access.

Over the next few meetings, the subgroup searched the literature for case studies about how institutions are using cloud storage to provide access to born-digital material. Unfortunately, after conducting this deeper literature review and discussing the results, the subgroup determined that there was a gap in the literature on this topic. Members also compiled a list of free cloud-based storage solutions similar to Google Drive. However, they discovered that the amount of storage users are granted for free from these solutions is incredibly small when compared to both the forthcoming Google Workspace for Education and the original G Suite for Education. Consider the following examples:

If a librarian/archivist were to use all the free options listed above, they would max out their cloud storage at 110 GB. While some GLAMs could theoretically provide access to born-digital materials using these methods concurrently, the amount of work to juggle different storage capacities and limitations would be significant. Additionally, born-digital archival collections frequently consist of extremely heterogeneous file types and formats that cannot be stored in other repositories maintained by their institutions. Such repositories also face the challenge of providing access to born-digital archival materials at varying levels of restriction.
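To put these quotas in perspective, the short Python sketch below compares the figures quoted in this post (110 GB of combined free storage and the 100 TB pooled baseline) against the Bentley Historical Library’s 136 TB. It is an illustration only: the names `quotas_gb` and `shortfall_gb` are hypothetical, and the conversion assumes decimal units (1 TB = 1000 GB).

```python
# Figures quoted in this post. Decimal units (1 TB = 1000 GB) are an assumption.
GB_PER_TB = 1000

quotas_gb = {
    "combined free tiers": 110,
    "Workspace for Education pooled baseline": 100 * GB_PER_TB,
}

# Bentley Historical Library holdings as of July 2020.
collection_gb = 136 * GB_PER_TB


def shortfall_gb(collection: int, quota: int) -> int:
    """Return how many GB of the collection do not fit in the quota (0 if it fits)."""
    return max(collection - quota, 0)


for name, quota in quotas_gb.items():
    missing = shortfall_gb(collection_gb, quota)
    share = quota / collection_gb
    print(f"{name}: holds {share:.2%} of the collection; {missing:,} GB do not fit")
```

Under these assumptions, a single unit’s holdings exceed the entire institution’s pooled baseline by 36 TB, and the combined free tiers cover well under one percent of the collection.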

Survey Results

The apparent gap in the literature, as well as challenges in locating freely available cloud storage solutions, encouraged the subgroup to conduct a survey to gather practical examples of how repositories are providing cloud storage access to born-digital materials. After drafting and periodically revising the survey based on internal and external feedback, the survey(2) was sent to the DLF BDAWG email list and was open from April 5 to April 15, 2022. Eight responses were received; because one respondent does not use the cloud for access due to privacy concerns, only seven responses met the criteria for a complete analysis(3). Of these responses, six are from higher education institutions and one is from a self-described “nonprofit think tank.” They provide access to collections ranging in size from 0.5 TB to 200 TB. Most of the respondents use the cloud for access on an experimental or case-by-case basis, often relying on more than one platform, depending on the circumstances. The platforms used by more than one respondent are Google Drive (4), Amazon Web Services (3), OneDrive (2), and Omeka (2). Other platforms used included Box, LibSafe Go, virtual servers, DSpace, CONTENTdm, bepress, and Figshare. These were platforms that the respondents’ institutions already subscribed to rather than something acquired specifically for born-digital materials. Use of multiple platforms allowed respondents to match the access method to the technical skills of the requestor.

Respondents experienced a number of challenges across all of these platforms, including difficulty providing access to larger files, integrating with preservation or access systems, and managing conditions governing access. It is clear that the cloud options implemented were often not by choice but rather by circumstance. The responses pointed to a need for sustainable cloud solutions that align with staffing expertise and capacity and fit within budgetary constraints.

What Now?

The greatest concern revealed through the subgroup’s work is how Google’s decision to limit its previously unlimited storage for educational institutions will ultimately impact GLAMs and their efforts to connect researchers with born-digital material. At other times, the subgroup’s discussions became much broader, questioning how large organizations could foster specific answers to these increasingly complex topics.

The practical steps that the Born Digital Access Working Group’s Cloud Storage for Accessing Born-Digital Materials subgroup recommends include:

  • Build and maintain positive relationships and open communication with information technology staff regarding storage, access, and migration.
  • Draft a communication plan, digital management plan, or other relevant documentation to share concerns about Google Drive with stakeholders.

Ultimately, there is much more work that needs to be done on this topic than could be accomplished by a handful of volunteers within a year. Given the wide range of free and paid cloud storage solutions currently available to GLAMs, as well as the lack of consensus from the literature and survey respondents, we are confident that cloud storage remains a complex issue that will not disappear any time soon. As such, readers are encouraged to join BDAWG, continue this work, and look even more closely into the intersection of born digital materials and cloud storage options. 

(1) Some institutions have received extensions via tech consortia that they are members of.
(2) Thanks to Corey Schmidt, Project Management Librarian at the University of Georgia, for sharing his expertise about survey development.
(3) Given the small number of responses, these results should be taken anecdotally and not as trends.

