This piece inaugurates an occasional series by or about linked data practitioners that will be cross-posted on the DLF site and LOD-LAM.net. The first post in the series is a personal reflection on the linked data landscape written by Jerry Persons, technology analyst at Knowledge Motifs, Chief Information Architect emeritus at Stanford, and author of the CLIR-commissioned Literature survey in support of Stanford Linked Data Workshop.
The ecosystem in which both library-generated metadata and vendor-generated search environments are players has changed radically with unprecedented swiftness:
- search engines continue to morph, witness Bing, WolframAlpha, Siri
- Google surfaces its “things, not strings” work as Knowledge Graph
- schema.org announces a W3C vehicle to extend its core vocabulary
- Microsoft’s Academic Search provides glimpses of new ways to find connections
- Nature Publishing Group initiates linked-data access to some of its metadata
- the BNB and Harvard’s cataloging come out of the closet as CC0 data
- many national libraries release CC0 bibliographic and authority data
- Europeana continues to expand open access to cultural heritage metadata
- OCLC moves toward ODC-BY for VIAF and other of its data environments
- W3C Library linked data incubator group issues its final report
- Library of Congress announces [linked data] modeling initiative
Richard Wallis (late of Talis, now OCLC) recently summarized these trends in terms of web-wide factors in his post A data 7th wave approaching:
With the advent of many data associated advances, variously labelled Big Data, Social Networking, Open Data, Cloud Services, Linked Data, Microformats, Microdata, Semantic Web, Enterprise Data, it is now venturing beyond those closed systems into the wider world.
Well this is nothing new, you might say, these trends have been around for a while – why does this constitute the seventh wave of which you foretell?
It is precisely because these trends have been around for a while, and are starting to mature and influence each other, that they are building to form something really significant ….
Indeed, those in pursuit of a broader-than-library take on what’s going on in the web-wide world of structured data should take advantage of Richard’s experience, which includes a deep understanding of libraries gained as a member of the Talis library systems group and spans the company’s evolution toward its present-day provision of Kasabi, “a startup business spun out from and backed by Talis. Our aim is to unlock the value in the World’s data by enabling new business models for producers and consumers of structured data at all scales.” Among his posts and presentations worth close review are those available at his Data Liberate site, for example:
- Create data not records
- Libraries through the linked data telescope
- Who will be mostly right – Wikidata, Schema.org
My own views on the potential benefits to be had from a rapidly evolving web that is increasingly dominated by well-structured and well-curated data were shaped in large part by exposure to the vision, concepts, and people involved in a set of antecedents to the current flurry of activity and developments. The thread leads from a turn-of-the-century piece written by Danny Hillis, through his Applied Minds and Metaweb companies, to Freebase and John Giannandrea, and onward to the recent Wall Street Journal interview with Amit Singhal and the subsequent discussions surrounding Knowledge Graph and “things, not strings”:
Hillis: With the knowledge web, humanity’s accumulated store of information will become more accessible, more manageable, and more useful. Anyone who wants to learn will be able to find the best and the most meaningful explanations of what they want to know. Anyone with something to teach will have a way to reach those who want to learn. Teachers will move beyond their present role as dispensers of information and become guides, mentors, facilitators, and authors. The knowledge web will make us all smarter. The knowledge web is an idea whose time has come. Hillis, W. Daniel. “Aristotle”: (The knowledge web), 2000, published in The Edge (138) in 2004.
Freebase: A new company founded by a longtime technologist is setting out to create a vast public database intended to be read by computers rather than people, paving the way for a more automated Internet in which machines will routinely share information. Markoff, John. Start-up aims for database to automate web searching. NYT (9 March 2007).
Giannandrea: Freebase is an open database of the world’s information, built by a global community and free for anyone to query, contribute to, and build applications on. … Part of what makes this open database unique is that it spans domains, but requires that a particular topic exist only once in Freebase. Thus freebase is an identity database with a user contributed schema which spans multiple domains. For example, Arnold Schwarzenegger may appear in a movie database as an actor, a political database as a governor, and in a bodybuilder database as Mr. Universe. In Freebase, however, there is only one topic for Arnold Schwarzenegger that brings all these facets together. The unified topic is a single reconciled identity, which makes it easier to find and contribute information about the linked world we live in. Giannandrea, John. Freebase: an open, writable database of the world’s information (a one-hour lecture delivered in October 2008).
[Amit Singhal] said in a recent interview that the search engine [Google] will better match search queries with a database containing hundreds of millions of “entities”—people, places and things—which the company has quietly amassed in the past two years. Semantic search can help associate different words with one another. Efrati, Amir. Google gives search a refresh. WSJ (15 March 2012).
Knowledge Graph: [W]e’re focused on comprehensive breadth and depth. It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. And it’s tuned based on what people search for, and what we find out on the web. Britt, Phil. Google unveils knowledge graph. (24 May 2012).
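Giannandrea’s description of a single reconciled topic can be made concrete with a small sketch. What follows is only an illustration of the idea, not Freebase’s actual data model or API: the record layout, the `same_as` identifier, and the `reconcile` function are all hypothetical names invented for this example. It shows three records from separate domain databases collapsing into one topic that carries every facet.

```python
from collections import defaultdict

# Hypothetical records from three separate domain databases. The field names
# and the shared "same_as" identifier are assumptions made for illustration.
records = [
    {"source": "movies", "name": "Arnold Schwarzenegger",
     "role": "actor", "same_as": "topic/arnold_schwarzenegger"},
    {"source": "politics", "name": "Arnold Schwarzenegger",
     "role": "governor", "same_as": "topic/arnold_schwarzenegger"},
    {"source": "bodybuilding", "name": "Arnold Schwarzenegger",
     "role": "Mr. Universe", "same_as": "topic/arnold_schwarzenegger"},
]


def reconcile(records):
    """Group records that share an identifier into one topic per identity."""
    topics = defaultdict(lambda: {"names": set(), "facets": {}})
    for record in records:
        topic = topics[record["same_as"]]
        topic["names"].add(record["name"])
        # Each source database contributes one facet of the same identity.
        topic["facets"][record["source"]] = record["role"]
    return dict(topics)


topics = reconcile(records)
for topic_id, topic in topics.items():
    print(topic_id, sorted(topic["facets"].items()))
```

The hard part in practice is deciding that three records refer to the same person in the first place; the sketch sidesteps that by assuming a shared identifier is already present.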
Taken together, these and other suggestive developments in the linked-data ecosystem represent a confluence of tools, data, and methodologies of sufficient potential to warrant efforts that pursue:
new opportunities for addressing the traditional and prevailing problems of too many silos of content, too many disparate modes of search and access, and too little precision and too much ambiguity in search results in the extreme environments of academic information resources intended to support and report on the research and teaching in large research enterprises. Keller, Michael A. Linked data: a way out of the information chaos and toward the semantic web. EDUCAUSE Review 46 (4): July/August 2011.
Such opportunities are inextricably bound up with linked data’s potential for (1) reshaping the infrastructure that supports web-wide management of information, knowledge, and data, and for (2) fueling unprecedented improvements in the efficiency and efficacy of navigation and discovery capabilities. It’s long past being a matter of if; now it’s a matter of when. The game that’s afoot is about finding roles that libraries can play in aiding and abetting the creation of an increasingly dense tapestry of facts and links woven together from the flows of intellectual resources that the global academic community consumes and produces in the course of its research, teaching, and learning.