“Hey Google, what do you know about virtual assistants and libraries?”

Who’s staffing the reference desk or the library chatlines? These days, or in the near future, it might be Google Assistant, Alexa, Cortana, or Siri. Library users may increasingly turn to virtual or personal assistants before they interact with specific library services. And why not? They appear to be getting quite good.

In 2018 Perficient Digital tested Alexa, Cortana, Google Assistant and Siri with nearly 5,000 questions:

Comparing Digital Personal Assistants

BTW, they also tested which assistant was the funniest by tracking the jokes they made in response to some questions. “What is the meaning of life?” Siri: “All evidence to date suggests it’s chocolate.”

Results like this intrigued Amanda Wheatley and Sandy Hervieux of the McGill University Library. In response, they initiated a multi-phase research project to explore the awareness of AI among libraries and librarians, their use of this technology, and their expectations for the future.

Amanda Wheatley and Sandy Hervieux

They believe AI will “change the nature of our work but won’t take our jobs.” AI will not displace librarians and library staff but operate as “an immersive environment where we coexist.” From their perspective “AI is not one thing” but an array of options and opportunities to be used in thoughtful ways. However, it is time to be proactive, not reactive; we should lead in the use of this technology, not be used by it.

Phase 1 (completed): an environmental scan of libraries and their use of AI as indicated in strategy plans or other documentation. The result? Not too much happening. This could be a lack of funds for technology innovation or it might be a concern about the nature of the technology.

Phase 2 (in process): a broad survey of libraries and librarians to assess their awareness and expectations of AI. That survey is currently live. The deadline for responses is September 6, 2019. You are encouraged to participate!

Phase 3 (in process): testing various devices with sample reference questions. The first test pitted Google against Siri, with Google a clear winner. It responded by summarizing information, presenting relevant graphs and charts, and providing credible research materials … “it was terrifying!” They are now starting to work with the Alexa Skills Kit to teach Alexa new library skills.

Phase 4 (planned): an AI experience in the McGill libraries to give the community a hands-on opportunity to explore the technology. 

If you want more information about their work, visit their guide to the project or contact them via email: Amanda Wheatley (amanda.wheatley@mcgill.ca) or Sandy Hervieux (sandy.hervieux@mcgill.ca).

Amanda and Sandy are editing a book for ACRL on the use of AI in libraries. A call for chapters will go out in the fall.

Lots of interesting research to follow. Looking forward to hearing about their progress.


Unsupervised Text Mining

While AI text mining is not new, this article presents a new development that has important implications for research libraries:

Tshitoyan, V., Dagdelen, J., Weston, L., Dunn, A., Rong, Z., Kononova, O., … Jain, A. (2019). Unsupervised word embeddings capture latent knowledge from materials science literature. Nature, 571(7763), 95–100. https://doi.org/10.1038/s41586-019-1335-8

Of course, it’s from Nature; it’s behind a paywall. Sigh. Hopefully you are able to obtain a copy.

Using unsupervised methods of text mining in the area of materials science, the authors have demonstrated “that latent knowledge regarding future discoveries is to a large extent embedded in past publications.” The discoveries of the future were evident in the literature of the past.

Using current and past literature, these approaches “have the potential to unlock latent knowledge not directly accessible to human scientists.”

“Such language-based inference methods can become an entirely new field of research at the intersection between natural language processing and science, going beyond simply extracting entities and numerical values from text and leveraging the collective associations present in the research literature.”
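
To make the idea of “latent knowledge” in word associations a little more concrete, here is a minimal sketch in Python. It is not the authors' method: it uses simple co-occurrence counts as a stand-in for trained word embeddings, and the corpus, the compound names, the stopword list, and the function names are all invented for illustration. The point is only that a term which never appears next to “thermoelectric” can still end up close to it because the two share context words.

```python
import math
from collections import Counter, defaultdict

# Toy "abstracts": invented for illustration, not real data.
CORPUS = [
    "compound_a is a thermoelectric material with low thermal conductivity",
    "compound_b is a thermoelectric material with a high seebeck coefficient",
    "compound_c has low thermal conductivity and a high seebeck coefficient",
    "compound_d is a structural alloy with high tensile strength",
]

# Hypothetical stopword list, just large enough for the toy corpus.
STOPWORDS = {"is", "a", "with", "and", "has"}


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = [t for t in sentence.split() if t not in STOPWORDS]
        for i, word in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(vectors, target, candidates):
    """Order candidates from most to least similar to the target word."""
    return sorted(candidates,
                  key=lambda c: cosine(vectors[c], vectors[target]),
                  reverse=True)


vectors = cooccurrence_vectors(CORPUS)
ranking = rank_candidates(vectors, "thermoelectric", ["compound_d", "compound_c"])
print(ranking)
```

In this toy corpus compound_c is never mentioned alongside “thermoelectric”, yet it ranks ahead of compound_d because it shares a context word with it. Real embeddings trained on millions of abstracts capture far richer associations than this count-based sketch can.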

Interestingly, this possibility was explored much earlier, during the formative years of MEDLINE, albeit with less sophisticated tools:

Swanson, D. R. (1990). Medical literature as a potential source of new knowledge. Bulletin of the Medical Library Association, 78(1), 29–37.

The Tshitoyan et al. research is an exciting development using ML approaches that should become standard tools for research libraries. It is well worth your consideration. It is also a concern that this work goes on without any involvement from libraries or those with LIS expertise.


An AI-Authored Scholarly Book

Earlier this year Springer Nature published an open access book written by AI: Lithium-Ion Batteries: A Machine-Generated Summary of Current Research. The author is identified as “Beta Writer”.

Beta Writer algorithmically categorized and summarized more than 150 key research publications selected from over 1,000 published from 2016 to 2018. I’m no expert on lithium-ion batteries, so others will have to weigh in on whether this is a credible book. However, a book that synthesizes and summarizes a large and complex corpus of current research literature is a valuable contribution.

The book was produced by a pipeline of various “off the shelf” natural language processing (NLP) tools. The pipeline preprocesses the documents to normalize them linguistically and semantically, clusters them by content similarity (the clusters become the chapters and sections of the book), generates abstracts, summaries, introductions, and conclusions, and outputs XML as the final manuscript. It does so in a manner that is sensitive to copyright infringement. The details are outlined in a human-written Preface (Henning Schoenenberger, Christian Chiarcos, and Niko Schenk) and provide an interesting comparison to current cataloguing and metadata processes and theories.

Book Production Workflow
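
For readers who like to see the shape of such a workflow in code, here is a rough sketch of the four stages (normalize, cluster, summarize, output XML) using only the Python standard library. It is an illustration under heavy assumptions, not Springer's pipeline: the stopword list, the Jaccard similarity threshold, the "first sentence" summarizer, the XML element names, and the sample documents are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stopword list; a real pipeline would use far richer normalization.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "is", "are", "to", "with"}


def normalize(text):
    """Lowercase, tokenize, and drop stopwords: a stand-in for preprocessing."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def jaccard(a, b):
    """Share of terms two documents have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(docs, threshold=0.2):
    """Greedy clustering: join the first cluster whose seed document is similar enough."""
    clusters = []
    for doc in docs:
        terms = normalize(doc)
        for group in clusters:
            if jaccard(terms, normalize(group[0])) >= threshold:
                group.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


def summarize(doc):
    """Extractive placeholder: keep only the first sentence."""
    return re.split(r"(?<=[.!?])\s+", doc.strip())[0]


def to_xml(clusters):
    """Emit one <chapter> per cluster and one <summary> per document."""
    book = ET.Element("book")
    for number, group in enumerate(clusters, start=1):
        chapter = ET.SubElement(book, "chapter", number=str(number))
        for doc in group:
            ET.SubElement(chapter, "summary").text = summarize(doc)
    return ET.tostring(book, encoding="unicode")


# Invented sample documents, not drawn from the actual book.
DOCS = [
    "Anode materials for lithium-ion batteries degrade over time. Details follow.",
    "New anode materials improve lithium-ion batteries. More text here.",
    "Electrolyte safety depends on thermal stability. Further discussion.",
]

clusters = cluster(DOCS)
manuscript = to_xml(clusters)
print(manuscript)
```

The two anode documents land in one chapter and the electrolyte document in another. Each stage here is deliberately naive; the interesting editorial questions (what counts as similar, what a good summary keeps) are exactly the ones the Preface discusses.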

In an interview published in The Scholarly Kitchen, Schoenenberger was clear that the intent is “to initiate a public debate on the opportunities, implications and potential risks of machine-generated content in scholarly publishing.” This book is far from perfect and Springer acknowledges that. Commendably, Springer has gone to great lengths to document their process, discuss alternative strategies, identify weaknesses and outright failures, and encourage critical commentary.

We foresee that in future there will be a wide range of options to create content – from entirely human-created content, a variety of blended man-machine text generation to entirely machine-generated text.

Henning Schoenenberger, Director Product Data & Metadata Management at Springer Nature

Future projects will have an “emphasis on an interdisciplinary approach, acknowledging how difficult it often is to keep an overview across the disciplines.” This is intriguing given the importance of interdisciplinarity and the challenges of tracking concepts in new, unfamiliar fields.

Reviewers of the book argue that it’s not actually a book because it lacks a narrative, an integrating storyline. Agreed. But frankly our definition of “a book” has always been, and remains, fairly elastic. So, it’s a book; just a different book. And it’s a very interesting book at that.


Welcome to Library AI

Algorithmic decision-making arising from machine learning is ubiquitous, powerful, often opaque, sometimes invisible, and (most importantly) consequential in our everyday lives.

Machine learning (ML) is critically important for libraries because it offers new tools for knowledge organization and knowledge discovery. It also, however, presents significant challenges with respect to fairness, accountability, and transparency.

I believe that artificial intelligence will become a major human rights issue in the twenty-first century.

Safiya Noble (2018). Algorithms of Oppression.

This blog will attempt to chart ML developments and issues in libraries and to identify trends in the wider AI community that impact libraries.

“The danger is not so much in delegating cognitive tasks, but in distancing ourselves from – or in not knowing about – the nature and precise mechanisms of that delegation”

de Mul & van den Berg (2011). Remote control: Human autonomy in the age of computer-mediated agency.

Libraries have often been instrumental in championing new technologies and making them more accessible. As we adopt and develop ML tools and services, something I think is an imperative if we are to advance our mission, we also need to be aware of the emerging “new digital divide”:

A class of people who can use algorithms and a class used by algorithms.

David Lankes (Director, SLIS, Univ. of South Carolina).

Looking forward to this journey. Let me know what you think.