Unsupervised Text Mining

While AI text mining is not new, this article presents a new development that has important implications for research libraries:

Tshitoyan, V., Dagdelen, J., Weston, L., Dunn, A., Rong, Z., Kononova, O., … Jain, A. (2019). Unsupervised word embeddings capture latent knowledge from materials science literature. Nature, 571(7763), 95–100. https://doi.org/10.1038/s41586-019-1335-8

Of course, since it’s from Nature, it’s behind a paywall. Sigh. Hopefully you can obtain a copy.

Using unsupervised text mining methods in the area of materials science, the authors have demonstrated “that latent knowledge regarding future discoveries is to a large extent embedded in past publications.” In other words, discoveries yet to be made were already discernible in the literature of the past.

Using current and past literature, these approaches “have the potential to unlock latent knowledge not directly accessible to human scientists.”

“Such language-based inference methods can become an entirely new field of research at the intersection between natural language processing and science, going beyond simply extracting entities and numerical values from text and leveraging the collective associations present in the research literature.”
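To make the idea of "collective associations" concrete, here is a minimal sketch in Python. It is not the authors' method: the paper trains word embeddings on a large corpus of abstracts, whereas this toy swaps in simple co-occurrence count vectors so that it runs with no external libraries. The corpus sentences, the names `compound_a` and `compound_b`, and the window size are all made up for illustration. The point it tries to show is that a term can end up close to "thermoelectric" without ever appearing in the same sentence, purely because the two share context words.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy, invented "abstracts" (hypothetical; not taken from the paper's corpus).
# Note that compound_a never appears in the same sentence as "thermoelectric".
CORPUS = [
    "compound_a shows high seebeck coefficient and low thermal conductivity",
    "thermoelectric materials need high seebeck coefficient and low thermal conductivity",
    "compound_b is a ferromagnetic alloy with high coercivity",
    "ferromagnetic materials show high coercivity and magnetic ordering",
]


def cooccurrence_vectors(sentences, window=5):
    """Represent each word by counts of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(vectors, target, candidates):
    """Sort candidate terms by similarity to the target term, most similar first."""
    scored = [(c, cosine(vectors[c], vectors[target])) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


VECTORS = cooccurrence_vectors(CORPUS)
RANKING = rank_candidates(VECTORS, "thermoelectric", ["compound_a", "compound_b"])

for term, score in RANKING:
    print(f"{term}: {score:.2f}")
```

In this toy setup `compound_a` ranks above `compound_b` because it shares context words such as "seebeck" and "coefficient" with "thermoelectric". A real pipeline would replace the count vectors with learned embeddings and a corpus of millions of abstracts, but the ranking step by cosine similarity has the same shape.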

Interestingly, the idea that published literature holds undiscovered knowledge was explored much earlier, during the formative years of MEDLINE, albeit with less sophisticated tools:

Swanson, D. R. (1990). Medical literature as a potential source of new knowledge. Bulletin of the Medical Library Association, 78(1), 29–37.

The Tshitoyan et al. research is an exciting development using ML approaches that should become standard tools for research libraries. It is well worth your consideration. It is a concern, however, that this work is proceeding without any involvement from libraries or those with LIS expertise.