Short Paper

Enhancing Discovery with AI: Volume Extraction and Summary Statements for Holdings Metadata

Myung-Ja K. Han, Owen Monroe

DOI: 10.23106/dcmi.952591060

Abstract

Serials volume information is essential for helping users and collection managers understand which volumes are available and for informing future collection strategies. However, because historical practices for binding and for recording summary statements vary by institution, inconsistent holdings metadata poses significant challenges in aggregated discovery environments. This research explores the use of Large Language Models (LLMs) to enhance holdings metadata through two approaches. The first approach employs a Python script that prompts Gemini AI to extract volume (year) information from title pages in digitized serial PDF files submitted by various institutions. The extracted data is used to generate accurate coverage ranges and identify missing volumes across the entire digitized content of a serial. The second approach trains a BERT model on labeled data from text files to detect title pages of annual reports and to identify which publication years are present in or missing from the digitized serial contents. Both approaches, using Gemini and BERT, have shown measurable success in extracting publication date information and generating summary notes that enhance holdings metadata, which in turn supports improved resource navigation and informs strategic collection decisions for digitized serials.
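The step from extracted years to coverage ranges and missing volumes can be illustrated with a minimal Python sketch. It assumes the years have already been extracted as integers (by whichever model) and that one volume corresponds to one year; the function names `summarize_years` and `format_statement` are hypothetical and are not taken from the authors' script.

```python
from typing import Iterable, List, Tuple


def summarize_years(years: Iterable[int]) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Collapse extracted years into contiguous ranges and list the gaps.

    Assumption (not from the paper): one volume per year, so any year
    missing between the earliest and latest extracted year is a gap.
    """
    present = sorted(set(years))  # drop duplicates, e.g. repeated title pages
    if not present:
        return [], []

    ranges: List[Tuple[int, int]] = []
    start = prev = present[0]
    for year in present[1:]:
        if year != prev + 1:  # a break in the sequence closes the current range
            ranges.append((start, prev))
            start = year
        prev = year
    ranges.append((start, prev))

    have = set(present)
    missing = [y for y in range(present[0], present[-1] + 1) if y not in have]
    return ranges, missing


def format_statement(ranges: List[Tuple[int, int]]) -> str:
    """Render ranges as a compact, human-readable coverage string."""
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)
```

For example, the extracted years 1901, 1902, 1903, 1905, 1908 and 1909 would yield the coverage string `1901-1903, 1905, 1908-1909` and the missing years 1904, 1906 and 1907. The string format here is illustrative only and does not follow any particular holdings statement standard.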

Author information

Myung-Ja K. Han

University of Illinois Urbana-Champaign, US

Owen Monroe

iSchool at University of Illinois, US

Cite this article

Han, M.-J. K., & Monroe, O. (2025). Enhancing Discovery with AI: Volume Extraction and Summary Statements for Holdings Metadata. Proceedings of the International Conference on Dublin Core and Metadata Applications, 2025. https://doi.org/10.23106/dcmi.952591060
Published

Issue

DCMI 2025 Conference Proceedings
Location:
University of Barcelona, Barcelona, Spain
Dates:
October 22-25, 2025
Metadata and citations of this article are published under the Creative Commons Zero Universal Public Domain Dedication (CC0), allowing unrestricted reuse. Anyone can freely use the metadata from DCPapers articles for any purpose without limitations.
The full text of this article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license allows use, sharing, adaptation, distribution, and reproduction in any medium or format, provided that appropriate credit is given to the original author(s) and the source is cited.