Short Paper
Enhancing Discovery with AI: Volume Extraction and Summary Statements for Holdings Metadata
Abstract
Serials volume information is essential for helping users and collection managers understand which volumes are available and for informing future collection strategies. However, because binding practices and the recording of summary statements have historically varied by institution, inconsistent holdings metadata poses significant challenges in aggregated discovery environments. This research explores the use of Large Language Models (LLMs) to enhance holdings metadata through two approaches. The first approach employs a Python script that prompts Gemini AI to extract volume (year) information from title pages in digitized serial PDF files submitted by various institutions. The extracted data is used to generate accurate coverage ranges and to identify missing volumes across the entire digitized contents of a serial. The second approach trains a BERT model on labeled data from text files to detect the title pages of annual reports and to identify which publication years are present in or missing from the digitized serial contents. Both approaches, using Gemini and BERT, have shown measurable success in extracting publication date information and generating summary notes. These notes enhance holdings metadata, which can support improved resource navigation and inform strategic collection decisions for digitized serials.
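The step that turns extracted years into coverage ranges and a list of missing volumes can be illustrated with a short Python sketch. It assumes that the year printed on each title page has already been returned by the model as plain text. The helper names `extract_years` and `summarize_holdings`, the year pattern (1600 to 2099), and the comma-separated range format are illustrative assumptions. They are not the authors' script or a cataloging standard.

```python
from __future__ import annotations

import re
from typing import Iterable

# Hypothetical pattern: four-digit years from 1600 to 2099 (an assumption, tune per collection).
YEAR_PATTERN = re.compile(r"\b(1[6-9]\d{2}|20\d{2})\b")


def extract_years(response_text: str) -> list[int]:
    """Pull candidate publication years out of an LLM's plain-text answer (hypothetical helper)."""
    return [int(y) for y in YEAR_PATTERN.findall(response_text)]


def summarize_holdings(years: Iterable[int]) -> tuple[str, list[int]]:
    """Collapse years into a coverage statement such as '1901-1903, 1910' and list the gaps."""
    found = sorted(set(years))
    if not found:
        return "", []
    ranges = []
    start = prev = found[0]
    for y in found[1:]:
        if y == prev + 1:          # consecutive year: extend the current run
            prev = y
            continue
        ranges.append((start, prev))  # gap found: close the current run
        start = prev = y
    ranges.append((start, prev))
    summary = ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)
    present = set(found)
    # Missing volumes are the years inside the overall span that never appeared.
    missing = [y for y in range(found[0], found[-1] + 1) if y not in present]
    return summary, missing


if __name__ == "__main__":
    years = [1901, 1902, 1903, 1905, 1906, 1910]
    print(summarize_holdings(years))
```

Run on the sample years, the sketch prints `('1901-1903, 1905-1906, 1910', [1904, 1907, 1908, 1909])`. Deduplicating and sorting first keeps the output stable when the same year is extracted from several title pages or supplements.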
- Location: University of Barcelona, Barcelona, Spain
- Dates: October 22-25, 2025