Comprehensive testing of large language models for extraction of structured data in pathology

Grothey, Bastian ORCID: 0000-0002-0883-6481, Odenkirchen, Jan, Brkic, Adnan, Schömig-Markiefka, Birgid ORCID: 0000-0003-1893-8796, Quaas, Alexander ORCID: 0000-0002-3537-6011, Büttner, Reinhard ORCID: 0000-0001-8806-4786 and Tolkach, Yuri ORCID: 0000-0001-5239-2841 (2025). Comprehensive testing of large language models for extraction of structured data in pathology. Communications Medicine, 5 (1). ISSN 2730-664X Open Access

PDF: s43856-025-00808-8.pdf (4MB)
Made available under the CC license: Creative Commons Attribution.

Identification Number: 10.1038/s43856-025-00808-8

Official URL: https://doi.org/10.1038/s43856-025-00808-8

Abstract

Background: Pathology departments generate large volumes of unstructured data as free-text diagnostic reports. Converting these reports into structured formats for analytics or artificial intelligence projects requires substantial manual effort by specialized personnel. While recent studies show promise in using advanced language models for structuring pathology data, they primarily rely on proprietary models, raising cost and privacy concerns. Additionally, important aspects such as prompt engineering and model quantization for deployment on consumer-grade hardware remain unaddressed.

Methods: We created a dataset of 579 annotated pathology reports in German and English versions. Six language models (proprietary: GPT-4; open-source: Llama2 13B, Llama2 70B, Llama3 8B, Llama3 70B, and Qwen2.5 7B) were evaluated for their ability to extract eleven key parameters from these reports. Additionally, we investigated model performance across different prompt engineering strategies and model quantization techniques to assess practical deployment scenarios.

Results: Here we show that open-source language models extract structured data from pathology reports with high precision, matching the accuracy of the proprietary GPT-4 model. The precision varies significantly across models and configurations, depending on the specific prompt engineering strategies and quantization methods used during model deployment.

Conclusions: Open-source language models demonstrate performance comparable to proprietary solutions in structuring pathology report data. This finding has significant implications for healthcare institutions seeking cost-effective, privacy-preserving data structuring solutions. The variations in model performance across different configurations provide valuable insights for practical deployment in pathology departments. Our publicly available bilingual dataset serves as both a benchmark and a resource for future research.

Item Type:	Article
Creators:	Grothey, Bastian (ORCID: https://orcid.org/0000-0002-0883-6481); Odenkirchen, Jan; Brkic, Adnan; Schömig-Markiefka, Birgid (ORCID: https://orcid.org/0000-0003-1893-8796); Quaas, Alexander (ORCID: https://orcid.org/0000-0002-3537-6011); Büttner, Reinhard (ORCID: https://orcid.org/0000-0001-8806-4786); Tolkach, Yuri (ORCID: https://orcid.org/0000-0001-5239-2841)
URN:	urn:nbn:de:hbz:38-792734
Identification Number:	10.1038/s43856-025-00808-8
Journal or Publication Title:	Communications Medicine
Volume:	5
Number:	1
Date:	31 March 2025
ISSN:	2730-664X
Language:	English
Faculty:	Faculty of Medicine
Divisions:	Faculty of Medicine > Anatomie; Faculty of Medicine > Pathologie und Neuropathologie > Institut für Pathologie
Subjects:	Medical sciences Medicine
OA Funders:	Publikationsfonds UzK
Refereed:	Yes
URI:	http://kups.ub.uni-koeln.de/id/eprint/79273

Universität zu Köln

Kölner UniversitätsPublikationsServer
