Pisula, Juan I. (ORCID: 0000-0002-6131-8528) and Bozek, Katarzyna (ORCID: 0000-0002-0917-6876) (2025). Efficient WSI classification with sequence reduction and transformers pretrained on text. Scientific Reports, 15 (1). p. 5612. Springer Nature. ISSN 2045-2322
PDF: s41598-025-88139-5.pdf (2MB), made available under the CC license: Creative Commons Attribution.
Abstract
From computer vision to protein fold prediction, Language Models (LMs) have proven successful in transferring their representation of sequential data to a broad spectrum of tasks beyond the domain of natural language processing. Whole Slide Image (WSI) analysis in digital pathology is a natural fit for transformer-based architectures: in a pre-processing step analogous to text tokenization, large microscopy images are tessellated into smaller image patches. However, because WSIs are massive and comprise thousands of such patches, the problem of WSI classification has not been addressed via deep transformer architectures, let alone via available text-pre-trained deep transformer language models. We introduce SeqShort, a multi-head attention-based sequence shortening layer that summarizes a large WSI into a short, fixed-size sequence of feature vectors by removing redundant visual information. Our sequence shortening mechanism not only reduces the computational cost of self-attention on large inputs, it also allows standard positional encodings to be applied to the previously unordered bag of patches that compose a WSI. We use SeqShort to effectively classify WSIs in different digital pathology tasks using a deep, text-pre-trained transformer model while fine-tuning less than 0.1% of its parameters, demonstrating that the model's knowledge of natural language transfers well to this domain.
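The sequence shortening idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' SeqShort implementation. It assumes one plausible reading, in which a fixed set of query vectors attends over all patch features through multi-head cross-attention, so the output length depends only on the number of queries. The names `shorten_sequence` and `sinusoidal_positions`, the random weights, and all dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def shorten_sequence(patches, queries, w_q, w_k, w_v, w_o, num_heads):
    """Multi-head cross-attention pooling (illustrative assumption).

    patches: (N, d) patch features; queries: (M, d) query vectors.
    Returns an (M, d) sequence regardless of N.
    """
    n, d = patches.shape
    m = queries.shape[0]
    head_dim = d // num_heads
    # Project, then split into heads: (heads, length, head_dim).
    q = (queries @ w_q).reshape(m, num_heads, head_dim).transpose(1, 0, 2)
    k = (patches @ w_k).reshape(n, num_heads, head_dim).transpose(1, 0, 2)
    v = (patches @ w_v).reshape(n, num_heads, head_dim).transpose(1, 0, 2)
    # Each query attends over all N patches: scores are (heads, M, N).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (heads, M, head_dim)
    merged = heads.transpose(1, 0, 2).reshape(m, d)
    return merged @ w_o

def sinusoidal_positions(length, d):
    # Standard sinusoidal positional encodings (d must be even).
    pos = np.arange(length)[:, None]
    i = np.arange(0, d, 2)[None, :]
    angles = pos / (10000 ** (i / d))
    enc = np.zeros((length, d))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

# Hypothetical sizes: 64-dim features, 4 heads, 16 output tokens.
rng = np.random.default_rng(0)
d, num_heads, m = 64, 4, 16
params = [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4)]
queries = rng.normal(size=(m, d))  # would be learned in a real model

for n in (500, 5000):  # slides with different numbers of patches
    patches = rng.normal(size=(n, d))
    short = shorten_sequence(patches, queries, *params, num_heads=num_heads)
    # The shortened sequence has a fixed order, so positions can be added.
    short = short + sinusoidal_positions(m, d)
    print(n, "->", short.shape)
```

Under these assumptions, the cross-attention step scales with M × N instead of the N × N of full self-attention over all patches, and any transformer applied afterwards only sees M tokens. The pooling itself is invariant to the order of the input patches, which is consistent with treating the WSI as an unordered bag.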
| Item Type: | Article |
| Creators: | Pisula, Juan I. (ORCID: 0000-0002-6131-8528); Bozek, Katarzyna (ORCID: 0000-0002-0917-6876) |
| URN: | urn:nbn:de:hbz:38-792445 |
| Identification Number: | 10.1038/s41598-025-88139-5 |
| Journal or Publication Title: | Scientific Reports |
| Volume: | 15 |
| Number: | 1 |
| Page Range: | p. 5612 |
| Date: | 15 February 2025 |
| Publisher: | Springer Nature |
| ISSN: | 2045-2322 |
| Language: | English |
| Faculty: | Central Institutions / Interdisciplinary Research Centers; Faculty of Medicine |
| Divisions: | CECAD - Cluster of Excellence Cellular Stress Responses in Aging-Associated Diseases; Faculty of Medicine > Medizinische Statistik und Bioinformatik > Institut für Medizinische Statistik und Bioinformatik – IMSB; Zentrum für Molekulare Medizin |
| Subjects: | Data processing, Computer science; Life sciences; Medical sciences, Medicine |
| Open Access Funders: | Publikationsfonds UzK |
| Refereed: | Yes |
| URI: | http://kups.ub.uni-koeln.de/id/eprint/79244 |