Trappen, Tim (ORCID: 0009-0004-4502-1777), Keßler, Robert (ORCID: 0009-0000-4625-811X), Pabel, Roland (ORCID: 0009-0008-0013-067X), Achter, Viktor (ORCID: 0000-0002-3813-0746) and Wesner, Stefan (ORCID: 0000-0002-7270-7959) (2025).
Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM.
[Preprint].
PDF: paper.pdf - Accepted Version. Available under the CC license: Creative Commons Attribution Share Alike. (344kB)
Abstract
Due to rising demand for Artificial Intelligence (AI) inference, especially in higher education, novel solutions utilising existing infrastructure are emerging. High-Performance Computing (HPC) has become a prevalent platform for implementing such solutions. However, the classical operating model of HPC does not adapt well to the requirements of synchronous, user-facing, dynamic AI application workloads. In this paper, we propose a solution that serves large language models (LLMs) by integrating vLLM, Slurm and Kubernetes on the supercomputer RAMSES. The initial benchmark indicates that the proposed architecture scales efficiently for 100, 500 and 1000 concurrent requests, incurring an end-to-end latency overhead of only approximately 500 ms.
| Item Type: | Monograph (Preprint) |
| URN: | urn:nbn:de:hbz:38-792833 |
| Journal or Publication Title: | Next-Gen Middleware for MLOps in Distributed Systems |
| Number of Pages: | 6 |
| Date: | November 2025 |
| Language: | English |
| Faculty: | Faculty of Mathematics and Natural Sciences |
| Divisions: | Faculty of Mathematics and Natural Sciences > Department of Mathematics and Computer Science > Institute of Computer Science |
| Subjects: | Data processing; Computer science |
| Funders: | Ministry of Culture and Science of the State of North Rhine-Westphalia |
| Projects: | Open Source-KI.nrw |
| Refereed: | Yes |
| URI: | http://kups.ub.uni-koeln.de/id/eprint/79283 |