Trappen, Tim ORCID: 0009-0004-4502-1777, Keßler, Robert ORCID: 0009-0000-4625-811X, Pabel, Roland ORCID: 0009-0008-0013-067X, Achter, Viktor ORCID: 0000-0002-3813-0746 and Wesner, Stefan ORCID: 0000-0002-7270-7959 (2025). Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM. Preprint.

paper.pdf - Accepted Version
Available under the CC license: Creative Commons Attribution Share Alike.

Abstract

Due to rising demands for Artificial Intelligence (AI) inference, especially in higher education, novel solutions utilising existing infrastructure are emerging. The utilisation of High-Performance Computing (HPC) has become a prevalent approach for the implementation of such solutions. However, the classical operating model of HPC does not adapt well to the requirements of synchronous, user-facing, dynamic AI application workloads. In this paper, we propose a solution that serves large language models (LLMs) by integrating vLLM, Slurm and Kubernetes on the supercomputer RAMSES. The initial benchmark indicates that the proposed architecture scales efficiently for 100, 500 and 1000 concurrent requests, incurring an end-to-end latency overhead of only approximately 500 ms.
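The benchmark described in the abstract measures end-to-end latency at fixed concurrency levels. As a minimal sketch of that kind of measurement (not the authors' benchmark harness), the following fires N requests at once with asyncio and summarises their latencies. `fake_request` is a hypothetical placeholder; in a real test it would be replaced by an HTTP call to the inference endpoint, and the 10 ms sleep is an arbitrary assumption, not a measured value.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable, Dict, List


async def timed(call: Callable[[], Awaitable[None]]) -> float:
    """Run one request and return its end-to-end latency in milliseconds."""
    start = time.perf_counter()
    await call()
    return (time.perf_counter() - start) * 1000.0


async def run_level(call: Callable[[], Awaitable[None]], concurrency: int) -> Dict[str, float]:
    """Launch `concurrency` requests simultaneously and summarise their latencies."""
    latencies: List[float] = await asyncio.gather(
        *(timed(call) for _ in range(concurrency))
    )
    return {
        "concurrency": float(concurrency),
        "mean_ms": statistics.mean(latencies),
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        "p95_ms": statistics.quantiles(latencies, n=20)[-1],
        "max_ms": max(latencies),
    }


async def fake_request() -> None:
    # Hypothetical stand-in for a request to an LLM inference endpoint.
    await asyncio.sleep(0.01)


async def main() -> List[Dict[str, float]]:
    # Concurrency levels taken from the abstract: 100, 500 and 1000 requests.
    return [await run_level(fake_request, n) for n in (100, 500, 1000)]


if __name__ == "__main__":
    for row in asyncio.run(main()):
        print(row)
```

The latency overhead of a serving layer could then be estimated by running the same harness against the endpoint with and without the extra layer and comparing the results; that comparison is left out here because it depends on the deployment.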

Item Type: Monograph (Preprint)
Creators:
Trappen, Tim (tim.trappen@ruhr-uni-bochum.de)
Keßler, Robert (kessler@uni-koeln.de)
Pabel, Roland (pabel@uni-koeln.de)
Achter, Viktor (achter@uni-koeln.de)
Wesner, Stefan (wesner@uni-koeln.de)
URN: urn:nbn:de:hbz:38-792833
Journal or Publication Title: Next-Gen Middleware for MLOps in Distributed Systems
Number of Pages: 6
Date: November 2025
Language: English
Faculty: Faculty of Mathematics and Natural Sciences
Divisions: Faculty of Mathematics and Natural Sciences > Department of Mathematics and Computer Science > Institute of Computer Science
Subjects: Data processing Computer science
Funders: Ministry of Culture and Science of the State of North Rhine-Westphalia
Projects: Open Source-KI.nrw
Refereed: Yes
URI: http://kups.ub.uni-koeln.de/id/eprint/79283