Senior AI Infrastructure Engineer | Infra-Agnostic Sovereign AI Cloud Platform
Company Description
At stack8s, we're building a Sovereign Cloud Platform that empowers organizations to deploy, scale, and optimize AI workloads across any infrastructure: on-prem, hybrid, or multi-cloud. Our platform integrates GPUs, Kubernetes, and open-source AI tooling to deliver cost-efficient, privacy-preserving, and infrastructure-agnostic compute for enterprises, research institutes, and governments.
We're looking for a Senior AI Infrastructure Engineer with deep expertise in vLLM, GPU virtualization (vGPU, MIG), and large-scale Kubernetes clusters to join our growing team. In this role, you will shape the next generation of AI orchestration and GPU scheduling at scale across sovereign data environments.
Role Overview
You will lead the design, deployment, and optimization of GPU-accelerated LLM inference and training pipelines using vLLM, the NVIDIA GPU Operator, and MIG/vGPU configurations. You'll work on multi-tenant Kubernetes clusters across bare metal and cloud providers (AWS, Azure, GCP, Vultr, OVH, etc.), ensuring low latency, high throughput, and efficient GPU utilization for LLMs and AI workloads.
Key Responsibilities
- Architect and optimize vLLM and LLM‑D deployments for large‑scale inference and fine‑tuning workloads.
- Design and manage MIG/vGPU configurations across multi‑cluster GPU environments (H100, A100, L40S, etc.).
- Integrate NVIDIA GPU Operator, KubeVirt, and device plugin frameworks to support hybrid GPU scheduling.
- Build Helm‑based deployment pipelines and automate GPU provisioning using GitOps tools (ArgoCD, Flux).
- Collaborate with platform and backend teams to improve observability, scaling, and fault tolerance for GPU workloads.
- Evaluate and optimize Kubernetes scheduling for LLM workloads (e.g., with Volcano, Kueue, Kubeflow).
- Participate in benchmarking and performance tuning of multi‑cloud GPU clusters and sovereign compute zones.
- Contribute to the evolution of Stack8s’ infra‑agnostic architecture, ensuring compliance, sovereignty, and performance.
Required Skills & Experience
- 5+ years of experience with Kubernetes administration and GPU infrastructure at scale.
- Deep understanding of vLLM, LLM inference optimizations, and transformer model deployment.
- Hands‑on experience with NVIDIA GPU Operator, MIG profiles, vGPU management, and CUDA/cuDNN optimization.
- Strong proficiency with Helm, Terraform, and container orchestration on bare metal and cloud.
- Experience in multi‑cluster networking (Cilium, Calico, EVPN/VXLAN) and multi‑tenant Kubernetes environments.
- Proficient in Linux kernel‑level GPU debugging, container runtime configuration, and device plugin customization.
- Knowledge of ML frameworks (PyTorch, TensorRT, Hugging Face, DeepSpeed), model serving layers (vLLM, Triton), and vector databases.
- Familiar with GitOps, Prometheus/Grafana, and ClickHouse/ELK observability stacks.
- Strong scripting skills (Python, Bash, Go preferred).
Nice to Have
- CKA / CKS / CKAD Kubernetes certifications.
- Experience with Kubeflow, Ray, or MLRun for distributed AI pipelines.
- Exposure to OpenStack, OpenShift, Proxmox, or Harvester for on‑prem virtualization.
- Knowledge of sovereign cloud regulations (UK/EU data residency, GDPR, ISO‑27001).
- Experience contributing to open‑source GPU or orchestration projects.
Why Join Stack8s
- Build a sovereign, infra‑agnostic AI platform redefining multi‑cloud compute.
- Work with GPU‑rich HPC clusters and LLM‑native workloads in production.
- Collaborate with leading research institutes (e.g., Imperial College London) and global partners.
- Be part of a diverse, high‑autonomy startup with a clear mission and deep technical culture.
- Competitive compensation, equity options, and global flexibility.
Seniority level
- Mid‑Senior level
Employment type
- Full‑time
Job function
- Information Technology
Industries
- Data Infrastructure and Analytics
Contact Details:
stack8s Recruiting Team