A leading AI business is hiring an AI Infrastructure Engineer with experience in GPU infrastructure, distributed systems and AI platforms. Hybrid and remote options are available. The contract is outside IR35 and pays between £800 and £1200 per day.
Experience and skills required for the AI Infrastructure Engineer, GPU, Distributed Systems & AI Platforms
- Strong systems-level engineering experience, ideally in infrastructure, HPC, platform engineering or AI/ML environments
- Hands-on experience operating large-scale compute or GPU-backed infrastructure
- Experience with distributed systems and multi-node environments
- Familiarity with NCCL and GPU-to-GPU communication
- Experience with Kubernetes, containerised platforms and cluster orchestration
- Strong coding ability in Python, Go or C++
- Experience working with high-performance storage across complex environments is highly desirable
- A strong troubleshooting mindset with the ability to understand behaviour at cluster, hardware and network level
- Exposure to InfiniBand, bare-metal provisioning or HPC-style networking
- Experience supporting training or inference environments for large-scale ML models
- Background in AI infrastructure start-ups, hyperscalers or high-performance compute environments
- Experience with profiling / benchmarking tools and performance optimisation at scale
Responsibilities of the AI Infrastructure Engineer
- Build, operate and optimise large-scale GPU infrastructure for AI training and inference
- Support multi-node, multi-GPU environments and distributed workloads
- Improve cluster health, fault tolerance and remediation workflows across GPU fleets
- Optimise GPU-to-GPU communication, workload performance and infrastructure utilisation
- Work with high-performance storage systems supporting large datasets and checkpointing
- Build or improve tooling for profiling, monitoring, benchmarking and performance analysis
- Collaborate closely with ML researchers, platform teams and infrastructure engineers to remove bottlenecks and improve training efficiency
- Support capacity planning and deployment for next-generation compute environments
Contract details
- Outside IR35
- Hybrid position
- Paying between £800 and £1200 per day