AI Inference Engineer

London, Full-Time, £108,000 - £144,000 / year (est.), no home office possible

We are seeking an AI Inference Engineer to join our expanding team. Our current technology stack includes Python, C++, TensorRT-LLM, and Kubernetes. This role offers the opportunity to work on deploying machine learning models at large scale for real-time inference.

Responsibilities

Develop APIs for AI inference used by internal and external customers
Benchmark and optimize the inference stack to address bottlenecks
Enhance system reliability and observability, and respond to outages
Research and implement optimizations for LLM inference

Qualifications

Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
Knowledge of LLM architectures and inference techniques (e.g., batching, quantization)
Experience deploying reliable, distributed, real-time model serving at scale
(Optional) Understanding of GPU architectures or CUDA kernel programming

The compensation range for this role is $190,000 – $240,000. Additional benefits include equity, comprehensive health insurance, dental, vision, and a 401(k) plan.
