AI Inference Engineer

London | Full-time | £108,000 - £144,000 per year (est.) | No remote work possible

We are seeking an AI Inference Engineer to join our expanding team. Our current technology stack includes Python, C++, TensorRT-LLM, and Kubernetes. This role offers the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

- Develop APIs for AI inference used by internal and external customers
- Benchmark and optimize the inference stack to address bottlenecks
- Enhance system reliability and observability, and respond to outages
- Research and implement optimizations for LLM inference

Qualifications

- Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
- Knowledge of LLM architectures and inference techniques (e.g., batching, quantization)
- Experience deploying reliable, distributed, real-time model-serving systems at scale
- (Optional) Understanding of GPU architectures or CUDA kernel programming

The compensation range for this role is $190,000 – $240,000. Additional benefits include equity, comprehensive health insurance, dental, vision, and a 401(k) plan.

Contact details:

Perplexity Recruiting Team
