AI Inference Engineer

London | Full-Time | On-site only (no remote work)

We are seeking an AI Inference Engineer to join our expanding team. Our current technology stack includes Python, C++, TensorRT-LLM, and Kubernetes. This role offers the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

  • Develop APIs for AI inference used by internal and external customers
  • Benchmark and optimize the inference stack to address bottlenecks
  • Enhance system reliability and observability, and respond to outages
  • Research and implement optimizations for LLM inference

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
  • Knowledge of LLM architectures and inference techniques (e.g., batching, quantization)
  • Experience deploying reliable, distributed, real-time model serving at scale
  • (Optional) Understanding of GPU architectures or CUDA kernel programming

The compensation range for this role is $190,000 – $240,000. Additional benefits include equity, comprehensive health insurance, dental, vision, and a 401(k) plan.

Contact details:

Perplexity Recruiting Team
