AI Inference Engineer

London | Full-Time | £108,000 - £144,000 / year (est.) | No home office possible

We are seeking an AI Inference Engineer to join our expanding team. Our current technology stack includes Python, C++, TensorRT-LLM, and Kubernetes. This role offers the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

  • Develop APIs for AI inference used by internal and external customers
  • Benchmark and optimize the inference stack to address bottlenecks
  • Enhance system reliability and observability, and respond to outages
  • Research and implement optimizations for LLM inference

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
  • Knowledge of LLM architectures and inference techniques (e.g., batching, quantization)
  • Experience deploying reliable, distributed, real-time model serving at scale
  • (Optional) Understanding of GPU architectures or CUDA kernel programming

The compensation range for this role is $190,000 – $240,000. Additional benefits include equity, comprehensive health insurance, dental, vision, and a 401(k) plan.

Contact Detail:

Perplexity Recruiting Team

Company: Perplexity
Company size: 50-100