AI Inference Engineer

London · Full-Time · £114,000 – £144,000 / year (est.) · No home office possible
Pantera Capital

At a Glance

  • Tasks: Join us as an AI Inference Engineer, developing APIs and optimising machine learning models.
  • Company: Perplexity is a rapidly growing tech company revolutionising AI with over 10 million active users.
  • Benefits: Enjoy competitive salary, equity options, and comprehensive health benefits including dental and vision.
  • Why this job: Be part of a cutting-edge team impacting millions globally with innovative AI solutions.
  • Qualifications: Experience with ML systems, deep learning frameworks, and knowledge of LLM architectures required.
  • Other info: Flexible work environment with opportunities for growth in a billion-dollar valued startup.

The predicted salary is between £114,000 and £144,000 per year.

Location

London

Employment Type

Full time

Department

AI

We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
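The benchmarking responsibility above can be made concrete with a small sketch. This is a hypothetical micro-benchmark, not the team's actual tooling: `fake_decode_step` is a stand-in for a real model call, and the percentile helper is a minimal illustration of how per-stage latency in an inference stack might be measured.

```python
import time
import statistics

def benchmark(fn, n_iters=50, warmup=5):
    """Time `fn` over n_iters runs after a short warmup; return latency stats in ms."""
    for _ in range(warmup):
        fn()  # warm caches / JIT before measuring
    samples = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "mean_ms": statistics.fmean(samples),
    }

def fake_decode_step():
    # Placeholder for a single decode step of a model.
    time.sleep(0.001)

stats = benchmark(fake_decode_step)
print(stats)
```

Reporting tail latency (p95) alongside the median matters for real-time serving, since outliers dominate user-perceived responsiveness.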

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
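One of the optimisation techniques named above, quantization, can be sketched in a few lines. This is a pure-Python illustration of symmetric per-tensor int8 quantization, chosen for readability; production systems would use library kernels (e.g. in PyTorch or TensorRT) rather than anything like this.

```python
def quantize_int8(weights):
    """Map float weights to int8 using a single symmetric per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    # Round to nearest integer and clamp to the int8 range.
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [qi * scale for qi in q]

w = [0.5, -1.27, 0.01, 1.27]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

The round trip loses at most half a quantization step per weight, which is the basic accuracy/memory trade-off an inference engineer reasons about when applying such techniques.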

Final offer amounts are determined by multiple factors, including experience and expertise.

Equity: In addition to the base salary, equity may be part of the total compensation package.


AI Inference Engineer employer: Pantera Capital

At Perplexity, we pride ourselves on being an innovative leader in AI technology, offering our employees a dynamic work environment that fosters creativity and collaboration. With competitive compensation packages, including equity options, comprehensive health benefits, and a strong focus on professional development, we empower our team to grow alongside our rapidly expanding company. Join us in our vibrant location, where you can contribute to cutting-edge projects that impact millions of users globally.
Pantera Capital

Contact Detail:

Pantera Capital Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land AI Inference Engineer

✨Tip Number 1

Familiarise yourself with our technology stack, especially Python and C++. Having hands-on experience with these languages will give you a significant edge during discussions and technical assessments.

✨Tip Number 2

Dive deep into LLM architectures and inference optimisation techniques. Understanding concepts like batching and quantisation will not only help you in the role but also impress us during your interviews.

✨Tip Number 3

Showcase any projects or experiences where you've deployed scalable, real-time model serving systems. Real-world examples of your work can set you apart from other candidates.

✨Tip Number 4

Engage with the AI community through forums or social media. Networking can provide insights into industry trends and may even lead to referrals, increasing your chances of landing an interview with us.

We think you need these skills to ace AI Inference Engineer

Proficiency in Python and C++
Experience with TensorRT-LLM
Knowledge of Kubernetes
Familiarity with machine learning systems
Deep learning frameworks expertise (PyTorch, TensorFlow, ONNX)
Understanding of LLM architectures
Inference optimisation techniques (e.g., continuous batching, quantisation)
Experience in deploying scalable model serving systems
System reliability and observability management
Benchmarking and performance optimisation skills
Problem-solving skills in real-time inference contexts
GPU architecture knowledge or CUDA programming experience

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights relevant experience with machine learning systems, deep learning frameworks like PyTorch and TensorFlow, and any work you've done with LLM architectures. Use keywords from the job description to catch the employer's attention.

Craft a Strong Cover Letter: In your cover letter, express your enthusiasm for the role and the company. Discuss specific projects where you've developed APIs for AI inference or optimised inference stacks, showcasing your problem-solving skills and technical expertise.

Showcase Relevant Projects: If you have worked on projects involving real-time model serving systems or have experience with CUDA programming, be sure to include these in your application. Provide links to your GitHub or portfolio to demonstrate your hands-on experience.

Highlight Continuous Learning: Mention any recent courses, certifications, or workshops related to AI, ML, or deep learning that you've completed. This shows your commitment to staying updated in a rapidly evolving field, which is crucial for an AI Inference Engineer.

How to prepare for a job interview at Pantera Capital

✨Showcase Your Technical Skills

Be prepared to discuss your experience with Python, C++, and deep learning frameworks like PyTorch and TensorFlow. Highlight specific projects where you've implemented machine learning models or optimised inference processes.

✨Understand the Company’s Technology Stack

Familiarise yourself with TensorRT-LLM and Kubernetes, as these are key components of the role. Demonstrating knowledge about how these technologies work together will show your genuine interest in the position.

✨Prepare for Problem-Solving Questions

Expect questions that assess your ability to troubleshoot and optimise AI inference systems. Think of examples where you've successfully identified bottlenecks and implemented solutions, particularly in real-time model serving.

✨Research LLM Architectures

Since knowledge of LLM architectures is crucial, brush up on the latest techniques in inference optimisation, such as batching and quantisation. Being able to discuss these topics will set you apart from other candidates.

