Staff ML Performance Engineer (Inference Optimisation)

Full-time, €70,000 to €90,000 per year (est.)
Wayve

At a Glance

  • Tasks: Optimise ML inference for edge devices and contribute to high-impact projects.
  • Company: Join Wayve, a pioneering tech company focused on self-driving technology.
  • Benefits: Enjoy a hybrid work model, competitive salary, and opportunities for professional growth.
  • Other info: Inclusive culture with a commitment to diversity and career development.
  • Why this job: Make a real impact in the future of driving with cutting-edge technology.
  • Qualifications: Experience in performance optimisation and strong software engineering skills required.

The predicted salary is between €70,000 and €90,000 per year.

The role involves optimising ML inference for edge accelerators and GPUs, focusing on running large transformer-based models efficiently on low-cost, low-power edge devices to enable Wayve’s first driving product. You will help set the technical direction for turning these models into production systems that run reliably on in-vehicle compute. This is a hands-on role working across ML systems, compilers, runtimes, kernels, and embedded deployment, contributing to several early-stage, high-impact projects at Wayve.

Key Responsibilities

  • Profile and pinpoint bottlenecks across the full inference stack (model graph, compiler/runtime, kernel execution, memory movement) and deliver measurable improvements.
  • Implement and validate optimisations in compilers, runtimes, and/or kernels (e.g. operator fusion, scheduling, quantisation-aware performance, custom kernels).
  • Build robust benchmarking and regression testing to ensure performance improvements hold across models, devices, and software releases.
  • Optimise for multiple targets (e.g. NVIDIA Orin/Thor, Qualcomm) and work with teams to support these in a maintainable way.
  • Collaborate with model developers to influence architecture and training/deployment decisions that affect on-device performance.
  • Contribute to technical roadmaps and tooling and help raise the standard of performance engineering across the team.

About you

Essential

  • Proven experience improving performance in production systems with tight constraints (latency, memory, bandwidth, power/thermal, or cost).
  • Strong proficiency with at least one relevant stack/toolchain (e.g. TensorRT, CUDA, Qualcomm QNN, Triton, OpenCL) and confidence learning adjacent frameworks quickly.
  • Comfort operating at multiple levels of abstraction — from high-level model behaviour down to low-level kernel/runtime execution.
  • Strong software engineering fundamentals (debugging, profiling, testing, and maintainable code).
  • Clear communicator and collaborative teammate; able to align multiple stakeholders on performance trade-offs and priorities.

Desirable

  • Exposure to embedded or edge deployment of ML models, including benchmarking on real devices and handling system-level constraints.
  • Experience with NVIDIA and/or Qualcomm SoCs and performance tooling.
  • Python and C++ proficiency.
  • Experience mentoring others and/or driving technical direction in a small, fast-moving team.

This is a full-time role based in our office in London. We operate a hybrid working policy that combines time in the office and time working from home.

Wayve is committed to an inclusive interview experience. If you require accommodations or adjustments to participate fully in our interview process, please let us know.

We understand that not everyone will meet all of the requirements listed above. If you’re passionate about self-driving cars and think you have what it takes to make a positive impact, we encourage you to apply.

Wayve as an employer

Wayve is an exceptional employer for those passionate about cutting-edge technology and self-driving cars, offering a dynamic work culture that fosters collaboration and innovation. Based in London with a hybrid working policy, Wayve provides a supportive environment that prioritises inclusivity and personal growth, alongside opportunities to work on high-impact projects that shape the future of autonomous driving. Join us to be part of a team that values your contributions and encourages you to push the boundaries of machine learning performance engineering.

Wayve

Contact details:

Wayve Recruiting Team

StudySmarter Expert Advice

We think this is how you could land the Staff ML Performance Engineer (Inference Optimisation) role

Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can refer you directly.

Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to ML performance engineering. This gives potential employers a taste of what you can do beyond your CV.

Tip Number 3

Prepare for interviews by practising common technical questions and scenarios relevant to the role. Think about how you’d optimise inference for edge devices and be ready to discuss your thought process.

Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team at Wayve.

We think you need these skills to ace the Staff ML Performance Engineer (Inference Optimisation) role

ML Inference Optimisation
Performance Engineering
TensorRT
CUDA
Qualcomm QNN
Triton
OpenCL

Some tips for your application

Tailor Your CV: Make sure your CV is tailored to the Staff ML Performance Engineer role. Highlight your experience with performance optimisation, especially in production systems, and any relevant tools like TensorRT or CUDA. We want to see how your skills align with our needs!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you’re passionate about self-driving cars and how your background makes you a great fit for our team. Be sure to mention specific projects or experiences that relate to the job description.

Showcase Your Technical Skills: In your application, don’t forget to showcase your technical skills clearly. Mention your proficiency in Python and C++, and any experience with embedded or edge deployment of ML models. We love seeing candidates who can operate at multiple levels of abstraction!

Apply Through Our Website: We encourage you to apply through our website for the best experience. It’s straightforward and ensures your application gets to the right people. Plus, it shows us you’re keen on joining our team at Wayve!

How to prepare for a job interview at Wayve

Know Your Tech Stack

Make sure you’re well-versed in the relevant stacks and toolchains like TensorRT, CUDA, or Qualcomm QNN. Brush up on how these tools can optimise ML inference, as you'll likely be asked to discuss your experience with them during the interview.

Showcase Your Problem-Solving Skills

Prepare to discuss specific examples where you’ve identified and resolved performance bottlenecks in production systems. Highlight your approach to optimising for constraints like latency and memory, as this demonstrates both hands-on experience and the ability to set technical direction.

Communicate Clearly

Since collaboration is key in this role, practise articulating your thoughts clearly. Be ready to explain complex concepts in a way that aligns multiple stakeholders on performance trade-offs. This will show that you can be an effective team player.

Prepare for Real-World Scenarios

Think about how you would handle benchmarking on real devices and managing system-level constraints. Be prepared to discuss your experience with embedded or edge deployment of ML models, as this could set you apart from other candidates.