ML Systems/Infrastructure Engineer

Full-Time · £36,000 - £60,000 per year (est.) · No home office possible

At a Glance

  • Tasks: Join a dynamic team to optimise AI/ML software with cutting-edge hardware.
  • Company: Oriole, a forward-thinking tech company in London.
  • Benefits: Competitive salary, flexible work options, and growth opportunities.
  • Why this job: Make a real impact on advanced AI/ML projects and technology.
  • Qualifications: Proficient in C++ and Python, with GPU programming experience.
  • Other info: Exciting environment with a focus on innovation and collaboration.

The predicted salary is between £36,000 and £60,000 per year.

Join Oriole as an ML Systems/Infrastructure Engineer to co‑optimize our AI/ML software stack with cutting‑edge network hardware. You will be a key contributor to a high‑impact, agile team focused on integrating middleware communication libraries and modeling the performance of large‑scale AI/ML workloads.

Key Responsibilities

  • Design and optimize custom GPU communication kernels to enhance performance and scalability across multi‑node environments.
  • Develop and maintain distributed communication frameworks for large‑scale deep learning models, ensuring efficient parallelization and optimal resource utilization.
  • Profile, benchmark, and debug GPU applications to identify and resolve bottlenecks in communication and computation pipelines.
  • Collaborate closely with hardware and software teams to integrate optimized kernels with Oriole’s next‑generation network hardware and software stack.
  • Contribute to system‑level architecture decisions for large‑scale GPU clusters, focusing on communication efficiency, fault tolerance, and novel architectures for advanced optical network infrastructure.
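
To give a flavour of the profiling and benchmarking work described above, here is a minimal sketch of a multi-GPU all-reduce timing check using PyTorch Distributed over NCCL. It is illustrative only: the payload size, iteration count, script name, and torchrun launch method are assumptions made for the sketch, not requirements taken from this posting.

    import os
    import time

    import torch
    import torch.distributed as dist

    def main() -> None:
        # torchrun sets RANK, WORLD_SIZE and LOCAL_RANK in the environment.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Hypothetical payload: ~256 MB of fp32, chosen only for illustration.
        payload = torch.randn(64 * 1024 * 1024, device="cuda")

        for _ in range(5):  # warm-up so one-off setup cost is excluded
            dist.all_reduce(payload)
        torch.cuda.synchronize()

        iters = 20
        start = time.perf_counter()
        for _ in range(iters):
            dist.all_reduce(payload)
        torch.cuda.synchronize()
        per_iter = (time.perf_counter() - start) / iters

        if dist.get_rank() == 0:
            gbytes = payload.numel() * payload.element_size() / 1e9
            print(f"all_reduce: {gbytes:.2f} GB payload, {per_iter * 1e3:.2f} ms/iter")

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Launched with, for example, torchrun --nproc_per_node=8 allreduce_bench.py (the file name is arbitrary); a real benchmark would sweep message sizes and compare the results against NCCL's own performance tests.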

Required Skills & Experience

  • Proficient in C++ and Python, with a strong track record in high‑performance computing or machine learning projects.
  • Expertise in GPU programming with CUDA, including deep knowledge of GPU memory hierarchies and kernel optimization.
  • Hands‑on experience debugging GPU kernels with tools such as cuda‑gdb, cuda‑memcheck, and Nsight Systems, including the ability to read PTX and SASS output.
  • Strong understanding of communication libraries and protocols, including NCCL, NVSHMEM, OpenMPI, UCX, or custom collective communication implementations.
  • Familiarity with HPC networking protocols/libraries such as RoCE, InfiniBand, libibverbs, and libfabric.
  • Experience with distributed deep learning / MoE frameworks such as PyTorch Distributed, vLLM, or DeepEP.
  • Solid understanding of deploying and optimizing large‑scale distributed deep learning workloads in production environments, including Linux, Kubernetes, SLURM, OpenMPI, GPU drivers, Docker, and CI/CD automation.
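
Candidates sometimes ask what a "custom collective communication implementation" looks like in practice. The sketch below is a deliberately naive ring all-reduce built from point-to-point operations in PyTorch Distributed; it assumes an already initialised NCCL process group with one CUDA tensor per rank (as in the earlier sketch) and is a teaching reference only, not an optimised kernel.

    import torch
    import torch.distributed as dist

    def naive_ring_allreduce(tensor: torch.Tensor) -> torch.Tensor:
        # Sum `tensor` across all ranks by rotating buffers around a ring.
        # Toy reference only: world_size - 1 full-tensor steps, no chunking and
        # no compute/communication overlap. Production code would call NCCL's
        # all_reduce or implement a bucketed reduce-scatter / all-gather ring.
        world = dist.get_world_size()
        rank = dist.get_rank()
        right = (rank + 1) % world
        left = (rank - 1) % world

        result = tensor.clone()
        send_buf = tensor.clone()
        recv_buf = torch.empty_like(tensor)

        for _ in range(world - 1):
            ops = [
                dist.P2POp(dist.isend, send_buf, right),
                dist.P2POp(dist.irecv, recv_buf, left),
            ]
            for req in dist.batch_isend_irecv(ops):
                req.wait()
            result += recv_buf
            send_buf = recv_buf.clone()  # forward the received block to the next rank
        return result

Being able to explain why this pattern sends (world_size - 1) full copies of the data per GPU, while a chunked reduce-scatter/all-gather ring such as NCCL's moves roughly 2 * (world_size - 1) / world_size of it, is a good way to demonstrate the depth of communication-library knowledge the list above asks for.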

Additional Details

Full‑time, Mid‑Senior level. Employment based in London, England, United Kingdom.

ML Systems/Infrastructure Engineer employer: Oriole

At Oriole, we pride ourselves on being an exceptional employer, offering a dynamic work culture that fosters innovation and collaboration. As an ML Systems/Infrastructure Engineer in London, you will have access to cutting-edge technology and the opportunity to work alongside industry experts, ensuring your professional growth while contributing to impactful projects in AI/ML. Our commitment to employee development, coupled with a supportive environment, makes Oriole a fantastic place to advance your career in a thriving tech hub.

Contact Details:

Oriole Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the ML Systems/Infrastructure Engineer role

✨Tip Number 1

Network, network, network! Get out there and connect with folks in the industry. Attend meetups, webinars, or even just grab a coffee with someone who works at Oriole. You never know who might put in a good word for you!

✨Tip Number 2

Show off your skills! If you've got a GitHub or portfolio showcasing your projects, make sure to share it during interviews. It’s a great way to demonstrate your expertise in C++, Python, and GPU programming.

✨Tip Number 3

Prepare for technical interviews by brushing up on your knowledge of communication libraries and protocols. Practice coding challenges related to GPU programming and distributed systems to impress the interviewers.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen. Plus, if you have any connections at Oriole, let them know you’ve applied – referrals can really boost your chances!

We think you need these skills to ace the ML Systems/Infrastructure Engineer role

C++
Python
High-Performance Computing
Machine Learning
GPU Programming
CUDA
GPU Memory Hierarchies
Kernel Optimization
Debugging GPU Kernels
NCCL
NVSHMEM
OpenMPI
UCX
Distributed Deep Learning
Linux
Kubernetes
SLURM
Docker
CI/CD Automation

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the ML Systems/Infrastructure Engineer role. Highlight your experience with C++, Python, and GPU programming, as well as any relevant projects you've worked on. We want to see how your skills align with our needs!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about AI/ML and how your background makes you a perfect fit for Oriole. Don’t forget to mention specific experiences that relate to the job description.

Showcase Your Projects: If you've worked on any high-performance computing or machine learning projects, make sure to showcase them in your application. We love seeing real-world applications of your skills, especially those involving GPU programming and distributed systems.

Apply Through Our Website: We encourage you to apply through our website for the best chance of getting noticed. It helps us keep track of applications and ensures you’re considered for the role. Plus, it’s super easy to do!

How to prepare for a job interview at Oriole

✨Know Your Tech Inside Out

Make sure you’re well-versed in C++ and Python, as well as GPU programming with CUDA. Brush up on your knowledge of communication libraries like NCCL and OpenMPI, and be ready to discuss your hands-on experience with debugging tools like cuda-gdb and Nsight Systems.

✨Showcase Your Problem-Solving Skills

Prepare to discuss specific challenges you've faced in high-performance computing or machine learning projects. Think about how you identified bottlenecks in GPU applications and the strategies you used to resolve them. Real-world examples will make your answers stand out!

✨Collaborate Like a Pro

Since this role involves working closely with hardware and software teams, be ready to talk about your collaborative experiences. Highlight any projects where you integrated different systems or worked on system-level architecture decisions, especially in large-scale environments.

✨Understand the Bigger Picture

Familiarise yourself with the latest trends in AI/ML and distributed deep learning frameworks. Be prepared to discuss how you would approach optimising large-scale workloads in production environments, including your experience with Kubernetes, Docker, and CI/CD automation.
