ML Systems/Infrastructure Engineer in London

London | Full-Time | £36,000 - £60,000 / year (est.) | No home office possible

At a Glance

  • Tasks: Design and optimise GPU communication kernels for AI/ML workloads in a dynamic team.
  • Company: Join Oriole, a cutting-edge tech company revolutionising AI/ML infrastructure.
  • Benefits: Competitive salary, flexible work options, and opportunities for professional growth.
  • Why this job: Make a real impact on the future of AI/ML technology with innovative projects.
  • Qualifications: Proficient in C++ and Python, with experience in GPU programming and high-performance computing.
  • Other info: Collaborative environment with a focus on advanced network infrastructure and career advancement.

The predicted salary is between £36,000 and £60,000 per year.

Oriole is seeking a talented ML Systems/Infrastructure Engineer to help co-optimize our AI/ML software stack with cutting-edge network hardware. You’ll be a key contributor to a high-impact, agile team focused on integrating middleware communication libraries and modelling the performance of large-scale AI/ML workloads.

Key Responsibilities

  • Design and optimize custom GPU communication kernels to enhance performance and scalability across multi-node environments.
  • Develop and maintain distributed communication frameworks for large-scale deep learning models, ensuring efficient parallelization and optimal resource utilization.
  • Profile, benchmark, and debug GPU applications to identify and resolve bottlenecks in communication and computation pipelines.
  • Collaborate closely with hardware and software teams to integrate optimized kernels with Oriole’s next-generation network hardware and software stack.
  • Contribute to system-level architecture decisions for large-scale GPU clusters, with a focus on communication efficiency, fault tolerance, and novel architectures for advanced optical network infrastructure.

Required Skills & Experience

  • Proficient in C++ and Python, with a strong track record in high-performance computing or machine learning projects.
  • Expertise in GPU programming with CUDA, including deep knowledge of GPU memory hierarchies and kernel optimization.
  • Hands-on experience debugging GPU kernels using tools such as cuda-gdb, cuda-memcheck, and Nsight Systems, and inspecting PTX and SASS.
  • Strong understanding of communication libraries and protocols, including NCCL, NVSHMEM, OpenMPI, UCX, or custom collective communication implementations.
  • Familiarity with HPC networking protocols/libraries such as RoCE, InfiniBand, libibverbs, and libfabric.
  • Experience with distributed deep learning/MoE frameworks, including PyTorch Distributed, vLLM, or DeepEP.
  • Solid understanding of deploying and optimizing large-scale distributed deep learning workloads in production environments, including Linux, Kubernetes, SLURM, OpenMPI, GPU drivers, Docker, and CI/CD automation.

ML Systems/Infrastructure Engineer in London employer: Oriole

At Oriole, we pride ourselves on being an exceptional employer that fosters innovation and collaboration in the rapidly evolving field of AI and machine learning. Our dynamic work culture encourages continuous learning and professional growth, offering employees the chance to work with cutting-edge technology in a supportive environment. Located in a vibrant tech hub, we provide unique opportunities for impactful contributions while ensuring a healthy work-life balance and competitive benefits.

Contact Details:

Oriole Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the ML Systems/Infrastructure Engineer role in London

✨Tip Number 1

Network, network, network! Get out there and connect with folks in the industry. Attend meetups, webinars, or even online forums related to ML and infrastructure. You never know who might have a lead on your dream job!

✨Tip Number 2

Show off your skills! Create a portfolio showcasing your projects, especially those involving GPU programming and distributed systems. This will give potential employers a taste of what you can do and set you apart from the crowd.

✨Tip Number 3

Don’t just apply blindly! Tailor your approach for each application. Research the company, understand their tech stack, and mention how your experience with CUDA or communication libraries can benefit them directly.

✨Tip Number 4

Apply through our website! We love seeing candidates who take the initiative. Plus, it gives you a better chance to stand out in our system. So, don’t hesitate – hit that apply button and show us what you've got!

We think you need these skills to ace the ML Systems/Infrastructure Engineer role in London

C++
Python
High-Performance Computing
Machine Learning
GPU Programming
CUDA
GPU Memory Hierarchies
Kernel Optimization
Debugging GPU Kernels
cuda-gdb
cuda-memcheck
Nsight Systems
NCCL
OpenMPI
Distributed Deep Learning
Linux
Kubernetes
SLURM
Docker
CI/CD Automation

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the ML Systems/Infrastructure Engineer role. Highlight your experience with C++, Python, and GPU programming, as well as any relevant projects that showcase your skills in high-performance computing.

Showcase Your Projects: Include specific examples of your work with distributed communication frameworks and GPU applications. We want to see how you've tackled challenges in previous roles, especially those related to deep learning and performance optimisation.

Be Clear and Concise: When writing your cover letter, keep it clear and to the point. Explain why you're excited about the role at Oriole and how your skills align with the job description. We appreciate straightforward communication!

Apply Through Our Website: Don’t forget to apply through our website! It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re keen on joining our team!

How to prepare for a job interview at Oriole

✨Know Your Tech Inside Out

Make sure you’re well-versed in C++ and Python, as well as GPU programming with CUDA. Brush up on your knowledge of communication libraries like NCCL and OpenMPI, and be ready to discuss how you've used these in past projects.

✨Showcase Your Problem-Solving Skills

Prepare to talk about specific challenges you've faced in high-performance computing or machine learning. Think of examples where you debugged GPU kernels or optimised communication frameworks, and be ready to explain your thought process.

✨Collaborate Like a Pro

Since collaboration is key, think of instances where you worked closely with hardware and software teams. Be prepared to discuss how you contributed to system-level architecture decisions and how you ensured efficient resource utilisation.

✨Demonstrate Your Passion for Innovation

Oriole is looking for someone who can contribute to cutting-edge solutions. Share your thoughts on novel architectures for advanced optical network infrastructure and any personal projects that showcase your enthusiasm for the field.

    Oriole

    50-100