ML Systems/Infrastructure Engineer in London

London, Full-Time, £48,000 - £72,000 / year (est.), Home office (partial)

At a Glance

  • Tasks: Design and optimise GPU communication kernels for AI/ML workloads in a dynamic team.
  • Company: Join Oriole Networks, a leader in photonic networking and AI technology.
  • Benefits: Enjoy a hybrid work model, competitive salary, and opportunities for professional growth.
  • Why this job: Make a real impact on cutting-edge AI technologies and revolutionise data centres.
  • Qualifications: Proficient in C++ and Python with experience in GPU programming and high-performance computing.
  • Other info: Collaborative environment focused on innovation and sustainability in tech.

The predicted salary is between £48,000 and £72,000 per year.

Oriole is seeking a talented ML Systems/Infrastructure Engineer to help co-optimize our AI/ML software stack with cutting-edge network hardware. You will be a key contributor to a high-impact, agile team focused on integrating middleware communication libraries and modelling the performance of large-scale AI/ML workloads.

Key Responsibilities

  • Design and optimize custom GPU communication kernels to enhance performance and scalability across multi-node environments.
  • Develop and maintain distributed communication frameworks for large-scale deep learning models, ensuring efficient parallelization and optimal resource utilization.
  • Profile, benchmark, and debug GPU applications to identify and resolve bottlenecks in communication and computation pipelines.
  • Collaborate closely with hardware and software teams to integrate optimized kernels with Oriole's next-generation network hardware and software stack.
  • Contribute to system-level architecture decisions for large-scale GPU clusters, with a focus on communication efficiency, fault tolerance, and novel architectures for advanced optical network infrastructure.

Required Skills & Experience

  • Proficient in C++ and Python, with a strong track record in high-performance computing or machine learning projects.
  • Expertise in GPU programming with CUDA, including deep knowledge of GPU memory hierarchies and kernel optimization.
  • Hands-on experience debugging GPU kernels using tools such as cuda-gdb, cuda-memcheck, and Nsight Systems, and by inspecting PTX and SASS.
  • Strong understanding of communication libraries and protocols, including NCCL, NVSHMEM, OpenMPI, UCX, or custom collective communication implementations.
  • Familiarity with HPC networking protocols/libraries such as RoCE, InfiniBand, libibverbs, and libfabric.
  • Experience with distributed deep learning/MoE frameworks, including PyTorch Distributed, vLLM, or DeepEP.
  • Solid understanding of deploying and optimizing large-scale distributed deep learning workloads in production environments, including Linux, Kubernetes, SLURM, OpenMPI, GPU drivers, Docker, and CI/CD automation.

ML Systems/Infrastructure Engineer in London employer: EPIC Centre

Oriole Networks is an exceptional employer, offering a dynamic work environment in the heart of London where innovation meets collaboration. With a strong focus on employee growth, we provide opportunities to work on cutting-edge AI/ML technologies while fostering a culture of inclusivity and support. Our hybrid work model ensures flexibility, allowing you to balance your professional and personal life effectively.

Contact Details:

EPIC Centre Recruiting Team

StudySmarter Expert Advice

We think this is how you could land the ML Systems/Infrastructure Engineer role in London

✨Tip Number 1

Network, network, network! Get out there and connect with folks in the industry. Attend meetups, webinars, or even just grab a coffee with someone who works in ML or infrastructure. You never know who might have the inside scoop on job openings!

✨Tip Number 2

Show off your skills! Create a portfolio showcasing your projects, especially those involving GPU programming or distributed deep learning. This will give potential employers a taste of what you can do and set you apart from the crowd.

✨Tip Number 3

Don’t be shy about reaching out directly to companies like Oriole. A quick email or LinkedIn message expressing your interest can go a long way. Plus, applying through our website shows you're serious about joining the team!

✨Tip Number 4

Prepare for interviews by brushing up on your technical skills and understanding of communication libraries and protocols. Practice explaining your past projects and how they relate to the role. Confidence is key, so get ready to shine!

We think you need these skills to ace the ML Systems/Infrastructure Engineer role in London

C++
Python
High-Performance Computing
Machine Learning
GPU Programming
CUDA
GPU Memory Hierarchies
Kernel Optimization
Debugging GPU Kernels
NCCL
NVSHMEM
OpenMPI
UCX
Distributed Deep Learning
Linux
Kubernetes
SLURM
Docker
CI/CD Automation

Some tips for your application

Tailor Your CV: Make sure your CV is tailored to the ML Systems/Infrastructure Engineer role. Highlight your experience with C++, Python, and GPU programming, as well as any relevant projects that showcase your skills in high-performance computing.

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about AI/ML and how your background makes you a perfect fit for our team. Don’t forget to mention specific experiences that relate to the job description.

Showcase Your Projects: If you've worked on any projects involving distributed deep learning or GPU optimization, make sure to include them in your application. We love seeing real-world applications of your skills, so don’t hold back!

Apply Through Our Website: We encourage you to apply through our website for the best chance of getting noticed. It’s super easy, and you’ll be able to keep track of your application status directly with us!

How to prepare for a job interview at EPIC Centre

✨Know Your Tech Inside Out

Make sure you’re well-versed in C++ and Python, as well as GPU programming with CUDA. Brush up on your knowledge of communication libraries like NCCL and OpenMPI, and be ready to discuss your hands-on experience with debugging tools. This will show that you’re not just familiar with the tech but can also apply it effectively.

✨Showcase Your Problem-Solving Skills

Prepare to discuss specific challenges you've faced in previous projects, especially those related to performance bottlenecks in GPU applications. Use examples that highlight your ability to profile, benchmark, and debug effectively. This will demonstrate your analytical skills and how you approach complex problems.

✨Collaborate Like a Pro

Since this role involves working closely with hardware and software teams, be ready to talk about your collaborative experiences. Share examples of how you’ve integrated different systems or worked on cross-functional teams. This will illustrate your teamwork skills and your ability to contribute to system-level architecture decisions.

✨Understand the Bigger Picture

Familiarise yourself with Oriole's mission and how your role as an ML Systems/Infrastructure Engineer fits into their vision of revolutionising data centres. Being able to articulate how your work contributes to accelerating AI in a low-carbon world will set you apart and show your genuine interest in the company.
