Senior ML Systems Engineer - Simulations London Office

Full-Time · £80,000 - £100,000 / year (est.) · No working from home possible
Oriole Networks Ltd

At a Glance

  • Tasks: Create and validate simulation models for large-scale ML systems to optimise performance.
  • Company: Join a leading tech firm at the forefront of machine learning innovation.
  • Benefits: Enjoy competitive pay, flexible working options, and opportunities for professional growth.
  • Other info: Collaborative environment with a focus on cutting-edge technology and career advancement.
  • Why this job: Make a real impact in the exciting world of machine learning and simulations.
  • Qualifications: Master’s or PhD in relevant fields with strong ML systems experience.

The predicted salary is between £80,000 and £100,000 per year.

We are looking for a Senior ML Systems Engineer to build and validate simulation infrastructure for large-scale machine learning systems. This role focuses on modelling the compute and communication behaviour of systems used for ML training and inference, and using simulation to guide architecture, performance optimisation, and capacity planning. The ideal candidate combines strong systems experience with hands-on experience in measurement, benchmarking, and performance analysis of modern ML systems.

What You’ll Do:

  • Build simulation models for compute, memory, interconnect, and communication behaviour in ML systems.
  • Develop tools to simulate performance for training and inference workloads.
  • Model distributed execution across accelerators, hosts, and network fabrics, including collectives, synchronization, and communication bottlenecks.
  • Use simulation and analytical modelling to evaluate tradeoffs, identify bottlenecks, and guide system design.
  • Run performance experiments and benchmarks on real ML systems to calibrate and validate simulation models.
  • Analyze end-to-end performance, including throughput, latency, scaling efficiency, utilisation, and cost/performance tradeoffs.
  • Partner with hardware, software, networking, and ML teams to align simulation with real workloads and constraints.
  • Create reproducible benchmarking methodologies across models and system configurations, and compare simulation results against real system measurements to validate them.
  • Communicate findings through technical reports and design recommendations.

Qualifications Required:

  • Master’s or PhD in Computer Science, Electrical Engineering, Computer Engineering, or a related field.
  • Strong experience in ML systems, distributed systems, performance engineering, computer architecture, or simulation.
  • Understanding of systems used for machine learning training and inference.
  • Experience analyzing compute, communication, and memory behaviour in large-scale ML systems.
  • Hands-on experience with performance benchmarking, profiling, and measurement of ML systems.
  • Experience with distributed training concepts such as data parallelism, tensor/model parallelism, pipeline parallelism, collectives, and synchronization overheads.
  • Proficiency in at least one of the following: Python, C++, or Rust.
  • Strong analytical skills and the ability to connect simulation results to real system behaviour.

Preferred:

  • Experience with system performance modelling, network simulation, or architecture evaluation tools.
  • Familiarity with accelerator-based systems such as GPUs, TPUs, or custom ML hardware.
  • Experience with PyTorch, JAX, TensorFlow, NCCL, XLA, CUDA, or similar tools.
  • Knowledge of interconnect and networking technologies such as InfiniBand, Ethernet/RDMA, NVLink, PCIe, or equivalent.
  • Experience evaluating both training throughput and inference latency/serving efficiency.
  • Background in workload characterization, trace-driven simulation, or model calibration.
  • Ability to work across hardware and software boundaries in a cross-functional environment.

What Success Looks Like:

  • Build simulation models that accurately predict performance trends and inform architectural decisions.
  • Identify compute and communication bottlenecks in ML training and inference systems.
  • Correlate simulation outputs with real-world benchmark data.
  • Improve system efficiency, scalability, and cost effectiveness through data-driven insights.

Senior ML Systems Engineer - Simulations London Office · employer: Oriole Networks Ltd

Join a forward-thinking company in London that prioritises innovation and collaboration, making it an exceptional employer for a Senior ML Systems Engineer. With a strong focus on employee growth, you will have access to cutting-edge technology and the opportunity to work alongside industry experts, fostering a culture of continuous learning and development. Enjoy a supportive work environment that values your contributions and encourages meaningful engagement in large-scale machine learning projects.

Contact Details:

Oriole Networks Ltd Recruitment Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Senior ML Systems Engineer - Simulations role (London Office)

Tip Number 1

Network, network, network! Get out there and connect with folks in the ML community. Attend meetups, conferences, or even online webinars. You never know who might have a lead on that perfect Senior ML Systems Engineer role!

Tip Number 2

Show off your skills! Create a portfolio showcasing your simulation models and performance analysis projects. This is your chance to demonstrate your hands-on experience and analytical skills, which are key for this role.

Tip Number 3

Don’t just apply anywhere; focus on companies that align with your interests in ML systems. Use our website to find roles that excite you and tailor your approach to each one. We’re here to help you land that dream job!

Tip Number 4

Prepare for interviews by brushing up on your knowledge of distributed systems and performance engineering. Be ready to discuss your past experiences and how they relate to the challenges faced in ML training and inference. Confidence is key!

We think you need these skills to ace the Senior ML Systems Engineer - Simulations role (London Office)

Simulation Modelling
Performance Benchmarking
Distributed Systems
Machine Learning Systems
Analytical Modelling
Performance Analysis
Python

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the Senior ML Systems Engineer role. Highlight your experience with simulation models, performance benchmarking, and any relevant projects that showcase your skills in ML systems.

Craft a Compelling Cover Letter: Your cover letter should tell us why you're the perfect fit for this role. Share specific examples of your work with distributed systems and how you've tackled performance challenges in the past.

Showcase Your Technical Skills: Don’t forget to mention your proficiency in Python, C++, or Rust. We want to see how your technical skills align with our needs, so include any relevant tools or frameworks you’ve worked with, like PyTorch or TensorFlow.

Apply Through Our Website: We encourage you to apply through our website for a smoother application process. It helps us keep track of your application and ensures you don’t miss out on any important updates!

How to prepare for a job interview at Oriole Networks Ltd

Know Your Stuff

Make sure you brush up on your knowledge of machine learning systems and simulation models. Be ready to discuss your hands-on experience with performance benchmarking and how it relates to the role. Familiarity with tools like PyTorch or TensorFlow will definitely give you an edge.

Showcase Your Analytical Skills

Prepare to demonstrate your analytical skills by discussing past projects where you identified bottlenecks or improved system efficiency. Use specific examples that highlight your ability to connect simulation results to real-world performance.

Collaborate Like a Pro

This role involves working with various teams, so be ready to talk about your experience in cross-functional environments. Share examples of how you've partnered with hardware, software, or networking teams to align simulations with real workloads.

Communicate Clearly

Since you'll need to communicate findings through technical reports, practice explaining complex concepts in simple terms. Think about how you would present your simulation models and performance analysis to someone who might not have a technical background.