Senior ML Systems Engineer - Simulations

Full-Time 70000 - 90000 € / year (est.) No home office possible

At a Glance

  • Tasks: Build and validate simulation infrastructure for large-scale machine learning systems.
  • Company: Join Oriole Networks, a leader in photonic networking and AI/ML technologies.
  • Benefits: Competitive salary, flexible work options, and opportunities for professional growth.
  • Other info: Collaborative environment focused on innovation and sustainability.
  • Why this job: Make a real impact on AI advancements while working with cutting-edge technology.
  • Qualifications: Master’s or PhD in relevant fields and strong experience in ML systems.

The predicted salary is between 70000 and 90000 € per year.

We are looking for a Senior ML Systems Engineer to build and validate simulation infrastructure for large-scale machine learning systems. This role focuses on modelling the compute and communication behaviour of systems used for ML training and inference, and using simulation to guide architecture, performance optimization, and capacity planning.

What You’ll Do

  • Build simulation models for compute, memory, interconnect, and communication behaviour in ML systems.
  • Develop tools to simulate performance for training and inference workloads.
  • Model distributed execution across accelerators, hosts, and network fabrics, including collectives, synchronization, and communication bottlenecks.
  • Use simulation and analytical modelling to evaluate tradeoffs, identify bottlenecks, and guide system design.
  • Run performance experiments and benchmarks on real ML systems to calibrate and validate simulation models.
  • Analyze end-to-end performance, including throughput, latency, scaling efficiency, utilization, and cost/performance tradeoffs.
  • Partner with hardware, software, networking, and ML teams to align simulation with real workloads and constraints.
  • Create reproducible benchmarking methodologies across models and system configurations, and compare simulation results against real system measurements to demonstrate validity.
  • Communicate findings through technical reports and design recommendations.

Qualifications Required

  • Master’s or PhD in Computer Science, Electrical Engineering, Computer Engineering, or a related field.
  • Strong experience in ML systems, distributed systems, performance engineering, computer architecture, or simulation.
  • Understanding of systems used for machine learning training and inference.
  • Experience analyzing compute, communication, and memory behavior in large-scale ML systems.
  • Hands‑on experience with performance benchmarking, profiling, and measurement of ML systems.
  • Experience with distributed training concepts such as data parallelism, tensor/model parallelism, pipeline parallelism, collectives, and synchronization overheads.
  • Proficiency in at least one of the following: Python, C++, or Rust.
  • Strong analytical skills and the ability to connect simulation results to real system behavior.

Preferred

  • Experience with system performance modelling, network simulation, or architecture evaluation tools.
  • Familiarity with accelerator‑based systems such as GPUs, TPUs, or custom ML hardware.
  • Experience with PyTorch, JAX, TensorFlow, NCCL, XLA, CUDA, or similar tools.
  • Knowledge of interconnect and networking technologies such as InfiniBand, Ethernet/RDMA, NVLink, PCIe, or equivalent.
  • Experience evaluating both training throughput and inference latency/serving efficiency.
  • Background in workload characterization, trace‑driven simulation, or model calibration.
  • Ability to work across hardware and software boundaries in a cross‑functional environment.

What Success Looks Like

  • Build simulation models that accurately predict performance trends and inform architectural decisions.
  • Identify compute and communication bottlenecks in ML training and inference systems.
  • Correlate simulation outputs with real‑world benchmark data.
  • Improve system efficiency, scalability, and cost effectiveness through data‑driven insights.

Accelerating AI in a Low Carbon World – Oriole Networks is a photonic networking company, developing disruptive technologies for AI/ML and HPC networking that will revolutionise data centres.

Senior ML Systems Engineer - Simulations employer: Oriole

At Oriole Networks, we pride ourselves on being an exceptional employer, offering a dynamic work culture that fosters innovation and collaboration in the cutting-edge field of AI and machine learning. Our employees benefit from continuous growth opportunities, access to state-of-the-art technology, and a commitment to sustainability, all while working in a supportive environment that values their contributions and encourages professional development. Join us in revolutionising data centres and making a meaningful impact in a low carbon world.

Contact Detail:

Oriole Recruiting Team

StudySmarter Expert Advice🤫

We think this is how you could land the Senior ML Systems Engineer - Simulations role

Tip Number 1

Network, network, network! Get out there and connect with people in the industry. Attend meetups, webinars, or conferences related to ML systems engineering. You never know who might have a lead on your dream job!

Tip Number 2

Show off your skills! Create a portfolio showcasing your simulation models and performance benchmarks. This will give potential employers a taste of what you can do and set you apart from the competition.

Tip Number 3

Prepare for interviews by brushing up on your technical knowledge. Be ready to discuss your experience with distributed systems and performance engineering. Practise explaining complex concepts in simple terms – it shows you really understand your stuff!

Tip Number 4

Don’t forget to apply through our website! We love seeing candidates who are genuinely interested in joining us at StudySmarter. Tailor your application to highlight how your skills align with the role and our mission.

We think you need these skills to ace Senior ML Systems Engineer - Simulations

Simulation Modelling
Performance Engineering
Distributed Systems
Machine Learning Systems
Benchmarking Methodologies
Analytical Modelling
Python

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the Senior ML Systems Engineer role. Highlight your experience with ML systems, performance engineering, and any relevant tools you've used. We want to see how your skills align with what we're looking for!

Showcase Your Projects: Include specific projects or experiences that demonstrate your ability to build simulation models and analyse performance. We love seeing real-world applications of your skills, so don’t hold back on the details!

Be Clear and Concise: When writing your application, keep it clear and to the point. Use bullet points where possible to make it easy for us to read through your qualifications and experiences. We appreciate a well-structured application!

Apply Through Our Website: Don’t forget to apply through our website! It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy to do!

How to prepare for a job interview at Oriole

Know Your Simulation Models

Make sure you understand the intricacies of simulation models for compute, memory, and communication behaviour in ML systems. Brush up on how these models can guide architecture and performance optimisation, as this will likely come up during your interview.

Showcase Your Analytical Skills

Be prepared to discuss your experience with performance benchmarking and profiling. Highlight specific examples where you've identified bottlenecks or improved system efficiency using data-driven insights. This will demonstrate your strong analytical skills and relevance to the role.

Familiarise Yourself with Relevant Tools

Get comfortable with tools like PyTorch, TensorFlow, and CUDA, as well as networking technologies such as InfiniBand and PCIe. Being able to speak confidently about these tools will show that you're not just theoretically knowledgeable but also practically skilled.

Prepare for Cross-Functional Collaboration

Since the role involves partnering with various teams, think of examples from your past experiences where you've successfully collaborated across hardware and software boundaries. This will highlight your ability to work in a cross-functional environment, which is crucial for this position.