Senior System Engineer (Munich, Germany)

Full-time | £80,000 - £100,000 per year (est.) | No home office possible
RemoteStar

At a Glance

  • Tasks: Design and develop software for AI infrastructure, optimising high-scale compute and Kubernetes extensions.
  • Company: Fast-growing deep-tech company leading in Quantum Software and AI innovation.
  • Benefits: Competitive salary, flexible work environment, and opportunities for professional growth.
  • Other info: Dynamic role with significant impact on cutting-edge AI projects.
  • Why this job: Join a multicultural team and shape the future of AI technology.
  • Qualifications: 10+ years in systems programming with expertise in Python, Kubernetes, and GPU ecosystems.

The predicted salary is between £80,000 and £100,000 per year.

About the client: A well-funded, fast-growing deep-tech company founded in 2019 and the biggest Quantum Software company in the EU. It is also one of the 100 most promising AI companies in the world (according to CB Insights, 2023), with 150+ employees and growing, and a fully multicultural, international team.

Requirements

  • Systems Programming Expertise: 10+ years of software engineering experience with strong proficiency in Python. You must be comfortable building system agents, APIs, and CLI tools.
  • Deep Kubernetes Knowledge: You understand K8s internals beyond simple deployment and have experience with Custom Resource Definitions (CRDs), Operators, and the Kubernetes API server architecture.
  • GPU Ecosystem Experience: Hands-on experience managing NVIDIA GPU clusters. Familiarity with NVIDIA drivers, CUDA toolkit, and the container runtime (NVIDIA Container Toolkit).
  • Linux Internals: Deep understanding of the Linux kernel, cgroups, namespaces, and system performance tuning.
  • Infrastructure as Code: Mastery of declarative infrastructure tools (Terraform, Ansible) but with a focus on provisioning physical hardware rather than just cloud VMs.
  • Problem Solving: A proven track record of debugging complex distributed systems where the root cause could be code, network, or silicon.

Preferred qualifications

  • HPC Background: Experience working with traditional supercomputing schedulers (Slurm, PBS) or modern batch schedulers (Volcano, Kueue, Ray).
  • Bare Metal Provisioning: Experience with tools like Cluster API (CAPI), Metal3, Tinkerbell, Canonical MaaS, or OpenStack Ironic.
  • High-Speed Networking: Knowledge of RDMA, InfiniBand, GPUDirect, and how to expose these technologies to containerized workloads.
  • AI/ML Familiarity: Understanding of how distributed training works (e.g., PyTorch Distributed, Megatron-LM, DeepSpeed) and the infrastructure requirements of Large Language Models (LLMs).
  • Observability: Experience building monitoring for hardware health (DCGM) and distributed tracing for long-running jobs.

Location: Applicants must have legal authorization to work in the country where the position is based.

What you will be doing

  • Building the Control Plane: Designing and developing the software layer (APIs, Controllers, Agents) that automates the lifecycle of bare-metal AI infrastructure.
  • Orchestrating High-Scale Compute: Architecting scheduling solutions for large-scale distributed training jobs across massive clusters of GPUs (NVIDIA H200/B200/B300), ensuring efficient bin-packing and gang scheduling.
  • Optimizing the Fabric: Tuning the software-defined networking layer to support low-latency interconnects (InfiniBand/RDMA/RoCEv2) essential for multi-node training.
  • Developing Kubernetes Extensions: Writing custom Kubernetes Operators and CRDs to abstract complex hardware realities (topology awareness, GPU partitioning) into usable interfaces for our Data Scientists.
  • Hardware-Level Debugging: Investigating and resolving deep systems issues, ranging from PCIe bus errors and NCCL communication timeouts to kernel panics on bare-metal nodes.
  • Defining Standards: Creating the 'Golden Image' for AI workloads, managing drivers, firmware, and OS optimizations to squeeze maximum performance out of the hardware.

Perks

Join a pioneering deep-tech company in Munich, recognised as the largest Quantum Software firm in the EU and one of the top 100 AI companies globally. With a vibrant multicultural environment and a commitment to employee growth, we offer unparalleled opportunities for professional development, competitive benefits, and the chance to work on cutting-edge technology that shapes the future of AI infrastructure.

Contact Details:

RemoteStar Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Senior System Engineer (Munich, Germany) role

✨Tip Number 1

Network like a pro! Attend industry meetups, conferences, or webinars related to quantum software and AI. Engaging with professionals in the field can lead to valuable connections and potential job opportunities.

✨Tip Number 2

Show off your skills! Create a portfolio showcasing your projects, especially those involving Kubernetes, Python, or GPU management. This gives you a chance to demonstrate your expertise beyond just a CV.

✨Tip Number 3

Prepare for technical interviews by brushing up on your problem-solving skills. Practice debugging complex systems and be ready to discuss your experience with distributed training and infrastructure as code.

✨Tip Number 4

Apply through our website! We make it easy for you to submit your application directly, ensuring it gets into the right hands. Plus, it shows you're genuinely interested in joining our team!

We think you need these skills to ace the Senior System Engineer (Munich, Germany) role

Python
Kubernetes
Custom Resource Definitions (CRDs)
Operators
NVIDIA GPU management
CUDA toolkit
Linux kernel
cgroups
namespaces
Terraform
Ansible
Slurm
PBS
RDMA
InfiniBand
PyTorch Distributed

Some tips for your application 🫡

Show Off Your Skills: Make sure to highlight your 10+ years of software engineering experience, especially with Python. We want to see how you've built system agents, APIs, and CLI tools in your previous roles.

Kubernetes Know-How: Don’t forget to mention your deep understanding of Kubernetes internals. We’re looking for someone who knows their way around CRDs, Operators, and the Kubernetes API server architecture like the back of their hand.

Experience Matters: If you’ve got hands-on experience managing NVIDIA GPU clusters or working with Linux internals, make it known! We love seeing candidates who can tackle complex distributed systems and have a knack for problem-solving.

Apply Through Our Website: We encourage you to apply through our website. It’s the best way for us to get your application and ensure it reaches the right people. Plus, we can’t wait to see what you bring to the table!

How to prepare for a job interview at RemoteStar

✨Know Your Tech Inside Out

Make sure you’re well-versed in the technologies mentioned in the job description. Brush up on your Python skills, Kubernetes internals, and GPU management. Be ready to discuss specific projects where you've applied these skills.

✨Showcase Problem-Solving Skills

Prepare examples of complex problems you've solved in distributed systems. Highlight your debugging process and how you identified root causes, whether they lay in code, the network, or the hardware. This will demonstrate your analytical thinking.

✨Familiarise Yourself with Infrastructure as Code

Since the role requires mastery of tools like Terraform and Ansible, be prepared to discuss your experience with these. Share specific instances where you’ve provisioned physical hardware using these tools, as this is crucial for the position.

✨Understand the Company’s Vision

Research the company’s mission and recent achievements in the quantum software and AI space. Being able to articulate how your skills align with their goals will show your genuine interest and help you stand out in the interview.
