Senior System Engineer (Munich, Germany) in Cambourne

Cambourne Full-Time 80000 - 100000 £ / year (est.) No home office possible
RemoteStar

At a Glance

  • Tasks: Design and develop software for AI infrastructure, optimising high-scale compute and Kubernetes extensions.
  • Company: Fast-growing deep-tech company leading in Quantum Software and AI in the EU.
  • Benefits: Indefinite contract, equal pay, bonuses, private health insurance, and flexible working hours.
  • Other info: Dynamic culture focused on career growth and learning opportunities.
  • Why this job: Join a progressive company and work with cutting-edge technologies in a multicultural environment.
  • Qualifications: 10+ years in software engineering, strong Python skills, and deep Kubernetes knowledge.

The predicted salary is between £80,000 and £100,000 per year.

About the client: A well-funded, fast-growing deep-tech company founded in 2019. It is the biggest Quantum Software company in the EU and one of the 100 most promising AI companies in the world (according to CB Insights, 2023), with 150+ employees and growing, and a fully multicultural and international team.

Requirements

  • Systems Programming Expertise: 10+ years of software engineering experience with strong proficiency in Python. You must be comfortable building system agents, APIs, and CLI tools.
  • Deep Kubernetes Knowledge: You understand K8s internals beyond simple deployment. Experience with Custom Resource Definitions (CRDs), Operators, and the Kubernetes API server architecture.
  • GPU Ecosystem Experience: Hands-on experience managing NVIDIA GPU clusters. Familiarity with NVIDIA drivers, CUDA toolkit, and the container runtime (NVIDIA Container Toolkit).
  • Linux Internals: Deep understanding of the Linux kernel, cgroups, namespaces, and system performance tuning.
  • Infrastructure as Code: Mastery of declarative infrastructure tools (Terraform, Ansible), with a focus on provisioning physical hardware rather than just cloud VMs.
  • Problem Solving: A proven track record of debugging complex distributed systems where the root cause could be code, network, or silicon.

Preferred qualifications

  • HPC Background: Experience working with traditional supercomputing schedulers (Slurm, PBS) or modern batch schedulers (Volcano, Kueue, Ray).
  • Bare Metal Provisioning: Experience with tools like Cluster API (CAPI), Metal3, Tinkerbell, Canonical MaaS, or OpenStack Ironic.
  • High-Speed Networking: Knowledge of RDMA, InfiniBand, GPUDirect, and how to expose these technologies to containerized workloads.
  • AI/ML Familiarity: Understanding of how distributed training works (e.g., PyTorch Distributed, Megatron-LM, DeepSpeed) and the infrastructure requirements of Large Language Models (LLMs).
  • Observability: Experience building monitoring for hardware health (DCGM) and distributed tracing for long-running jobs.

Location: Applicants must have legal authorization to work in the country where the position is based.

What you will be doing

  • Building the Control Plane: Designing and developing the software layer (APIs, Controllers, Agents) that automates the lifecycle of bare-metal AI infrastructure.
  • Orchestrating High-Scale Compute: Architecting scheduling solutions for large-scale distributed training jobs across massive clusters of GPUs (NVIDIA H200/B200/B300), ensuring efficient bin-packing and gang scheduling.
  • Optimizing the Fabric: Tuning the software-defined networking layer to support low-latency interconnects (InfiniBand/RDMA/RoCEv2) essential for multi-node training.
  • Developing Kubernetes Extensions: Writing custom Kubernetes Operators and CRDs to abstract complex hardware realities (topology awareness, GPU partitioning) into usable interfaces for our Data Scientists.
  • Hardware-Level Debugging: Investigating and resolving deep systems issues, ranging from PCIe bus errors and NCCL communication timeouts to kernel panics on bare-metal nodes.
  • Defining Standards: Creating the "Golden Image" for AI workloads, managing drivers, firmware, and OS optimizations to squeeze maximum performance out of the hardware.

Perks & Benefits

  • Indefinite contract.
  • Equal pay guaranteed.
  • Variable performance bonus.
  • Signing bonus.
  • Relocation package (if applicable).
  • Private health insurance.
  • Eligibility for an educational budget according to internal policy.
  • Hybrid opportunity.
  • Flexible working hours.
  • A fast-paced environment working on cutting-edge technologies.
  • Career plan.
  • Opportunity to learn and teach.
  • Progressive company.
  • Happy people culture.

Senior System Engineer (Munich, Germany) in Cambourne employer: RemoteStar

Join a pioneering deep-tech company in Munich, recognised as the largest Quantum Software firm in the EU and one of the world's 100 most promising AI companies. With a commitment to equal pay, flexible working hours, and a culture that prioritises employee happiness and growth, this is an exceptional opportunity for Senior System Engineers to work on cutting-edge technologies in a multicultural environment. Enjoy benefits such as an indefinite contract, performance bonuses, and a supportive career development plan while contributing to innovative solutions in AI infrastructure.

Contact details:

RemoteStar Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Senior System Engineer (Munich, Germany) role in Cambourne

✨Tip Number 1

Network like a pro! Reach out to current employees at the company through LinkedIn or industry events. A friendly chat can give you insights into the company culture and maybe even a referral!

✨Tip Number 2

Show off your skills in action! Consider creating a GitHub repository showcasing your projects related to systems programming, Kubernetes, or AI/ML. This gives us tangible proof of your expertise beyond just words.

✨Tip Number 3

Prepare for technical interviews by brushing up on your problem-solving skills. Practice debugging complex systems and be ready to discuss your past experiences with distributed systems and infrastructure as code.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets noticed. Plus, we love seeing candidates who take that extra step to connect directly with us.

We think you need these skills to ace the Senior System Engineer (Munich, Germany) role in Cambourne

Python
Systems Programming
Kubernetes
Custom Resource Definitions (CRDs)
Operators
NVIDIA GPU Management
CUDA Toolkit
Linux Kernel Internals
Infrastructure as Code (Terraform, Ansible)
Debugging Complex Distributed Systems
High-Performance Computing (HPC)
Bare Metal Provisioning
High-Speed Networking (RDMA, InfiniBand)
AI/ML Infrastructure Understanding
Observability and Monitoring

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience with systems programming and Kubernetes. We want to see how your skills match the job description, so don’t be shy about showcasing your Python expertise and any relevant projects you've worked on.

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about the role and how your background makes you a perfect fit for the team. Hiring teams love personal stories that connect your experience to what the company does.

Showcase Problem-Solving Skills: In your application, highlight specific examples where you've debugged complex systems or tackled challenging problems. We’re all about finding solutions, so let us know how you’ve made an impact in your previous roles!

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows us you’re keen on joining our awesome team!

How to prepare for a job interview at RemoteStar

✨Know Your Tech Inside Out

Make sure you’re well-versed in the technologies mentioned in the job description. Brush up on your Python skills, Kubernetes internals, and GPU management. Be ready to discuss specific projects where you've applied these skills.

✨Showcase Problem-Solving Skills

Prepare examples of complex problems you've solved in distributed systems. Highlight your debugging process and how you identified root causes, whether they were code-related or hardware issues. This will demonstrate your analytical thinking.

✨Familiarise Yourself with Infrastructure as Code

Since the role requires mastery of tools like Terraform and Ansible, be prepared to discuss your experience with these. Share specific instances where you’ve provisioned physical hardware using these tools, as this is crucial for the position.

✨Ask Insightful Questions

Interviews are a two-way street! Prepare thoughtful questions about the company’s approach to AI infrastructure, their tech stack, or team dynamics. This shows your genuine interest and helps you assess if it’s the right fit for you.
