HPC Engineer, Metal Net in London

London | Full-Time | £79,000 to £105,000 per year (est.) | No working from home possible
United States Digital Space LLC

At a Glance

  • Tasks: Deploy and support cutting-edge GPU interconnect platforms in large data centres.
  • Company: CoreWeave, a leader in GPU infrastructure with a focus on innovation.
  • Benefits: Comprehensive health insurance, pension contributions, tuition reimbursement, and a supportive work culture.
  • Other info: Opportunity for career growth in a collaborative and inclusive environment.
  • Why this job: Join a dynamic team and work with advanced technologies that shape the future of AI and HPC.
  • Qualifications: Strong Linux skills, networking knowledge, and experience with automation scripting.

The predicted salary is between £79,000 and £105,000 per year.

CoreWeave is building and operating some of the largest GPU infrastructure in the world. The Metal Net team owns the high‑bandwidth GPU interconnect platforms that make large‑scale AI and HPC workloads possible, including NVLink and NVSwitch‑based systems. We deploy, operate, troubleshoot, and improve these platforms across our global data centre footprint to provide a powerful alternative to traditional hyperscalers.

We are looking for an HPC Engineer to join our team to deploy, operate, and support NVLink/NVSwitch platforms across large data centre environments. This role is a strong fit for engineers who enjoy production troubleshooting, hardware‑adjacent systems work, automation, observability, and learning specialized infrastructure deeply. You will be responsible for troubleshooting Linux, networking, hardware, firmware, performance, and stability issues in production, while building automation to improve runbooks, dashboards, alerts, and lifecycle workflows. Additionally, you will participate in rotating on‑call shifts, lead incident response, conduct root cause analyses, and collaborate cross‑functionally across CoreWeave so that workflows remain reliable and scale effectively as our global fleet grows.

Who You Are:

  • Strong Linux system administration and engineering troubleshooting skills.
  • Solid grasp of networking fundamentals and common diagnostic/troubleshooting tools.
  • Hands‑on production debugging experience using logs, metrics, and command‑line interfaces.
  • Technical experience troubleshooting server, network, GPU, or data centre hardware.
  • Practical scripting or automation experience using Python, Go, Bash, or similar languages.
  • Clear written and verbal communication, documentation skills, and readiness to participate in an on‑call rotation.
  • Strong curiosity and a drive to learn specialized GPU interconnect technologies in depth, such as NVLink, NVSwitch, and InfiniBand.

Preferred:

  • Experience with Ansible or other infrastructure‑as‑code and configuration automation tooling.
  • Kubernetes application development or live platform operations experience.
  • Familiarity with modern observability systems, including Grafana, Prometheus, PromQL, or similar stack components.
  • Experience managing large fleet operations across Linux systems, network devices, GPUs, or infrastructure components.
  • Deep understanding of InfiniBand, RDMA, HPC networking, or low‑latency/high‑bandwidth fabrics.
  • Experience with BMC, Redfish, IPMI, firmware lifecycle management, or hardware management APIs.
  • Exposure to NVLink, NVSwitch, NVIDIA GPU platforms, NVUE, SONiC, or specialized network operating systems.

Benefits:

  • Family‑level Medical Insurance
  • Family‑level Dental Insurance
  • Generous Pension Contribution
  • Life Assurance at 4x Salary
  • Critical Illness Cover
  • Employee Assistance Programme
  • Tuition Reimbursement
  • Work culture focused on innovative disruption

The base salary range for this role is £79,000 to £105,000. The starting salary will be determined based on job‑related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).

Equal Opportunity: CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.

CoreWeave is an exceptional employer for HPC Engineers, offering a dynamic work environment focused on innovative disruption within the tech industry. With a strong commitment to employee growth, we provide comprehensive benefits including family-level medical and dental insurance, generous pension contributions, and tuition reimbursement, all while fostering a culture of inclusivity and support. Join us in our global data centre operations where you can deepen your expertise in cutting-edge GPU technologies and contribute to transformative AI and HPC workloads.

Contact Details:

United States Digital Space LLC Recruitment Team

We think you need these skills to succeed as HPC Engineer, Metal Net in London:

  • Linux System Administration
  • Troubleshooting Skills
  • Networking Fundamentals
  • Production Debugging
  • Scripting or Automation (Python, Go, Bash)
  • Clear Communication Skills
  • Incident Response