At a Glance
- Tasks: Deploy and troubleshoot cutting-edge GPU platforms in a global data centre environment.
- Company: Join CoreWeave, the essential cloud for AI, trusted by innovators worldwide.
- Benefits: Enjoy competitive salary, comprehensive health insurance, and generous pension contributions.
- Other info: Flexible hybrid work culture with excellent career growth opportunities.
- Why this job: Make an impact in AI infrastructure while learning specialised technologies like NVLink and NVSwitch.
- Qualifications: Strong Linux skills and a passion for troubleshooting hardware and networking.
The predicted salary is between £79,000 and £105,000 per year.
CoreWeave is building and operating some of the largest GPU infrastructure in the world. The Metal Net team owns the high-bandwidth GPU interconnect platforms that make large-scale AI and HPC workloads possible, including NVLink and NVSwitch-based systems. We are looking for an HPC Engineer to deploy, operate, troubleshoot, and improve these platforms across our global data center footprint. This role is a strong fit for engineers who enjoy production troubleshooting, hardware-adjacent systems work, automation, observability, and learning specialized infrastructure deeply. Prior NVLink experience is helpful, but not required.
What You Will Do
- Deploy, operate, and support NVLink/NVSwitch platforms across large data center environments.
- Troubleshoot Linux, networking, hardware, firmware, performance, and stability issues in production.
- Build automation and improve runbooks, dashboards, alerts, and lifecycle workflows.
- Collaborate with teams across CoreWeave, external vendors, and customer-facing stakeholders.
- Drive assigned work to completion with clear communication, thoughtful prioritization, and early visibility into risks or blockers.
- Participate in on-call, incident response, root cause analysis, and follow-up improvements.
- Contribute to reliable workflows that scale across regions, platforms, and fleet growth, with ownership calibrated by level.
What We Are Looking For
- Strong Linux system administration and troubleshooting skills.
- Networking fundamentals and common troubleshooting tools.
- Production debugging experience using logs, metrics, and command-line tools.
- Server, network, GPU, or data center hardware troubleshooting experience.
- Practical scripting or automation experience in Python, Go, Bash, or similar.
- Clear communication, documentation, collaboration, and on-call readiness.
- Curiosity to learn specialized GPU interconnect technologies such as NVLink, NVSwitch, and InfiniBand.
Preferred Qualifications
- Ansible or other infrastructure automation tooling.
- Kubernetes application development or operations experience.
- Grafana, Prometheus, PromQL, or similar observability systems.
- Large fleet operations across Linux systems, network devices, GPUs, or infrastructure components.
- InfiniBand, RDMA, HPC networking, or low-latency/high-bandwidth fabrics.
- BMC, Redfish, IPMI, firmware lifecycle management, or hardware management APIs.
- NVLink, NVSwitch, NVIDIA GPU platforms, NVUE, SONiC, or network operating systems.
The base salary range for this role is £79,000 to £105,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).
What We Offer
In addition to a competitive salary, we offer a variety of benefits to support your needs, including:
- Family-level Medical Insurance
- Family-level Dental Insurance
- Generous Pension Contribution
- Life Assurance at 4x Salary
- Critical Illness Cover
- Employee Assistance Programme
- Tuition Reimbursement
- Work culture focused on innovative disruption
Benefits may vary by location.
Our Workplace
While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.
HPC Engineer, Metal Net in London employer: CoreWeave Europe
CoreWeave is an exceptional employer, offering a dynamic work culture that prioritises innovation and collaboration within the AI and HPC sectors. Employees benefit from a comprehensive rewards package, including generous medical and dental insurance, pension contributions, and opportunities for professional growth through tuition reimbursement. With a commitment to inclusivity and a hybrid work environment, CoreWeave fosters a supportive atmosphere where engineers can thrive and contribute to cutting-edge technology in a rapidly evolving industry.
StudySmarter Expert Advice🤫
We think this is how you could land HPC Engineer, Metal Net in London
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, especially those at CoreWeave. A friendly chat can open doors that a CV just can't.
✨Tip Number 2
Show off your skills! If you’ve got experience with NVLink or similar tech, make sure to highlight it in conversations. Practical examples of your work can really impress.
✨Tip Number 3
Be ready for technical discussions! Brush up on your Linux and networking knowledge. We love candidates who can dive deep into troubleshooting and automation.
✨Tip Number 4
Apply through our website! It’s the best way to ensure your application gets seen. Plus, we’re always looking for passionate engineers to join our team.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the HPC Engineer role. Highlight your Linux system administration skills and any relevant experience with GPU infrastructure. We want to see how your background aligns with what we do at CoreWeave!
Show Off Your Troubleshooting Skills: In your application, don’t shy away from showcasing your troubleshooting experience. Whether it’s Linux, networking, or hardware issues, let us know how you’ve tackled challenges in production environments.
Be Clear and Concise: When writing your cover letter, keep it clear and to the point. We appreciate straightforward communication, so make sure to express your enthusiasm for the role and how you can contribute to our team.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy!
How to prepare for a job interview at CoreWeave Europe
✨Know Your Tech
Make sure you brush up on your Linux system administration skills and be ready to discuss troubleshooting techniques. Familiarise yourself with NVLink and NVSwitch technologies, even if you don't have direct experience. Showing curiosity and a willingness to learn can really impress the interviewers.
✨Showcase Your Problem-Solving Skills
Prepare to share specific examples of how you've tackled production issues in the past. Use the STAR method (Situation, Task, Action, Result) to structure your answers. This will help demonstrate your ability to troubleshoot effectively and think critically under pressure.
✨Communicate Clearly
Effective communication is key, especially when collaborating with teams or dealing with customer-facing stakeholders. Practice explaining complex technical concepts in simple terms. This will show that you can bridge the gap between technical and non-technical audiences.
✨Be Ready for Automation Talk
Since automation is a big part of the role, be prepared to discuss your experience with scripting or automation tools like Python, Go, or Bash. Bring examples of how you've used these skills to improve workflows or processes in previous roles.