At a Glance
- Tasks: Join us to deploy and support cutting-edge GPU interconnect platforms for AI and HPC workloads.
- Company: CoreWeave, a leader in GPU infrastructure with a focus on innovation.
- Benefits: Comprehensive health insurance, pension contributions, tuition reimbursement, and a supportive work culture.
- Other info: Exciting opportunities for growth in a dynamic, inclusive environment.
- Why this job: Be part of a team that drives technological disruption and scales global operations.
- Qualifications: Strong Linux skills, networking knowledge, and experience with automation tools like Python or Go.
The predicted salary is between £79,000 and £105,000 per year.
CoreWeave is building and operating some of the largest GPU infrastructure in the world. The Metal Net team owns the high‑bandwidth GPU interconnect platforms that make large‑scale AI and HPC workloads possible, including NVLink and NVSwitch‑based systems. We deploy, operate, troubleshoot, and improve these platforms across our global data centre footprint to provide a powerful alternative to traditional hyperscalers.
We are looking for an HPC Engineer to join our team to deploy, operate, and support NVLink/NVSwitch platforms across large data centre environments. This role is a strong fit for engineers who enjoy production troubleshooting, hardware‑adjacent systems work, automation, observability, and learning specialized infrastructure deeply. You will be responsible for troubleshooting Linux, networking, hardware, firmware, performance, and stability issues in production, while building automation to improve runbooks, dashboards, alerts, and lifecycle workflows. Additionally, you will participate in rotating on‑call shifts, lead incident responses, conduct root cause analyses, and collaborate cross‑functionally across CoreWeave to ensure reliable workflows scale effectively as our global fleet grows.
Who You Are:
- Strong Linux system administration and engineering troubleshooting skills.
- Solid grasp of networking fundamentals and common diagnostic/troubleshooting tools.
- Hands‑on production debugging experience using logs, metrics, and command‑line interfaces.
- Technical experience troubleshooting server, network, GPU, or data centre hardware.
- Practical scripting or automation experience using Python, Go, Bash, or similar languages.
- Clear written and verbal communication, documentation skills, and readiness to participate in an on‑call rotation.
- High curiosity to deeply learn specialized GPU interconnect technologies such as NVLink, NVSwitch, and InfiniBand.
Preferred:
- Experience with Ansible or other infrastructure‑as‑code and configuration automation tooling.
- Kubernetes application development or live platform operations experience.
- Familiarity with modern observability systems, including Grafana, Prometheus, PromQL, or similar stack components.
- Experience managing large fleet operations across Linux systems, network devices, GPUs, or infrastructure components.
- Deep understanding of InfiniBand, RDMA, HPC networking, or low‑latency/high‑bandwidth fabrics.
- Experience with BMC, Redfish, IPMI, firmware lifecycle management, or hardware management APIs.
- Exposure to NVLink, NVSwitch, NVIDIA GPU platforms, NVUE, SONiC, or specialized network operating systems.
Benefits:
- Family‑level Medical Insurance
- Family‑level Dental Insurance
- Generous Pension Contribution
- Life Assurance at 4x Salary
- Critical Illness Cover
- Employee Assistance Programme
- Tuition Reimbursement
- Work culture focused on innovative disruption
The base salary range for this role is £79,000 to £105,000. The starting salary will be determined based on job‑related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).
Equal Opportunity: CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.
HPC Engineer, Metal Net employer: United States Digital Space LLC
CoreWeave is an exceptional employer for HPC Engineers, offering a dynamic work environment focused on innovative disruption within the tech industry. With a strong commitment to employee growth, we provide comprehensive benefits including family-level medical and dental insurance, generous pension contributions, and tuition reimbursement, all while fostering a culture of inclusivity and support. Join us in our global data centre operations where you can deepen your expertise in cutting-edge GPU technologies and contribute to transformative AI and HPC workloads.
Contact Details:
United States Digital Space LLC Recruitment Team
StudySmarter Expert Advice🤫
We think this is how you could land HPC Engineer, Metal Net
✨Join Local Tech Meetups
Get out there and mingle with fellow developers by joining local tech meetups. It’s a fantastic way to meet people who might be working at United States Digital Space LLC or know someone who does. Plus, you can pick up new skills and keep up with industry trends while you're at it!
✨Contribute to Open Source Projects
Show off your coding chops by jumping into open-source projects. Not only does this give you practical experience, but it also gets you noticed in the dev community. You'll create a killer portfolio that speaks volumes about your skills to United States Digital Space LLC.
✨Tap into Online Developer Communities
Don’t underestimate the power of online developer communities like GitHub, Stack Overflow, and even Reddit. Participate in discussions, share your projects, and build your visibility. You can often find opportunities through these channels that lead to a full-time role at companies like United States Digital Space LLC.
✨Explore Job Boards Specifically for Tech Roles
Keep your eyes peeled on job boards that focus on tech roles. Sites like TechCareers or Stack Overflow Jobs can often have listings for companies like United States Digital Space LLC that might not show up on broader job sites. Make it a habit to check these regularly, and don’t hesitate to apply directly through our website!
Some tips for your application 🫡
Show off your coding skills: When applying for a software engineering role, it's super important to showcase your coding skills. Make sure your CV includes your tech stack, any relevant programming languages you’re comfortable with, and examples of projects you've worked on. If you have a GitHub profile, link it up! We love to see code in action.
Tailor your portfolio: For a full-time role, we’d expect to see some solid examples of your work in your portfolio. Make sure to include at least two or three projects that highlight your problem-solving skills and your ability to work with different technologies. Focus on the projects that are most relevant to the position at United States Digital Space LLC.
Craft a killer cover letter: Your cover letter is your chance to stand out, so make it personal! Explain why you want to work at United States Digital Space LLC and how your skills align with the role. Show us your passion for software development. We dig enthusiastic candidates who understand the value of collaboration and continuous learning!
Be clear and concise: When it comes to writing your CV and cover letter, clarity is key. Avoid jargon that could confuse us and stick to simple, direct language. Highlight your achievements with quantifiable results where possible, and keep everything easy to read. A well-organised application goes a long way!
How to prepare for a job interview at United States Digital Space LLC
✨Brush Up on Your Coding Skills
For a full-time software engineering role, it's crucial to keep your coding abilities sharp. Expect technical questions that might involve solving problems on the spot or discussing algorithms. Practise on platforms like LeetCode or HackerRank to get comfortable with the types of questions that often come up.
✨Know Your Tools and Frameworks
Make sure you’re well acquainted with the tools and technologies listed in the job description. Familiarise yourself with any specific languages or tooling mentioned. If the description lists Python, Go, or Ansible, for instance, be ready to discuss how you’ve used them in previous projects or coursework.
✨Showcase Your Projects
Bring along a portfolio that highlights your best work. This could be code samples, GitHub repositories, or any side projects you’ve built. Make sure you can talk through your thought process for each project, especially the challenges you faced and how you solved them. This shows your problem-solving skills in action.
✨Prepare for Behavioural Questions
While technical skills are key, full-time positions also require cultural fit. Be ready to discuss your previous experiences and how you handle teamwork, conflict, and deadlines. Brush up on the STAR method (Situation, Task, Action, Result) to clearly articulate how you've contributed to a team.