HPC Cluster Architect

Full-Time, £80,000 - £100,000 / year (est.), no working from home possible

At a Glance

Tasks: Design and own large-scale GPU cluster architecture from concept to deployment.
Company: Join NexGen Cloud, a leader in AI cloud infrastructure.
Benefits: Competitive salary, flexible working, 25 days holiday, and growth opportunities.
Other info: Collaborative culture with autonomy and trust to innovate.
Why this job: Make a real impact in cutting-edge AI technology and shape the future of cloud infrastructure.
Qualifications: Experience in GPU-based HPC cluster design and strong technical leadership skills.

The predicted salary is between £80,000 and £100,000 per year.

NexGen Cloud is the company behind Hyperstack, a full-stack AI cloud serving tens of thousands of customers from AI researchers to enterprises running the world's most compute-intensive workloads. We deliver on-demand and private GPU infrastructure to teams who treat performance as a requirement, not a feature. We're a tight-knit, fast-moving team working at the cutting edge of AI cloud infrastructure. We practice what we preach, equipping our people with AI at every level so we can solve harder problems, ship faster, and keep raising the bar for what enterprise GPU infrastructure looks like.

This role exists because NexGen Cloud is winning large-scale dedicated GPU cluster contracts and needs someone who can own the full architecture cycle — from first customer conversation to production deployment. You’ll have direct ownership over cluster architecture across compute, networking, storage, and physical design — translating customer requirements into production-ready, commercially optimised GPU deployments.

Role positioning: This is a senior hands-on role for someone who has lived and breathed HPC cluster design — and who wants to be the technical authority, not one voice in a committee. You’ll own designs end-to-end and see them go live.

WHAT YOU’LL BE DOING

Own end-to-end cluster architecture for large-scale NVIDIA GPU deployments — from customer requirement through rack layouts, BOM, power and cooling design, to production handover.
Design high-performance network fabrics across compute (InfiniBand, RDMA, NVLink/NVSwitch), storage, and WAN — defining topology, oversubscription models, and scaling strategies.
Engage directly with OEMs and vendors — validating hardware configurations, reviewing quotes, and ensuring designs are both technically sound and commercially optimised.
Provide technical oversight during deployment and bring-up — supporting hardware validation, performance testing, and acting as escalation point for complex integration issues.
Act as a senior technical leader across Solutions Architecture, Cloud Engineering, and data centre partners — contributing to standardised reference designs and building out the HPC engineering function.

ABOUT YOU: We’re more interested in how you think and work than in a perfect CV. You’ll likely bring a combination of the following:

Proven experience designing and delivering GPU-based HPC or AI clusters at scale — covering the full lifecycle from design through procurement, deployment, and validation.
Deep hands-on knowledge of NVIDIA GPU platforms (H100/H200/B-series) and NVIDIA reference architectures.
Strong InfiniBand/RDMA design experience — topology, performance tuning, and high-performance Ethernet fabrics.
Solid grounding in Linux systems, PCIe topology, NUMA alignment, and server-level performance considerations.
Background from an OEM, hyperscaler, neo-cloud, or enterprise/research HPC environment — with demonstrable exposure to the full design-to-deployment lifecycle.
Confident engaging with customers, vendors, OEMs, and internal engineering teams as a technical authority — able to translate complex design trade-offs into clear decisions.

Nice to Have

Experience with Spectrum-X or next-generation Ethernet fabrics.
Prior involvement in large-scale cluster deployments (1,000+ GPUs) and performance benchmarking (NCCL, MLPerf).
Exposure to both air-cooled and liquid-cooled HPC environments, and/or automation/infrastructure-as-code.

WHAT WE OFFER

Competitive salary and annual discretionary bonus scheme.
25 days of holiday, plus public holidays.
Flexible working arrangements (remote or hybrid, depending on role and location).
Real ownership and autonomy, with the trust to take initiative and experiment.
The opportunity to make a visible, meaningful impact as we scale.
Clear career progression and growth opportunities in a fast-growing company.
A collaborative, international culture built on trust, transparency, and ownership.
The chance to help shape NexGen Cloud’s team, culture, and future alongside ambitious, mission-driven colleagues.

HPC Cluster Architect employer: NexGen Cloud

NexGen Cloud is an exceptional employer for those seeking to make a significant impact in the AI cloud infrastructure space. With a collaborative and transparent work culture, employees enjoy real ownership over their projects, flexible working arrangements, and clear pathways for career progression. As part of a fast-growing team, you'll have the opportunity to work at the cutting edge of technology while being supported by ambitious colleagues who share a mission-driven mindset.

Contact Details:

NexGen Cloud Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land the HPC Cluster Architect role

✨Tip Number 1

Get your networking game on! Reach out to folks in the HPC and AI space, especially those at NexGen Cloud. A friendly chat can open doors and give you insights that a job description just can't.

✨Tip Number 2

Show off your expertise! When you get the chance to speak with hiring managers or during interviews, share specific examples of your past projects. Highlight how you've tackled challenges in GPU deployments and cluster architecture.

✨Tip Number 3

Don’t just wait for job openings to pop up. Keep an eye on our website and apply proactively. Sometimes, the best roles are created for the right talent, and that could be you!

✨Tip Number 4

Be ready to dive deep into technical discussions. Brush up on your knowledge of NVIDIA platforms and InfiniBand design. The more confident you are in these areas, the better you'll stand out as a candidate.

We think you need these skills to ace the HPC Cluster Architect role

HPC Cluster Design

NVIDIA GPU Platforms (H100/H200/B-series)

InfiniBand/RDMA Design

High-Performance Network Fabrics

Linux Systems

PCIe Topology

NUMA Alignment

Performance Tuning

Customer Engagement

Technical Oversight

Deployment and Validation

Complex Integration Issues

Reference Architectures

Cluster Deployment Experience

Performance Benchmarking (NCCL, MLPerf)

Some tips for your application 🫡

Tailor Your Application: Make sure to customise your CV and cover letter for the HPC Cluster Architect role. Highlight your experience with GPU-based HPC clusters and any relevant projects you've worked on. We want to see how your skills align with what we're looking for!

Showcase Your Technical Expertise: Don’t hold back on detailing your hands-on knowledge of NVIDIA GPU platforms and InfiniBand/RDMA design. We’re keen to see your technical authority shine through, so share specific examples of your past work that demonstrate your capabilities.

Be Clear and Concise: When writing your application, keep it clear and to the point. Use bullet points where possible to make it easy for us to read. We appreciate a well-structured application that gets straight to the heart of your qualifications.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re serious about joining our team at NexGen Cloud!

How to prepare for a job interview at NexGen Cloud

✨Know Your HPC Stuff

Make sure you brush up on your knowledge of GPU-based HPC and AI clusters. Be ready to discuss your past experiences in designing and delivering these systems, especially focusing on the full lifecycle from design to deployment. They want to see that you can own the architecture cycle, so be prepared to share specific examples.

✨Get Technical with Networking

Since this role involves designing high-performance network fabrics, it’s crucial to understand InfiniBand, RDMA, and NVLink/NVSwitch. Familiarise yourself with different topologies and scaling strategies. You might be asked to explain how you would approach a specific networking challenge, so have some scenarios in mind.

✨Engage Like a Pro

You’ll need to demonstrate your ability to engage with customers, vendors, and internal teams confidently. Prepare to discuss how you’ve translated complex design trade-offs into clear decisions in the past. Think about times when you acted as a technical authority and how you handled those situations.

✨Show Your Leadership Skills

This is a senior role, so they’ll be looking for evidence of your leadership capabilities. Be ready to talk about how you’ve contributed to standardised reference designs or built out engineering functions in previous roles. Highlight any experience you have in mentoring or guiding teams through complex projects.
