Senior HPC AI Cluster Engineer

Full-time | £48,000 - £72,000 per year (est.) | No home office possible

At a Glance

  • Tasks: Design and maintain cutting-edge HPC/AI clusters while collaborating with top researchers.
  • Company: NVIDIA is a leader in AI and computing technology, redefining the future of innovation.
  • Benefits: Enjoy competitive salaries, extensive benefits, and a diverse, inclusive work environment.
  • Why this job: Be part of groundbreaking projects that push the limits of technology and AI.
  • Qualifications: Bachelor's degree or equivalent experience with 5+ years in HPC and AI technologies required.
  • Other info: Join a team of innovative professionals dedicated to transforming the tech landscape.

The predicted salary is between £48,000 and £72,000 per year.

NVIDIA is looking for an experienced HPC engineer to join the E2E software verification HPC/AI Infrastructure team. We are building supercomputers and HPC clusters based on groundbreaking technologies. We are looking for an outstanding senior HPC architect to be a key player working with the most exciting computing hardware and software, contributing to the latest breakthroughs in artificial intelligence and GPU computing. You will provide insights on at-scale system design and tuning mechanisms for large-scale compute runs.

You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers, and customers, to craft improved workflows and develop new, leading, differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop and bring up large-scale performance platforms. Does this sound like you? If so, we would love to hear from you!

What You Will Be Doing

  • Designing, implementing and maintaining large scale HPC/AI clusters with monitoring, logging and alerting
  • Managing Linux job/workload schedulers and orchestration tools
  • Developing and maintaining continuous integration and delivery pipelines
  • Developing tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources
  • Deploying monitoring solutions for the servers, network and storage
  • Troubleshooting and fixing issues bottom-up, from bare metal through the operating system and software stack to the application level
  • Being a technical resource: developing, redefining and documenting standard methodologies to share with internal teams
  • Supporting Research & Development activities and engaging in POCs/POVs for future improvements

What We Need To See

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience
  • 5+ years of experience
  • Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software
  • Experience with job scheduling and workload orchestration tools such as Slurm and Kubernetes (K8s)
  • Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) internals and networking (sockets, firewalls, iptables, Wireshark, etc.), ACLs and OS-level security protection, and common protocols such as TCP, DHCP and DNS
  • Experience with multiple storage solutions such as Lustre, GPFS, ZFS and XFS, plus familiarity with newer and emerging storage technologies
  • Python programming and bash scripting experience.
  • Comfortable with automation and configuration management tools such as Jenkins, Ansible and Puppet/Chef
  • Deep knowledge of networking protocols such as InfiniBand and Ethernet
  • Deep understanding and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix)
  • Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud)

Ways To Stand Out From The Crowd

  • Knowledge of CPU and/or GPU architecture
  • Knowledge of Kubernetes and container-related microservice technologies
  • Experience with GPU-focused hardware/software (DGX, CUDA)
  • Background with RDMA (InfiniBand or RoCE) fabrics

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. We have a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. Our teams are composed of driven, innovative professionals dedicated to pushing the boundaries of technology. We offer highly competitive salaries, an extensive benefits package, and a work environment that promotes diversity, inclusion, and flexibility. As an equal opportunity employer, we are committed to fostering a supportive and empowering workplace for all.
JR1997883

Seniority level: Mid-Senior level

Employment type: Full-time

Job function: Engineering and Information Technology

Industries: Computer Hardware Manufacturing, Software Development, and Computers and Electronics Manufacturing

Senior HPC AI Cluster Engineer employer: Nvidia

NVIDIA is an exceptional employer, offering a dynamic work environment in London where innovation thrives. With a strong focus on employee growth, we provide access to cutting-edge technologies and collaborative projects that empower our team to push the boundaries of AI and HPC. Our commitment to diversity, inclusion, and competitive benefits ensures that every employee feels valued and supported in their career journey.

Contact details:

Nvidia Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land Senior HPC AI Cluster Engineer

✨Tip Number 1

Network with professionals in the HPC and AI fields. Attend industry conferences, webinars, or local meetups to connect with people who work at NVIDIA or similar companies. This can give you insider knowledge about the company culture and potentially lead to referrals.

✨Tip Number 2

Showcase your hands-on experience with HPC and AI technologies. If you've worked on relevant projects, be prepared to discuss them in detail during interviews. Highlight specific challenges you faced and how you overcame them, as this demonstrates your problem-solving skills.

✨Tip Number 3

Familiarise yourself with NVIDIA's products and recent advancements in AI and HPC. Understanding their technology stack and how they apply it in real-world scenarios will help you articulate how you can contribute to their goals during discussions.

✨Tip Number 4

Prepare for technical interviews by brushing up on your knowledge of job scheduling and orchestration tools like Slurm and Kubernetes. Be ready to demonstrate your understanding of these systems and how they relate to large-scale infrastructure management.

We think you need these skills to ace Senior HPC AI Cluster Engineer

HPC and AI solution technologies
Job scheduling and orchestration tools (e.g., Slurm, K8s)
Linux (Red Hat/CentOS and Ubuntu) expertise
Networking protocols (TCP, DHCP, DNS, InfiniBand, Ethernet)
Storage solutions (Lustre, GPFS, ZFS, XFS)
Python programming and bash scripting
Automation and configuration management tools (Jenkins, Ansible, Puppet/Chef)
Virtual systems experience (VMware, Hyper-V, KVM, Citrix)
Cloud computing platforms familiarity (AWS, Azure, Google Cloud)
Deep understanding of GPU-focused hardware/software (DGX, CUDA)
Knowledge of CPU and/or GPU architecture
Experience with RDMA fabrics (InfiniBand or RoCE)

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights relevant experience in HPC and AI technologies. Focus on specific projects where you've designed or maintained large-scale clusters, and mention any tools or programming languages you used.

Craft a Compelling Cover Letter: In your cover letter, express your passion for HPC and AI. Discuss how your background aligns with NVIDIA's mission and the specific role. Mention any unique contributions you can bring to their team.

Showcase Technical Skills: Clearly list your technical skills related to job scheduling, orchestration tools, and programming languages like Python and Bash. Provide examples of how you've applied these skills in previous roles.

Highlight Collaborative Experience: NVIDIA values teamwork, so include examples of how you've worked with cross-functional teams. Describe any collaborative projects that involved researchers or developers, showcasing your ability to communicate complex ideas effectively.

How to prepare for a job interview at Nvidia

✨Showcase Your Technical Expertise

Be prepared to discuss your experience with HPC and AI technologies in detail. Highlight specific projects where you've designed or maintained large-scale clusters, and be ready to explain the challenges you faced and how you overcame them.

✨Demonstrate Problem-Solving Skills

Expect technical questions that assess your troubleshooting abilities. Prepare examples of how you've resolved issues from bare metal to application level, and be ready to walk through your thought process during these situations.

✨Familiarise Yourself with Their Tools

Research the specific job scheduling and orchestration tools mentioned in the job description, such as Slurm and Kubernetes. Understanding these tools will allow you to speak confidently about your experience and how you can contribute to their team.

✨Engage with Their Vision

NVIDIA is at the forefront of AI and GPU computing. Show your enthusiasm for their mission by discussing how your skills align with their goals. Mention any relevant experience with AI breakthroughs or innovative projects that resonate with their vision.

Application deadline: 2027-08-18