NVIDIA Corporation

Senior HPC AI Cluster Engineer

Full-Time, 48000 - 84000 £ / year (est.), no home office possible

At a Glance

Tasks: Design and maintain large-scale HPC/AI clusters while managing workloads and automating processes.
Company: NVIDIA is a leader in computer graphics and AI, driving innovation for over 25 years.
Benefits: Enjoy competitive salaries, extensive benefits, and a flexible, inclusive work environment.
Why this job: Join a team pushing the boundaries of technology and contributing to groundbreaking AI advancements.
Qualifications: Bachelor's degree or equivalent experience with 5+ years in HPC and AI technologies required.
Other info: Opportunity to work with cutting-edge hardware and collaborate with top researchers and developers.

The predicted salary is between £48,000 and £84,000 per year.

NVIDIA is looking for an experienced HPC Engineer to join the E2E software verification HPC/AI Infrastructure team. We are building supercomputers and HPC clusters based on groundbreaking technologies. We are looking for an outstanding architect for a senior HPC role: a key player working with the most exciting computing hardware and software, contributing to the latest breakthroughs in artificial intelligence and GPU computing. You will provide insights on at-scale system design and tuning mechanisms for large-scale compute runs.
You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers, and customers to craft improved workflows and develop new, differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop and bring up large-scale performance platforms. Does this sound like you? If so, we would love to hear from you!
What you will be doing:

Designing, implementing and maintaining large-scale HPC/AI clusters with monitoring, logging and alerting
Managing Linux job/workload schedulers and orchestration tools
Developing and maintaining continuous integration and delivery pipelines
Developing tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources
Deploying monitoring solutions for the servers, network and storage
Troubleshooting and fixing issues bottom-up, from bare metal through the operating system and software stack to the application level
Being a technical resource: developing, redefining and documenting standard methodologies to share with internal teams
Supporting Research & Development activities and engaging in POCs/POVs for future improvements

What we need to see:

Bachelor's Degree in Computer Science, Engineering, or a related field; or equivalent experience
5+ years of experience
Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software
Experience with job scheduling and workload orchestration tools such as Slurm and Kubernetes (K8s)
Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalls, iptables, Wireshark, etc.) and internals, ACLs and OS-level security protection, and common protocols, e.g. TCP, DHCP, DNS, etc.
Experience with multiple storage solutions such as Lustre, GPFS, ZFS and XFS. Familiarity with newer and emerging storage technologies.
Python programming and bash scripting experience.
Comfortable with automation and configuration management tools such as Jenkins, Ansible, Puppet/Chef
Deep knowledge of networking technologies such as InfiniBand and Ethernet
Deep understanding and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix)
Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud)

Ways to stand out from the crowd:

Knowledge of CPU and/or GPU architecture
Knowledge of Kubernetes and container-related microservice technologies
Experience with GPU-focused hardware/software (DGX, CUDA)
Background with RDMA (InfiniBand or RoCE) fabrics

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. We have a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. Our teams are composed of driven, innovative professionals dedicated to pushing the boundaries of technology. We offer highly competitive salaries, an extensive benefits package, and a work environment that promotes diversity, inclusion, and flexibility. As an equal opportunity employer, we are committed to fostering a supportive and empowering workplace for all.

Senior HPC AI Cluster Engineer employer: NVIDIA Corporation

NVIDIA is an exceptional employer for those passionate about cutting-edge technology and innovation in the HPC and AI sectors. With a commitment to diversity, inclusion, and employee growth, we offer competitive salaries and an extensive benefits package, all within a collaborative work culture that encourages creativity and professional development. Join us in redefining the future of computing while working alongside some of the brightest minds in the industry.

Contact Detail:

NVIDIA Corporation Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Senior HPC AI Cluster Engineer role

✨Tip Number 1

Familiarise yourself with the latest HPC and AI technologies, especially those related to NVIDIA's offerings. Understanding their GPU architecture and how it integrates with AI workloads will give you a significant edge during discussions.

✨Tip Number 2

Engage with the HPC community through forums, webinars, and conferences. Networking with professionals in the field can provide insights into current trends and challenges, which you can leverage in your conversations with us.

✨Tip Number 3

Showcase your hands-on experience with job scheduling tools like Slurm and orchestration platforms such as Kubernetes. Being able to discuss specific projects where you've implemented these tools will demonstrate your practical knowledge.

✨Tip Number 4

Prepare to discuss your experience with automation and configuration management tools like Jenkins and Ansible. Highlighting how you've used these tools to streamline processes in previous roles will resonate well with our team.

We think you need these skills to ace the Senior HPC AI Cluster Engineer role

HPC Cluster Design

AI Solution Technologies

Job Scheduling and Orchestration Tools (e.g., Slurm, Kubernetes)

Linux System Administration (Red Hat/CentOS, Ubuntu)

Networking Protocols (TCP, DHCP, DNS, InfiniBand, Ethernet)

Storage Solutions (Lustre, GPFS, ZFS, XFS)

Python Programming

Bash Scripting

Automation and Configuration Management (Jenkins, Ansible, Puppet, Chef)

Virtual Systems Management (VMware, Hyper-V, KVM, Citrix)

Cloud Computing Platforms (AWS, Azure, Google Cloud)

GPU Architecture Knowledge

Container Technologies

RDMA Fabric Experience

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights relevant experience in HPC and AI technologies. Focus on your achievements in designing and maintaining large-scale clusters, as well as any specific tools or programming languages mentioned in the job description.

Craft a Compelling Cover Letter: In your cover letter, express your passion for HPC and AI. Mention specific projects or experiences that align with NVIDIA's goals, and explain how your skills can contribute to their innovative work in supercomputing.

Showcase Technical Skills: Clearly outline your technical skills related to job scheduling, orchestration tools, and programming languages like Python and bash scripting. Provide examples of how you've used these skills in previous roles to solve complex problems.

Highlight Collaborative Experience: Since the role involves working with researchers and developers, emphasise any collaborative projects you've been part of. Discuss how you contributed to team success and improved workflows, showcasing your ability to work in a multidisciplinary environment.

How to prepare for a job interview at NVIDIA Corporation

✨Showcase Your Technical Expertise

Be prepared to discuss your experience with HPC and AI technologies in detail. Highlight specific projects where you've designed or maintained large-scale clusters, and be ready to explain the challenges you faced and how you overcame them.

✨Demonstrate Problem-Solving Skills

Expect technical questions that assess your troubleshooting abilities. Prepare examples of how you've resolved issues from bare metal to application level, showcasing your systematic approach to problem-solving.

✨Familiarise Yourself with Relevant Tools

Make sure you know the orchestration tools and job scheduling systems mentioned in the job description, such as Slurm and Kubernetes. Being able to discuss your hands-on experience with these tools will set you apart.

✨Engage with the Interviewers

During the interview, ask insightful questions about the team’s current projects and future goals. This shows your genuine interest in the role and helps you understand how you can contribute to their success.
