HPC Infrastructure and Support Engineer in Liverpool

Liverpool | Full-Time | £36,000 - £60,000 per year (est.) | No home office possible

At a Glance

Tasks: Maintain and optimise high-performance computing environments for cutting-edge AI solutions.
Company: Rapidly growing cloud provider redefining high-performance computing with innovative GPUaaS.
Benefits: Competitive salary, flexible work options, and opportunities for professional growth.
Why this job: Join a dynamic team and make an impact in the AI and ML ecosystem.
Qualifications: Experience with HPC systems, Linux, and networking; strong problem-solving skills.
Other info: Collaborative environment with excellent career advancement opportunities.

The predicted salary is between £36,000 and £60,000 per year.

A rapidly growing cloud provider is redefining high-performance computing with cutting-edge GPUaaS, delivering scalable, enterprise-grade AI infrastructure at unmatched efficiency. With deep ties to Nvidia, they’re quickly becoming a powerhouse in the AI and ML ecosystems of the US and Europe, providing solutions for HPC, AI, and deep learning workloads.

As the Principal HPC Support Engineer, you will play a pivotal role in maintaining and supporting high-performance computing environments on bare-metal infrastructure, primarily serving clients in research, higher education, and enterprise AI sectors. You will focus on both the software and networking aspects of HPC deployments, ensuring that large-scale GPU clusters remain operational, secure, and optimized for client needs.

Key Responsibilities

System Maintenance and Performance Optimization

Manage, maintain, and tune bare-metal HPC clusters running Linux-based operating systems (e.g., Fedora, Debian, Ubuntu).
Optimize Nvidia GPU compute environments, including CUDA, NCCL, and GPU resource management in multi-node HPC clusters.
Oversee high-speed networking configurations, including InfiniBand (Mellanox), RDMA, and Ethernet fabric tuning for low-latency HPC workloads.
Configure and fine-tune HPC schedulers (e.g., Slurm, OpenPBS, SGE) for optimal GPU workload distribution.
Implement containerization strategies (Podman, Docker) and orchestration platforms (K3s, Kubernetes) for managing distributed AI/ML workloads.

Networking and Infrastructure Support

Configure, monitor, and troubleshoot high-performance network fabrics, ensuring low-latency, high-throughput communication between GPU nodes.
Deploy and maintain InfiniBand, RoCE, and high-speed Ethernet for HPC and AI clusters.
Collaborate with networking teams to optimize routing, switching, and load balancing for distributed computing environments.
Work closely with Nvidia engineers and system architects to implement GPUDirect Storage, NVLink, and Magnum IO for accelerated workloads.

Security, Automation, and Monitoring

Maintain authentication and authorization systems such as Active Directory, OpenLDAP, and Keycloak.
Automate system provisioning and configuration using Ansible, Terraform, or other Infrastructure-as-Code tools.
Monitor system performance using Prometheus, Grafana, and ELK Stack, identifying and resolving bottlenecks in GPU workloads.
Implement security best practices for multi-tenant HPC clusters, ensuring compliance with industry standards.

Troubleshooting and Client Support

Serve as the lead technical resource for diagnosing and resolving complex software, networking, and hardware issues in large-scale GPU clusters.
Analyze logs, conduct performance profiling, and debug CUDA, MPI, and RDMA-related issues.
Work closely with AI/ML research teams, cloud engineers, and enterprise clients to optimize workload performance.

Collaboration and Process Improvement

Support the ongoing development of internal HPC test environments and customer POCs.
Work cross-functionally with Service Desk, Operations, and Service Delivery Management to ensure seamless service.
Provide technical documentation, training, and mentorship to junior team members.

Employer: asobbi

As a rapidly growing cloud provider at the forefront of high-performance computing, we offer an innovative work environment that fosters collaboration and creativity. Our commitment to employee growth is evident through continuous learning opportunities and mentorship, particularly in the dynamic fields of AI and machine learning. Located in a vibrant tech hub, we provide our team with access to cutting-edge technology and a culture that values diversity, inclusion, and the pursuit of excellence.

Contact Details:

asobbi Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the HPC Infrastructure and Support Engineer role in Liverpool

✨Tip Number 1

Network, network, network! Get out there and connect with people in the HPC and AI sectors. Attend industry events, join relevant online forums, and don’t be shy about reaching out to professionals on LinkedIn. You never know who might have a lead on your dream job!

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects related to HPC, GPU optimisation, or any relevant tech. This gives potential employers a tangible look at what you can do and sets you apart from the crowd.

✨Tip Number 3

Prepare for technical interviews by brushing up on your knowledge of Linux systems, networking, and GPU management. Practice common interview questions and scenarios that relate to HPC environments. We want you to feel confident and ready to impress!

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are proactive about their job search. Let’s get you into that role!

We think you need these skills to ace the HPC Infrastructure and Support Engineer role in Liverpool

HPC Cluster Management

Linux Operating Systems (Fedora, Debian, Ubuntu)

Nvidia GPU Optimization (CUDA, NCCL)

High-Speed Networking (InfiniBand, RDMA, Ethernet)

HPC Scheduler Configuration (Slurm, OpenPBS, SGE)

Containerization (Podman, Docker)

Orchestration Platforms (K3s, Kubernetes)

Network Fabric Configuration and Troubleshooting

Authentication and Authorization Systems (Active Directory, OpenLDAP, Keycloak)

Infrastructure-as-Code (Ansible, Terraform)

System Monitoring (Prometheus, Grafana, ELK Stack)

Performance Profiling and Debugging (CUDA, MPI, RDMA)

Technical Documentation and Mentorship

Collaboration with Cross-Functional Teams

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the HPC Infrastructure and Support Engineer role. Highlight your experience with Linux-based systems, GPU environments, and any relevant networking skills. We want to see how your background aligns with our needs!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about high-performance computing and how your skills can contribute to our mission. Keep it engaging and personal – we love to see your personality come through!

Showcase Relevant Projects: If you've worked on any projects related to HPC, AI, or deep learning, make sure to mention them in your application. We’re keen to see real-world examples of your expertise and how you’ve tackled challenges in similar environments.

Apply Through Our Website: We encourage you to apply directly through our website for the best chance of getting noticed. It’s super easy, and you’ll be able to keep track of your application status. Plus, we love seeing candidates who take that extra step!

How to prepare for a job interview at asobbi

✨Know Your HPC Stuff

Make sure you brush up on your knowledge of high-performance computing, especially around Linux-based systems and Nvidia GPU environments. Be ready to discuss specific tools like CUDA, Slurm, and InfiniBand, as well as any hands-on experience you've had with them.

✨Showcase Your Troubleshooting Skills

Prepare to share examples of how you've diagnosed and resolved complex issues in HPC environments. Think about specific challenges you've faced and how you approached them, especially regarding software, networking, and hardware problems.

✨Demonstrate Collaboration

This role involves working closely with various teams, so be ready to talk about your experience collaborating with others. Highlight any cross-functional projects you've been part of and how you contributed to their success.

✨Ask Smart Questions

At the end of the interview, don’t forget to ask insightful questions about the company's HPC strategies or their approach to AI workloads. This shows your genuine interest in the role and helps you gauge if it's the right fit for you.
