HPC Infrastructure and Support Engineer in Reading

Reading · Full-Time · £36,000 - £60,000 / year (est.) · No working from home possible
asobbi

At a Glance

  • Tasks: Maintain and optimise high-performance computing environments for cutting-edge AI solutions.
  • Company: Rapidly growing cloud provider redefining high-performance computing with innovative GPUaaS.
  • Benefits: Competitive salary, flexible work options, and opportunities for professional growth.
  • Other info: Dynamic team environment with excellent career advancement opportunities.
  • Why this job: Join a powerhouse in AI and ML, making a real impact in tech innovation.
  • Qualifications: Experience with HPC systems, Linux, and networking; strong problem-solving skills.

The predicted salary is between £36,000 and £60,000 per year.

A rapidly growing cloud provider is redefining high-performance computing with cutting-edge GPUaaS, delivering scalable, enterprise-grade AI infrastructure at unmatched efficiency. With deep ties to Nvidia, they’re quickly becoming a powerhouse in the US and Europe’s AI and ML ecosystem, providing solutions for HPC, AI, and deep learning workloads.

As the Principal HPC Support Engineer, you will play a pivotal role in maintaining and supporting high-performance computing environments on bare-metal infrastructure, primarily serving clients in research, higher education, and enterprise AI sectors. You will focus on both the software and networking aspects of HPC deployments, ensuring that large-scale GPU clusters remain operational, secure, and optimized for client needs.

Key Responsibilities
  • System Maintenance and Performance Optimization
    • Manage, maintain, and tune bare-metal HPC clusters running Linux-based operating systems (e.g., Fedora, Debian, Ubuntu).
    • Optimize Nvidia GPU compute environments, including CUDA, NCCL, and GPU resource management in multi-node HPC clusters.
    • Oversee high-speed networking configurations, including InfiniBand (Mellanox), RDMA, and Ethernet fabric tuning for low-latency HPC workloads.
    • Configure and fine-tune HPC schedulers (e.g., Slurm, OpenPBS, SGE) for optimal GPU workload distribution.
    • Implement containerization strategies (Podman, Docker) and orchestration platforms (K3s, Kubernetes) for managing distributed AI/ML workloads.
  • Networking and Infrastructure Support
    • Configure, monitor, and troubleshoot high-performance network fabrics, ensuring low-latency, high-throughput communication between GPU nodes.
    • Deploy and maintain InfiniBand, RoCE, and high-speed Ethernet for HPC and AI clusters.
    • Collaborate with networking teams to optimize routing, switching, and load balancing for distributed computing environments.
    • Work closely with Nvidia engineers and system architects to implement GPUDirect Storage, NVLink, and Magnum IO for accelerated workloads.
  • Security, Automation, and Monitoring
    • Maintain authentication and authorization systems such as Active Directory, OpenLDAP, and Keycloak.
    • Automate system provisioning and configuration using Ansible, Terraform, or other Infrastructure-as-Code tools.
    • Monitor system performance using Prometheus, Grafana, and ELK Stack, identifying and resolving bottlenecks in GPU workloads.
    • Implement security best practices for multi-tenant HPC clusters, ensuring compliance with industry standards.
  • Troubleshooting and Client Support
    • Serve as the lead technical resource for diagnosing and resolving complex software, networking, and hardware issues in large-scale GPU clusters.
    • Analyze logs, conduct performance profiling, and debug CUDA, MPI, and RDMA-related issues.
    • Work closely with AI/ML research teams, cloud engineers, and enterprise clients to optimize workload performance.
  • Collaboration and Process Improvement
    • Support the ongoing development of internal HPC test environments and customer POCs.
    • Work cross-functionally with Service Desk, Operations, and Service Delivery Management to ensure seamless service.
    • Provide technical documentation, training, and mentorship to junior team members.

HPC Infrastructure and Support Engineer in Reading employer: asobbi

As a rapidly growing cloud provider at the forefront of high-performance computing, we offer an innovative work environment that fosters collaboration and creativity. Our commitment to employee growth is evident through continuous training opportunities and mentorship, particularly in cutting-edge technologies like GPUaaS and AI infrastructure. Located in a vibrant tech hub, we provide our team with access to industry leaders and a culture that values diversity, inclusion, and the pursuit of excellence.

Contact Details:

asobbi Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land the HPC Infrastructure and Support Engineer role in Reading

Join Engineering Meetups!

Get yourself along to local engineering meetups or tech conferences. These are great places to connect with like-minded folks and industry leaders who might just have a lead on that full-time HPC Infrastructure and Support Engineer role you’re after at asobbi.

Show Off Your Projects!

Don’t be shy about showcasing your engineering projects. Whether it’s a funky app, a mechanical design, or a complex algorithm, having a solid portfolio on platforms like GitHub can really make you stand out. Plus, it gives potential employers at asobbi a taste of what you can bring to the table!

Engage with Online Communities

Dive into engineering forums and online communities, like Reddit or specific engineering Discord channels. Sharing your insights, asking questions, and being active can help you build connections that might lead to job opportunities at asobbi.

Apply Through Company Websites

When you spot a role like HPC Infrastructure and Support Engineer at asobbi, apply directly through their website. Often, this can show your genuine interest in the company and you might just get noticed quicker than via typical job boards.

We think you need these skills to ace the HPC Infrastructure and Support Engineer role in Reading

HPC Cluster Management
Linux Operating Systems
Nvidia GPU Optimization
CUDA
NCCL
InfiniBand Configuration
RDMA

Some tips for your application 🫡

Showcase Your Technical Expertise: When applying for an engineering role like HPC Infrastructure and Support Engineer, it’s essential to highlight your technical skills. Include any relevant software or tools you're proficient in on your CV, such as HPC schedulers like Slurm, automation tools like Ansible or Terraform, or monitoring stacks like Prometheus and Grafana. Don't skimp on any engineering projects you've worked on that demonstrate your ability to solve complex problems.

Focus on Results and Impact: In the engineering world, we love numbers and real-world impact. Quantify your achievements wherever possible, such as reducing costs by a certain percentage, improving efficiency, or completing a project ahead of schedule. This gives your future employers at asobbi a clear picture of the value you can bring.

Craft a Compelling Cover Letter: Use your cover letter to express your passion for engineering and explain why you’re drawn to asobbi specifically. Share what aspects of their work excite you and how your values align with theirs. This is your chance to show a bit of personality while keeping it professional!

Include Relevant Certifications: If you have any engineering certifications, especially ones that are recognised in your field, make sure to feature them prominently on your CV. They demonstrate not just your knowledge, but also your commitment to professional development, which is something we at StudySmarter value highly.

How to prepare for a job interview at asobbi

Brush Up on Core Engineering Principles

Before heading into the interview with asobbi, make sure you're solid on the fundamental engineering principles relevant to the role. Be ready to discuss topics such as Linux system administration, InfiniBand and RDMA networking, or GPU workload scheduling, depending on the specifics mentioned in the job description. Don’t skip any hands-on projects or coursework; these can be excellent talking points!

Show Off Your Problem-Solving Skills

Expect technical questions or case studies during your interview—after all, engineering is all about solving problems! Prepare a few examples of how you've tackled engineering challenges in the past, whether at university or in any practical experience. Practising with mock technical interviews can really help you articulate your thought process and solutions.

Relate Your Experience to the Role

In a full-time role, employers like asobbi want to see that you can adapt and grow within their team. Be ready to discuss how your previous internships, projects, or studies relate directly to the work you'll be doing. Highlight specific experiences that showcase your collaborative skills and how you've successfully worked within a team environment.

Know Your Tools and Software

Most engineering roles require familiarity with specific tools and software. Prepare to talk about your proficiency with tools named in the job description, such as Slurm, Ansible, or Kubernetes. Even better, have examples of projects where you’ve used these tools, as it'll demonstrate your hands-on experience and readiness for the job.