Carbon3ai Limited.

AI Platform Engineer - HPC

London

Full-Time

£36,000 - £60,000 / year (est.)

No home office possible

At a Glance

Tasks: Provide technical support for an innovative AI platform and troubleshoot issues.
Company: Join a pioneering tech company focused on renewable energy and AI.
Benefits: Competitive salary, flexible working options, and opportunities for professional growth.
Why this job: Be part of the future of AI and make a real difference in technology.
Qualifications: Experience in technical support and a passion for AI and cloud technologies.
Other info: Dynamic work environment with excellent career advancement opportunities.

The predicted salary is between £36,000 and £60,000 per year.

We are building the UK's next generation AI platform, powered by renewable energy, rooted in sovereign capability, and designed to give enterprises and innovators the compute they need. We need a Support Engineer / Cluster Administrator to provide Level 1 and Level 2 support for the AI platform. The role is customer-facing and involves technical troubleshooting and collaboration with vendor engineering teams to ensure seamless AI platform operations.

Key Responsibilities

Provide L1 support for customer-reported issues and requests
Provide L2 support by diagnosing, replicating, and troubleshooting issues across the platform and infrastructure
Coordinate the escalation and resolution of complex (L3) issues with product/engineering teams, including vendors, and manage vendor responses
Monitor system health, alerts, and customer usage patterns
Document solutions and workarounds, create and maintain knowledge-base content, and document support procedures
Automate common tasks and fixes
Configure and integrate tooling to support optimal operation of the platform, and support tool selection
Assist customers with platform configuration, onboarding, and usage best practices
Collaborate with platform and infrastructure support/engineering teams to resolve platform integration issues
Ensure SLAs and customer satisfaction targets are met
Work with customers and multiple stakeholders to understand their requirements and challenges, and provide reporting on usage, workflows, and billing

Technical Responsibilities

Cluster infrastructure management: Manage the Nvidia GPU cluster.
High availability and resilience: Implement failover strategies and manage maintenance events to minimise downtime.
Resource allocation and optimisation: Resource partitioning (GPU resources), workload scheduling, capacity planning.
Performance monitoring and troubleshooting: Carry out performance analysis and real-time monitoring with the available Nvidia and HPE tools.
Incident response: Manage node failures, network issues, and driver issues; troubleshoot common problems and work with vendor support to resolve critical issues.
Security and access control: Manage user permissions, RBAC, security hardening, data protection.

Required Skills & Experience

Extensive experience in technical support, system engineering, or platform operations.
Solid understanding of L1 and L2 support processes (ticketing, escalation, troubleshooting).
Familiarity with cloud-based platforms, APIs, and distributed systems.
Understanding of AI/ML concepts and tooling (the basics of model training, inference, and data pipelines).
Experience with monitoring/logging tools (e.g., Grafana, Kibana, Splunk).
Excellent communication skills to interface with customers as well as internal and vendor teams.
Good understanding of the tooling requirements of ML engineers and data scientists, and of how to optimise their experience.

Core Technical Skills

System administration experience with operating systems such as RHEL/CentOS and Ubuntu, including Linux kernel tuning.
Proficiency with Ansible, the Nvidia and CUDA toolkits, Kubernetes, and container orchestration.
Understanding of automation, monitoring, and security for GPU-as-a-service offerings.

Preferred Experience

Experience supporting HPE PCAI or other AI/HPC infrastructure and platforms.
Experience with GPU resource allocation (across instances, GPU count, and time).
Advanced networking skills, including high-performance networking, troubleshooting, and fine-tuning.
Background in DevOps or SRE practices.

Success Metrics

Customers receive timely, effective support with minimal escalations.
Issues are resolved or routed correctly with high-quality documentation.
The platform maintains strong uptime and customer satisfaction.

About the employer: Carbon3ai Limited.

Join us in shaping the future of AI technology at our cutting-edge facility, where we prioritise a collaborative and innovative work culture. As an AI Platform Engineer, you'll benefit from extensive professional development opportunities, a commitment to sustainability through renewable energy, and the chance to work with leading experts in the field. Our supportive environment fosters growth and ensures that you play a vital role in delivering exceptional service to our customers while enjoying a fulfilling career.

Contact Details:

Carbon3ai Limited. Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the AI Platform Engineer - HPC role in London

✨Tip Number 1

Get your networking game on! Connect with folks in the AI and HPC space on LinkedIn or at local meetups. Building relationships can lead to insider info about job openings and even referrals.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repo showcasing any projects related to AI, HPC, or system administration. This gives potential employers a taste of what you can do beyond just a CV.

✨Tip Number 3

Practice makes perfect! Prepare for technical interviews by brushing up on troubleshooting scenarios and common issues in AI platforms. Mock interviews with friends can help you nail those tricky questions.

✨Tip Number 4

Don’t forget to apply through our website! We’re always on the lookout for passionate individuals who want to be part of building the next-gen AI platform. Your dream job could be just a click away!

We think you need these skills to ace the AI Platform Engineer - HPC role in London

Technical Support

System Engineering

Platform Operations

L1 and L2 Support Processes

Cloud-Based Platforms

APIs

Distributed Systems

AI/ML Concepts

Monitoring Tools (e.g., Grafana, Kibana, Splunk)

System Administration (RHEL/CentOS, Ubuntu)

Linux Kernel Tuning

Ansible

Nvidia and CUDA Toolkits

Kubernetes

High Performance Networking

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the AI Platform Engineer role. Highlight your experience with technical support, system engineering, and any relevant tools you've used. We want to see how your skills match what we're looking for!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about AI and how your background makes you a great fit for our team. Don't forget to mention your experience with customer-facing roles and troubleshooting.

Show Off Your Technical Skills: In your application, be sure to showcase your technical skills, especially with systems like RHEL/CentOS or Ubuntu, and tools like Ansible and Kubernetes. We love seeing candidates who can demonstrate their hands-on experience!

Apply Through Our Website: We encourage you to apply through our website for the best chance of getting noticed. It helps us keep track of applications and ensures you’re considered for the role. Plus, it’s super easy!

How to prepare for a job interview at Carbon3ai Limited.

✨Know Your Tech Inside Out

Make sure you brush up on your technical skills related to AI platforms, especially around L1 and L2 support processes. Familiarise yourself with Nvidia GPU cluster management and tools such as Ansible and Kubernetes. Being able to discuss these confidently will show that you're ready for the role.

✨Show Off Your Troubleshooting Skills

Prepare to share specific examples of how you've diagnosed and resolved technical issues in the past. Think about times when you had to collaborate with vendor teams or manage complex incidents. This will demonstrate your problem-solving abilities and your experience in a customer-facing role.

✨Communicate Clearly and Effectively

Since this role involves interfacing with customers and internal teams, practice explaining technical concepts in simple terms. Good communication is key, so be ready to showcase your ability to convey information clearly and concisely during the interview.

✨Understand Customer Needs

Research the company’s AI platform and think about how you can contribute to enhancing customer satisfaction. Be prepared to discuss how you would approach onboarding customers and optimising their experience with the platform. Showing that you understand their needs will set you apart.

Carbon3ai Limited.

Company size: 50-100