Systems Engineer - (HPC & AI)

Full-time | £36,000 - £60,000 per year (est.) | No home office possible

At a Glance

  • Tasks: Join a dynamic team to manage cutting-edge AI infrastructure and ensure seamless platform operations.
  • Company: Carbon3.ai, an innovative start-up building renewable-powered AI infrastructure.
  • Benefits: Competitive salary, flexible working options, and opportunities for professional growth.
  • Why this job: Be at the forefront of AI technology and make a real impact in a fast-growing industry.
  • Qualifications: Experience in technical support or system engineering, with a passion for AI and cloud technologies.
  • Other info: Join a collaborative environment with excellent career advancement opportunities.

The predicted salary is between £36,000 and £60,000 per year.

Apply for the Lead Systems Engineer - HPC / AI role at Carbon3.ai - Building the UK's AI Solution Platform. We are an emerging AI infrastructure start-up building next-generation data centres and high-performance compute environments for AI, LLM training, and cloud-scale workloads. Our facilities are powered by renewable energy, rooted in sovereign capability, and designed to give enterprises and innovators the compute they need. Backed by leading investors, we are rapidly expanding our site development pipeline, engineering capabilities, and commercial partnerships.

We are looking for a Lead Systems Engineer to help shape the new Platform team. This role is customer-facing and involves technical troubleshooting and collaboration with vendor engineering teams to keep AI platform operations running smoothly.

Key Responsibilities
  • Escalate complex (L3) issues to vendor product/engineering teams, coordinate their resolution, and manage vendor responses.
  • Monitor system health, alerts, and customer usage patterns.
  • Document solutions and workarounds, create and maintain knowledge-base content, and document support procedures.
  • Automate common tasks and fixes.
  • Configure and integrate tooling to keep the platform operating optimally, and contribute to tool selection.
  • Assist customers with platform configuration, onboarding, and usage best practices.
  • Collaborate with platform and infrastructure support/engineering teams to resolve platform integration issues.
  • Ensure SLAs and customer satisfaction targets are met.
  • Provide L1 support for customer-reported issues and requests.
  • Provide L2 support by diagnosing, replicating, and troubleshooting issues across the platform and infrastructure.
  • Work with customers and multiple stakeholders to understand their requirements and challenges, and provide reporting on usage, workflows, and billing.
Technical Responsibilities
  • Cluster infrastructure management: manage the Nvidia GPU cluster.
  • High availability and resilience: implement failover strategies and manage maintenance events to minimise downtime.
  • Resource allocation and optimisation: GPU resource partitioning, workload scheduling, and capacity planning.
  • Performance monitoring and troubleshooting: performance analysis and real-time monitoring using the available Nvidia and HPE tools.
  • Incident response: handle node failures, network issues, and driver issues; troubleshoot common problems and work with vendor support to resolve critical issues.
  • Security and access control: manage user permissions and RBAC, apply security hardening, and ensure data protection.
Required Skills & Experience
  • Extensive experience in technical support, system engineering, or platform operations.
  • Solid understanding of L1 and L2 support processes (ticketing, escalation, troubleshooting).
  • Familiarity with cloud-based platforms, APIs, and distributed systems.
  • Understanding of AI/ML concepts and tooling (basics of model training, inference, and data pipelines).
  • Experience with monitoring/logging tools (e.g., Grafana, Kibana, Splunk).
  • Excellent communication skills to interface with both customers and internal/vendor teams.
  • Good understanding of the tooling requirements of ML engineers and data scientists, and of how to optimise their experience.
Core Technical Skills
  • System administration experience with operating systems such as RHEL/CentOS and Ubuntu, including Linux kernel tuning.
  • Proficiency with Ansible, Nvidia and CUDA toolkits, Kubernetes and container orchestration.
  • Understanding of automation, monitoring, and security for GPU-as-a-service platforms.
Preferred Experience
  • Experience supporting HPE PCAI or other AI/HPC infrastructure and platforms.
  • Experience with GPU resource allocation (across instances, by GPU count, and by time).
  • Advanced networking skills, including troubleshooting and fine-tuning high-performance networks.
  • Background in DevOps or SRE practices.
  • ITIL familiarity.
Success Metrics
  • Customers receive timely, effective support with minimal escalations.
  • Issues are resolved or routed correctly with high-quality documentation.
  • The platform maintains strong uptime and customer satisfaction.

About the employer of the Systems Engineer - (HPC & AI) role: Carbon3.ai - Building the UK's AI Solution Platform

At Carbon3.ai, we pride ourselves on being an innovative employer that fosters a collaborative and dynamic work environment. As a Lead Systems Engineer, you will have the opportunity to work at the forefront of AI technology in a rapidly growing start-up, with access to cutting-edge resources and a commitment to employee development. Our culture emphasises sustainability and teamwork, ensuring that you not only contribute to groundbreaking projects but also grow alongside a passionate team dedicated to shaping the future of AI infrastructure.

Contact Detail:

Carbon3.ai - Building the UK's AI Solution Platform Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Systems Engineer - (HPC & AI) role

✨Tip Number 1

Get your networking game on! Reach out to people in the industry, especially those already working at Carbon3.ai or similar companies. A friendly chat can open doors and give you insider info that could help you stand out.

✨Tip Number 2

Show off your skills! Prepare a portfolio or a project that highlights your experience with HPC, AI, or any relevant tech. When you get the chance to chat with hiring managers, having something tangible to discuss can really set you apart.

✨Tip Number 3

Practice makes perfect! Get ready for technical interviews by brushing up on your troubleshooting skills and understanding of L1 and L2 support processes. Mock interviews with friends or mentors can help you feel more confident.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen. Plus, it shows you’re genuinely interested in joining the team at Carbon3.ai.

We think you need these skills to ace the Systems Engineer - (HPC & AI) role

Technical Support
System Engineering
Platform Operations
L1 and L2 Support Processes
Cloud-based Platforms
APIs
Distributed Systems
AI/ML Concepts
Monitoring Tools (e.g., Grafana, Kibana, Splunk)
Communication Skills
System Administration (RHEL/CentOS, Ubuntu)
Ansible
Nvidia and CUDA Toolkits
Kubernetes
Automation and Security with GPU as a Service

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the Systems Engineer role. Highlight your experience with HPC, AI, and any relevant technical skills that match the job description. We want to see how your background aligns with what we're looking for!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about AI infrastructure and how you can contribute to our team. Keep it concise but impactful – we love a good story!

Show Off Your Technical Skills: Don’t hold back on showcasing your technical expertise! Mention specific tools and technologies you've worked with, especially those related to GPU management, cloud platforms, and automation. We’re keen to see your hands-on experience!

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy – just a few clicks and you’re done!

How to prepare for a job interview at Carbon3.ai - Building the UK's AI Solution Platform

✨Know Your Tech Inside Out

Make sure you brush up on your technical skills, especially around system administration and the tools mentioned in the job description. Be ready to discuss your experience with Nvidia GPU clusters, Ansible, and Kubernetes, as well as any troubleshooting you've done in high-performance computing environments.

✨Showcase Your Problem-Solving Skills

Prepare examples of how you've resolved complex issues in the past, particularly in a customer-facing role. Think about specific incidents where you coordinated with vendor teams or automated tasks to improve efficiency. This will demonstrate your ability to handle the responsibilities outlined in the role.

✨Communicate Clearly and Confidently

Since this role involves liaising with customers and internal teams, practice articulating your thoughts clearly. Use the STAR method (Situation, Task, Action, Result) to structure your responses, ensuring you convey your communication skills effectively during the interview.

✨Understand the Company’s Vision

Familiarise yourself with Carbon3.ai's mission and values, especially their focus on renewable energy and AI solutions. Being able to align your personal goals with the company's vision will show your genuine interest in the role and help you stand out as a candidate.
