At a Glance
- Tasks: Join us to shape the future of AI infrastructure and solve complex technical challenges.
- Company: Carbon3.ai, an innovative start-up building the UK's AI Solution Platform.
- Benefits: Competitive salary, flexible working, and opportunities for professional growth.
- Why this job: Be at the forefront of AI technology and make a real impact in a dynamic environment.
- Qualifications: Experience in systems engineering and a passion for AI and cloud technologies.
- Other info: Join a rapidly expanding team with excellent career advancement opportunities.
The predicted salary is between £36,000 and £60,000 per year.
Lead Systems Engineer - HPC / AI at Carbon3.ai - Building the UK's AI Solution Platform
We are an emerging AI infrastructure start-up building next-generation data centres and high-performance compute environments for AI, LLM training and cloud-scale workloads. Our platform is powered by renewable energy, rooted in sovereign capability, and designed to give enterprises and innovators the compute they need. Backed by leading investors, we are rapidly expanding our site development pipeline, engineering capabilities, and commercial partnerships.
We are looking for a Lead Systems Engineer who can help shape the new Platform team. This role is customer-facing and involves technical troubleshooting and collaboration with vendor engineering teams to ensure seamless AI platform operations.
Key Responsibilities
- Escalate complex (L3) issues to vendor product/engineering teams, coordinate their resolution, and manage vendor responses.
- Monitor system health, alerts, and customer usage patterns.
- Document solutions and workarounds, create and maintain knowledge base content, and document support procedures.
- Automate common tasks and fixes.
- Configure and integrate tooling to support optimal operation of the platform, and support tool selection.
- Assist customers with platform configuration, onboarding, and usage best practices.
- Collaborate with platform and infrastructure support/engineering teams to resolve platform integration issues.
- Ensure SLAs and customer satisfaction targets are met.
- Provide L1 support for customer-reported issues and requests.
- Provide L2 support by diagnosing, replicating, and troubleshooting issues across platform and infrastructure.
- Work with customers and multiple stakeholders to understand requirements and challenges, and provide reporting on usage, workflow and billing.
- Cluster Infrastructure management: Managing the Nvidia GPU cluster.
- High availability and resilience: Implement failover strategies and manage maintenance events to minimise downtime.
- Resource allocation and optimisation: Resource partitioning (GPU resources), workload scheduling, capacity planning.
- Performance monitoring and troubleshooting: Performance analysis and real-time monitoring with available Nvidia and HPE tools.
- Incident response: Manage node failures, network issues and driver issues; troubleshoot common issues and work with vendor support to resolve critical ones.
- Security and access control: Manage user permissions, RBAC, security hardening, data protection.
Requirements
- Extensive experience in technical support, system engineering, or platform operations.
- Solid understanding of L1 and L2 support processes (ticketing, escalation, troubleshooting).
- Familiarity with cloud-based platforms, APIs, and distributed systems.
- Understanding of AI/ML concepts and tooling (model training, inference, data pipelines basics).
- Experience with monitoring/logging tools (e.g., Grafana, Kibana, Splunk).
- Excellent communication skills to interface with both customers and internal/vendor teams.
- Good understanding of the tooling requirements of ML engineers and data scientists, and of how to optimise their experience.
- System administration experience with operating systems such as RHEL/CentOS and Ubuntu, including Linux kernel tuning.
- Proficiency with Ansible, Nvidia and CUDA toolkits, Kubernetes and container orchestration.
- Understanding of automation, monitoring and security for GPU-as-a-service.
- Experience supporting HPE PCAI or other AI/HPC infrastructure and platforms.
- Experience with GPU resource allocation (across instances, GPU count and time).
- Advanced networking skills, including high-performance networking, troubleshooting and fine-tuning.
- Background in DevOps or SRE practices.
- ITIL familiarity.
Success Measures
- Customers receive timely, effective support with minimal escalations.
- Issues are resolved or routed correctly with high-quality documentation.
- The platform maintains strong uptime and customer satisfaction.
Systems Engineer - (HPC & AI) in England
Employer: Carbon3.ai - Building the UK's AI Solution Platform
Contact Detail:
Carbon3.ai - Building the UK's AI Solution Platform Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Systems Engineer - (HPC & AI) role in England
✨Tip Number 1
Network, network, network! Get out there and connect with folks in the HPC and AI space. Attend meetups, webinars, or industry events. You never know who might have a lead on your dream job!
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects related to systems engineering, AI, or HPC. This gives potential employers a taste of what you can do and sets you apart from the crowd.
✨Tip Number 3
Don’t just apply blindly! Tailor your approach for each role. Research Carbon3.ai and understand their mission. When you reach out, mention how your experience aligns with their goals. It shows you’re genuinely interested.
✨Tip Number 4
Apply through our website! We want to see your application directly. It’s a great way to ensure it gets into the right hands. Plus, you’ll be one step closer to joining an exciting team at the forefront of AI solutions!
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the Systems Engineer role. Highlight your experience with HPC, AI, and any relevant technical skills that match the job description. We want to see how your background aligns with what we're looking for!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about AI infrastructure and how your skills can contribute to our mission at Carbon3.ai. Keep it engaging and personal – we love to see your personality!
Showcase Your Problem-Solving Skills: In your application, don’t forget to mention specific examples of how you've tackled complex issues in the past. We’re looking for someone who can coordinate resolutions and manage vendor responses effectively, so let us know how you’ve done this before!
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re keen on joining our team at Carbon3.ai!
How to prepare for a job interview at Carbon3.ai - Building the UK's AI Solution Platform
✨Know Your Tech Inside Out
Make sure you brush up on your technical skills, especially around systems engineering and AI concepts. Be ready to discuss your experience with Nvidia GPU clusters, L1/L2 support processes, and any relevant monitoring tools like Grafana or Splunk.
✨Showcase Your Problem-Solving Skills
Prepare to share specific examples of how you've tackled complex issues in the past. Think about times when you coordinated with vendor teams or automated tasks to improve efficiency. This will demonstrate your ability to handle the technical troubleshooting required for the role.
✨Communicate Clearly and Confidently
Since this role is customer-facing, practice articulating your thoughts clearly. Be ready to explain technical concepts in a way that non-technical stakeholders can understand. Good communication can set you apart from other candidates.
✨Understand the Company’s Vision
Familiarise yourself with Carbon3.ai's mission and values, especially their focus on renewable energy and sovereign capability. Showing that you align with their goals and understand their platform will make a strong impression during the interview.