Platform Engineer in London

London | Full-Time | £60,000 - £80,000 / year (est.) | No home office possible

At a Glance

  • Tasks: Design and manage cutting-edge AI platforms while collaborating with top engineering teams.
  • Company: Join Era4, a mission-driven start-up transforming the UK's AI infrastructure sustainably.
  • Benefits: Enjoy competitive pay, flexible work options, and opportunities for personal growth.
  • Why this job: Make a real impact in a dynamic environment focused on innovation and sustainability.
  • Qualifications: Experience in HPC, AI platforms, and strong communication skills are essential.
  • Other info: Open to candidates eager to learn and grow in a diverse, inclusive workplace.

The predicted salary is between £60,000 and £80,000 per year.

Era4 develops, owns and operates AI infrastructure across the UK, powered by renewable energy. Converting legacy industrial and energy sites into modern data‐centre facilities, Era4 is combining brownfield regeneration opportunities with cleaner, efficient, scalable compute capacity for healthcare, research, finance, enterprise, and public‐sector organisations.

We are looking for a Platform Engineer (HPC & AI) to help shape our new Platform team. This role is customer‐facing and involves technical troubleshooting and collaboration with vendor engineering teams to ensure seamless AI platform operations.

Responsibilities:

  • Designing, deploying, and managing large‐scale HPC and GPU‐accelerated clusters, including NVIDIA‐based compute environments.
  • Implementing and administering HPC scheduling and resource‐management systems (e.g., Slurm), including GPU partitioning, workload scheduling, and capacity planning.
  • Architecting and optimising InfiniBand and Ethernet network topologies.
  • Ensuring high availability and resilience through failover strategies, planned maintenance coordination, and proactive risk mitigation.
  • Automating provisioning, configuration, monitoring, and operational workflows across multi‐vendor HPC hardware and software stacks.
  • Monitoring real‐time performance and leading troubleshooting efforts across compute, storage, interconnect, drivers, and node failures, engaging vendor support for critical issues.
  • Incident response: managing node failures, network issues, and driver issues; troubleshooting common problems and working with vendor support to resolve critical ones.
  • Security and access control: managing user permissions, RBAC, security hardening, and data protection.

Required Skills & Experience:

  • Experience supporting HPE PCAI or other AI/HPC infrastructure and platforms.
  • System administration experience with operating systems such as RHEL/CentOS and Ubuntu, including Linux kernel tuning.
  • Proficiency with Ansible, NVIDIA and CUDA toolkits, Kubernetes, and container orchestration.
  • Understanding of automation, monitoring, and security in a GPU‐as‐a‐service context.
  • Extensive experience in system engineering, platform operations or SRE.
  • Experience with GPU resource allocation (across instances, GPU count, and time).
  • Advanced networking skills, including high‐performance networking, troubleshooting, and fine‐tuning.
  • Familiarity with cloud‐based platforms, APIs, and distributed systems.
  • Understanding of AI/ML concepts and tooling (model training, inference, data pipeline basics).
  • Experience with monitoring/logging tools (e.g., Grafana, Kibana, Splunk).
  • Excellent communication skills to interface with both customers and internal / vendor teams.
  • Good understanding of the tooling requirements of ML engineers and data scientists, and how to optimise their experience.

Why Join Era4:

You’ll be joining a mission‐driven start‐up building critical national infrastructure, where operational excellence directly enables growth. This role offers high visibility with leadership, real autonomy, and the chance to shape how a next‐generation company operates at scale.

Diversity & Inclusion:

Era4 is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Note: We appreciate this is a relatively new skill set, and we are open to candidates who may not tick all the boxes but are willing to learn and develop their skills.

Platform Engineer in London employer: Carbon3ai Limited.

At Era4, we pride ourselves on being an exceptional employer, offering a dynamic work culture that fosters innovation and collaboration. As a Platform Engineer, you'll have the opportunity to work with cutting-edge AI infrastructure while contributing to meaningful projects that promote sustainability and efficiency. Our commitment to employee growth is evident through our supportive environment, where diverse talents are celebrated, and continuous learning is encouraged.

Contact Details:

Carbon3ai Limited. Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Platform Engineer role in London

✨Tip Number 1

Network like a pro! Reach out to people in the industry, attend meetups, and connect with potential colleagues on LinkedIn. You never know who might have the inside scoop on job openings or can put in a good word for you.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects related to HPC, AI, or any relevant tech. This gives you a chance to demonstrate your expertise beyond just a CV.

✨Tip Number 3

Prepare for interviews by brushing up on common technical questions and scenarios related to platform engineering. Practice explaining your thought process clearly, as communication is key when working with customers and vendor teams.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our mission-driven team at Era4.

We think you need these skills to ace the Platform Engineer role in London

HPC and GPU-accelerated clusters management
NVIDIA compute environments
HPC scheduling and resource-management systems (e.g., Slurm)
InfiniBand and Ethernet network topologies architecture
High availability and resilience strategies
Automation of provisioning and configuration
Real-time performance monitoring
Incident response and troubleshooting
Security and access control management
System administration (RHEL/CentOS, Ubuntu)
Ansible, NVIDIA and CUDA toolkits proficiency
Kubernetes and container orchestration
Cloud-based platforms and APIs familiarity
AI/ML concepts understanding
Monitoring/logging tools experience (e.g., Grafana, Kibana, Splunk)
Excellent communication skills

Some tips for your application 🫡

Tailor Your CV: Make sure your CV speaks directly to the role of Platform Engineer. Highlight your experience with HPC, AI, and any relevant technologies like Ansible or Kubernetes. We want to see how your skills align with what we're looking for!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about AI infrastructure and how you can contribute to our mission at Era4. Keep it engaging and personal – we love to see your personality come through.

Showcase Your Problem-Solving Skills: In your application, don’t just list your skills; share examples of how you've tackled challenges in previous roles. Whether it's troubleshooting or optimising systems, we want to know how you approach problems and find solutions.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way to ensure your application gets into the right hands. Plus, it shows us you’re serious about joining our team at Era4!

How to prepare for a job interview at Carbon3ai Limited.

✨Know Your Tech Inside Out

Make sure you’re well-versed in the technologies mentioned in the job description, like HPC, GPU environments, and tools like Ansible and Kubernetes. Brush up on your knowledge of RHEL/CentOS and Ubuntu, as well as any relevant AI/ML concepts. Being able to discuss these topics confidently will show that you're serious about the role.

✨Prepare for Technical Troubleshooting Scenarios

Since the role involves technical troubleshooting, be ready to tackle hypothetical scenarios during the interview. Think about common issues related to node failures or network problems and how you would approach resolving them. This will demonstrate your problem-solving skills and technical expertise.

✨Showcase Your Communication Skills

As this position is customer-facing, it’s crucial to highlight your communication abilities. Prepare examples of how you've effectively collaborated with teams or communicated complex technical information to non-technical stakeholders. This will help illustrate your fit for a role that requires both technical and interpersonal skills.

✨Express Your Willingness to Learn

Era4 values candidates who are eager to learn and grow. If you don’t tick every box, don’t sweat it! Be honest about your current skill set and express your enthusiasm for developing new skills. This openness can make a positive impression and show that you’re a good cultural fit for the team.
