At a Glance
- Tasks: Design and manage cutting-edge AI platforms while collaborating with top engineering teams.
- Company: Join Era4, a mission-driven start-up transforming energy sites into modern data centres.
- Benefits: Enjoy competitive pay, flexible work options, and opportunities for personal growth.
- Other info: Diverse and inclusive workplace with a focus on learning and development.
- Why this job: Make a real impact in AI infrastructure and help shape the future of technology.
- Qualifications: Experience in HPC and AI platforms, plus strong communication skills, is essential.
The predicted salary is between £60,000 and £80,000 per year.
Era4 develops, owns and operates AI infrastructure across the UK, powered by renewable energy. By converting legacy industrial and energy sites into modern data‑centre facilities, Era4 combines brownfield regeneration with cleaner, efficient, scalable compute capacity for healthcare, research, finance, enterprise, and public‑sector organisations.
Role Summary
We are looking for a Platform Engineer (HPC & AI) who can help shape our new Platform team. This role is customer‑facing and involves technical troubleshooting and collaboration with vendor engineering teams to ensure seamless AI platform operations.
Responsibilities
- Designing, deploying, and managing large‑scale HPC and GPU‑accelerated clusters, including NVIDIA‑based compute environments.
- Implementing and administering HPC scheduling and resource‑management systems (e.g., Slurm), including GPU partitioning, workload scheduling, and capacity planning.
- Architecting and optimising InfiniBand and Ethernet network topologies.
- Ensuring high availability and resilience through failover strategies, planned maintenance coordination, and proactive risk mitigation.
- Automating provisioning, configuration, monitoring, and operational workflows across multi‑vendor HPC hardware and software stacks.
- Monitoring real‑time performance and leading troubleshooting efforts across compute, storage, interconnect, drivers, and node failures, engaging vendor support for critical issues.
- Incident response: managing node failures, network issues, and driver issues; troubleshooting common problems and working with vendor support to resolve critical issues.
- Security and access control: managing user permissions, RBAC, security hardening, and data protection.
Required Skills & Experience
- Experience supporting HPE PCAI or other AI/HPC infrastructure and platforms.
- System administration experience with operating systems such as RHEL/CentOS and Ubuntu, including Linux kernel tuning.
- Proficiency with Ansible, NVIDIA and CUDA toolkits, Kubernetes, and container orchestration.
- Understanding of automation, monitoring, and security in a GPU-as-a-service context.
- Extensive experience in system engineering, platform operations or SRE.
- Experience with GPU resource allocation (across instances, GPU count, and time).
- Advanced networking skills, including high-performance networking, troubleshooting, and fine-tuning.
- Familiarity with cloud-based platforms, APIs, and distributed systems.
- Understanding of AI/ML concepts and tooling (model training, inference, and data pipeline basics).
- Experience with monitoring/logging tools (e.g., Grafana, Kibana, Splunk).
- Excellent communication skills to interface with customers as well as internal and vendor teams.
- Good understanding of the tooling requirements of ML engineers and data scientists, and how to optimise their experience.
Why Join Era4
You’ll be joining a mission‑driven start‑up building critical national infrastructure, where operational excellence directly enables growth. This role offers high visibility with leadership, real autonomy, and the chance to shape how a next‑generation company operates at scale.
Diversity & Inclusion
Era4 is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
Note
We appreciate this is a relatively new skill set, and we are open to candidates who may not tick all the boxes but are willing to learn and develop their skills.
Platform Engineer employer: Era4
Contact Details:
Era4 Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Platform Engineer role
✨Network Like a Pro
Get out there and connect with folks in the industry! Attend meetups, webinars, or even online forums related to AI and HPC. You never know who might have a lead on your dream job at Era4!
✨Show Off Your Skills
When you get the chance to chat with potential employers, make sure to highlight your hands-on experience with HPC and AI platforms. Share specific examples of how you've tackled challenges in previous roles – it’ll make you stand out!
✨Ask Smart Questions
During interviews, don’t just wait for questions to be thrown at you. Prepare some insightful questions about Era4’s projects and future plans. This shows you're genuinely interested and helps you assess if it's the right fit for you.
✨Apply Through Our Website
We encourage you to apply directly through our website. It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re proactive and really keen on joining our team!
Some tips for your application 🫡
Tailor Your CV: Make sure your CV reflects the skills and experiences that match the Platform Engineer role. Highlight your experience with HPC, AI, and any relevant technologies like Ansible or Kubernetes. We want to see how you can contribute to our mission!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about working with AI infrastructure and how your background aligns with our goals at Era4. Let us know what excites you about this opportunity!
Showcase Your Problem-Solving Skills: In your application, share examples of how you've tackled technical challenges in the past. We love candidates who can demonstrate their troubleshooting abilities and innovative thinking, especially in high-pressure situations.
Apply Through Our Website: We encourage you to apply directly through our website for the best chance of getting noticed. It’s the easiest way for us to keep track of your application and ensure it reaches the right people. Don’t miss out!
How to prepare for a job interview at Era4
✨Know Your Tech Inside Out
Make sure you brush up on your knowledge of HPC and AI infrastructure, especially around NVIDIA-based compute environments. Be ready to discuss your experience with tools like Ansible, Kubernetes, and any relevant monitoring/logging tools. The more you can demonstrate your technical expertise, the better!
✨Showcase Your Problem-Solving Skills
Since this role involves troubleshooting and incident response, prepare some examples of how you've tackled technical challenges in the past. Think about specific instances where you managed node failures or network issues, and be ready to explain your thought process and the outcomes.
✨Communicate Clearly and Confidently
Excellent communication skills are key for this customer-facing role. Practice explaining complex technical concepts in simple terms, as you'll need to interface with both customers and vendor teams. A clear communicator stands out, so don’t shy away from showcasing your interpersonal skills.
✨Demonstrate Your Willingness to Learn
Era4 values candidates who may not tick all the boxes but are eager to learn. Be honest about areas where you might lack experience, but also highlight your enthusiasm for developing those skills. Showing a growth mindset can really set you apart from other candidates.