Senior SRE Engineer in Birmingham

Birmingham | Full-Time | No working from home possible
OneAdvanced Limited

Role Introduction

We are looking for a Senior SRE Engineer with deep expertise in Kubernetes, Amazon EKS, and cloud platform engineering to lead the evolution of our core engineering platform. This role is responsible for ensuring the scalability, reliability, security, and operational excellence of the EKS infrastructure that underpins our CI/CD ecosystem, including the Harness delegate platform used by development teams to build and deploy software.

The successful candidate will drive platform automation, infrastructure standardisation, and toolchain optimisation, reducing operational overhead while enabling teams to deliver software faster and more reliably. As a technical leader, they will establish best practices for platform governance, observability, resilience, and developer experience across the organisation. This is a pivotal role in maintaining and enhancing the platform that supports how the business ships code today and scales for future growth.

What You Will Do

Cloud Platform & Infrastructure Engineering

  • Design, build, and operate scalable, secure, and highly available cloud infrastructure across AWS, Azure, and GCP.
  • Own and manage core platform services including networking, IAM, compute, storage, DNS, load balancing, and private connectivity.
  • Implement cloud governance, operational standards, security controls, and cost optimisation practices across multi-cloud environments.
  • Support enterprise-scale platform reliability, availability, and scalability objectives.

Kubernetes & Amazon EKS Platform Ownership

  • Own and operate production Amazon EKS clusters that underpin the organisation's CI/CD and software delivery platforms.
  • Manage Kubernetes lifecycle activities including upgrades, patching, scaling, capacity management, monitoring, and troubleshooting.
  • Administer cluster resources, node groups, Karpenter, autoscaling, ingress controllers, namespaces, RBAC, networking, storage, and secrets management.
  • Drive platform scalability, resilience, and operational excellence for containerised workloads and microservices.
  • Troubleshoot complex Kubernetes issues including scheduling, networking, DNS, performance, and cluster reliability.

Platform Automation & Infrastructure as Code

  • Develop, maintain, and standardise reusable Infrastructure as Code (IaC) components using Terraform and other automation frameworks.
  • Build self-service platform capabilities that simplify infrastructure provisioning and operational workflows for engineering teams.
  • Automate infrastructure lifecycle management, cloud operations, and platform maintenance activities.
  • Promote infrastructure standardisation and governance through reusable modules, templates, and platform engineering practices.

CI/CD, Toolchain & Developer Platform Engineering

  • Build, support, and optimise enterprise CI/CD platforms including Harness, GitHub Actions, Jenkins, and related toolchains.
  • Maintain the EKS infrastructure supporting Harness delegates and deployment services used across the organisation.
  • Improve deployment reliability, software delivery velocity, and operational efficiency through automation and platform enhancements.
  • Support Internal Developer Platform (IDP) initiatives, including Backstage, to provide self-service capabilities for development teams.
  • Drive toolchain standardisation and adoption of DevOps best practices across engineering teams.

Observability, Reliability & Operational Excellence

  • Implement and maintain observability platforms using Grafana, Prometheus, CloudWatch, and centralised logging solutions.
  • Develop dashboards, metrics, alerting strategies, and service health monitoring for cloud infrastructure, Kubernetes platforms, and application services.
  • Lead incident response, root cause analysis, and continuous reliability improvements.
  • Support production environments and critical platform services, ensuring high availability and operational resilience.

Security, Governance & Collaboration

  • Implement DevSecOps practices including IAM, secrets management, vulnerability remediation, encryption, and infrastructure hardening.
  • Ensure compliance with organisational security standards and cloud governance policies.
  • Collaborate closely with engineering, operations, security, and product teams to deliver scalable and secure platform capabilities.
  • Maintain technical documentation, runbooks, architecture diagrams, and operational procedures.

Leadership & Continuous Improvement

  • Mentor engineers and promote platform engineering, SRE, and DevOps best practices across the organisation.
  • Drive continuous improvement initiatives focused on automation, developer experience, reliability, scalability, and operational efficiency.
  • Evaluate emerging technologies and platform capabilities to enhance the engineering ecosystem.

What You Will Have

  • At least 4-5 years of experience in Site Reliability Engineering (SRE), DevOps, Platform Engineering, or Infrastructure Engineering roles.
  • Strong hands-on experience with AWS, Azure, and GCP cloud platforms.
  • Deep expertise in Amazon EKS and Kubernetes administration, architecture, troubleshooting, upgrades, and operational management.
  • Strong understanding of Kubernetes internals, container orchestration, Karpenter, Helm, and cloud-native platform operations.
  • Experience building and operating enterprise-scale CI/CD platforms using Harness, GitHub Actions, Jenkins, or similar technologies.
  • Proven experience supporting deployment platforms and toolchains that enable software delivery across engineering organisations.
  • Advanced Infrastructure as Code (IaC) experience using Terraform, CloudFormation, and infrastructure automation frameworks.
  • Experience with configuration management and automation tools such as Ansible.
  • Strong scripting and automation skills using Bash, Python, or Go.
  • Hands-on experience with observability and monitoring platforms including Grafana, Grafana Cloud, Prometheus, CloudWatch, ELK/OpenSearch, or similar technologies.
  • Experience implementing cloud security controls, IAM, secrets management, infrastructure hardening, and DevSecOps practices.
  • Experience with GitOps methodologies.
  • Experience with Internal Developer Platforms (IDP), Backstage, and platform engineering concepts is preferred.
  • Strong Linux administration, troubleshooting, and production support experience.
  • Excellent communication, stakeholder management, and cross-functional collaboration skills.
  • Demonstrated automation-first mindset with a strong focus on scalability, reliability, and operational excellence.
  • AWS, Azure, and Certified Kubernetes Administrator (CKA) certifications are highly desirable.

What We Do For You

  • Wellbeing focused – Our people are our greatest assets, and ensuring everyone feels their best self to come to work is integral.
  • Annual Leave – 20 days of annual leave, plus public holidays.
  • Employee Assistance Programme – Free advice, support, and confidential counselling available 24/7.
  • Personal Growth
    • Development Programmes – From Future Managers to Leadership Training, our development programmes help you get where you need to go.
    • Online Learning Platform: SkillsHub! – Learning at your fingertips, anytime, from anywhere. You can access our online library with relevant content for your career growth.
  • Life Insurance - 3x annual salary
  • Personal Accident Insurance - providing cover in the event of serious injury/illness.
  • Performance Bonus – Our Group-wide bonus scheme enables you to reap the rewards of your success.

Contact Details:

OneAdvanced Limited Recruitment Team