SRE Engineer in London

London, Full-Time, No working from home possible
GBST Holdings Ltd

Permanent full time opportunity based in London.

Responsibilities

  • Ability to work on multiple tasks in parallel
  • Problem solver
  • Excellent communicator
  • Desire to improve things

Skills

  • Kubernetes
    • Kubernetes and application troubleshooting
    • Application deployment GitOps / ArgoCD
    • K8s and application logging (Loki / Fluent Bit)
    • Service Mesh (Linkerd preferred)
    • Ingress Config / Troubleshooting (AWS LB Controller / Nginx)
    • Autoscaling configuration (Karpenter)
    • Certificate management (cert-manager)
  • AWS services
    • EKS
    • RDS, DMS, RDS Proxy
    • AWS Backup
    • API Gateway
    • RabbitMQ
    • AWS Transfer Family (SFTP / SFTP Connector)
    • AWS NGFW, TGW, PrivateLink
    • AppStream
    • Lambda – Python
    • IAM
    • Kinesis
    • DynamoDB
  • Terragrunt / Terraform
    • Troubleshooting defects
  • GitOps
    • Helm / ArgoCD
  • Observability Tooling
    • Grafana, Prometheus, Loki, CloudWatch configuration/dashboard creation
  • CI/CD
    • Git / CodeDeploy / CodePipeline

What U will do

  • Platform Operations:
    • Managing and optimising our infrastructure to ensure high availability and system reliability.
    • Delivering 24/7 support via an on-call rotation for after-hours issues.
  • Infrastructure Automation Expertise:
    • Experience with the AWS cloud platform including designing, deploying, and maintaining scalable infrastructure.

U will be someone with

  • Strong knowledge of container orchestration tools like Kubernetes and Docker.
  • Familiarity with deploying Infrastructure as Code (IaC) using Terraform and CloudFormation.
  • Chaos Engineering Proficiency:
    • Understanding of how to implement resilience testing strategies.
    • Using chaos engineering tools such as AWS Fault Injection Service, Gremlin, Chaos Monkey, or LitmusChaos to design and execute fault injection experiments.
    • Knowledge of modern chaos engineering trends, such as adaptive resilience testing or AI-driven fault detection.
  • Monitoring and Observability:
    • Experience with monitoring and observability tools (e.g., Prometheus, ADOT, Grafana, Datadog, New Relic, Elastic Stack).
    • Strong understanding of instrumenting infrastructure with metrics, logging, and tracing.
  • Automation and Scripting:
    • Proficiency in scripting and automation languages (e.g., Python, Go, Shell, Ruby, or Java).
    • Demonstrated ability to automate infrastructure and operational processes.
  • Incident Management and Root Cause Analysis:
    • Participating in incident response processes, including triage, mitigation, and communication.
    • Familiarity with incident management tools like PagerDuty or Opsgenie.
    • Responding to production incidents, troubleshooting issues across the full stack, and ensuring minimal downtime by driving root cause analysis and applying long-term fixes.
    • Conducting blameless post-mortems to identify root causes and derive actionable insights, ensuring continuous improvement.
    • Developing playbooks for common incidents, reducing Mean Time to Resolution (MTTR).
  • Resilience and Scalability Design:
    • Understanding of system design principles, scalability, and high-availability architectures.
    • Practical experience with load testing and performance benchmarking tools (e.g., JMeter, Locust, k6).
    • Designing and testing disaster recovery (DR) strategies to ensure minimal downtime and data integrity during failures.

Benefits

  • 2 days flexible/hybrid working arrangement
  • Instant savings and discounts on major retailers across the country
  • Private Health Insurance including Dental and Optical Cover
  • Non-contributory Pension Scheme
  • Salary Sacrifice Schemes – Car, Cycle to Work and Additional Pension Contributions
  • Additional GBST & U day off every year
  • Employee Assistance Program (EAP)
  • LinkedIn Learning

Contact Details:

GBST Holdings Ltd Recruitment Team