Site Reliability Engineer in Hampshire

Hampshire, Full-Time, £60,000 - £80,000 / year (est.), No working from home possible

At a Glance

  • Tasks: Solve complex operational challenges and improve service reliability through automation.
  • Company: Join a forward-thinking tech company with a focus on collaboration and innovation.
  • Benefits: Enjoy remote work, competitive salary, and opportunities for professional growth.
  • Other info: Dynamic role with excellent career advancement opportunities in a supportive environment.
  • Why this job: Make a real impact on cloud platforms while working with cutting-edge technologies.
  • Qualifications: Strong Linux skills, AWS experience, and scripting knowledge required.

The predicted salary is between £60,000 and £80,000 per year.

What We're Looking For

We're looking for someone who enjoys solving complex operational challenges through engineering rather than manual intervention. You'll be proactive, collaborative, and passionate about improving reliability through automation and continuous improvement. If you're excited about building resilient cloud platforms and making a measurable impact on service reliability, we'd love to hear from you.

Key Responsibilities

  • Incident Management & Operations
    • Participate in a 24/7 on-call rota as a primary responder or escalation point.
    • Lead or support major incident response, including triage, mitigation, and resolution.
    • Coordinate with Engineering, Infrastructure, Security, and Product teams during incidents.
    • Develop, maintain, and continuously improve operational runbooks and playbooks.
    • Conduct blameless post-incident reviews and drive follow-up improvements.
  • Monitoring & Alerting
    • Monitor the health of infrastructure, applications, and services.
    • Design and optimise alerting strategies aligned with service reliability objectives (SLIs/SLOs).
    • Reduce alert fatigue through continuous tuning and optimisation.
    • Build and maintain dashboards using tools such as Grafana, Prometheus, Datadog, Splunk, and AWS CloudWatch.
  • Reliability Engineering & Automation
    • Automate repetitive operational tasks to minimise manual effort.
    • Improve Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).
    • Develop automation tools and scripts using Python, Bash, Go, or similar languages.
    • Implement self-healing and auto-remediation where appropriate.
    • Work closely with engineering teams to improve application and platform reliability.
  • Platform & Infrastructure
    • Support and troubleshoot Linux-based production environments.
    • Manage cloud infrastructure, primarily within AWS.
    • Support containerised environments using Docker and Kubernetes.
    • Assist with capacity planning, availability reviews, and production readiness for new releases.

Skills & Experience

Essential

  • Strong Linux systems administration experience.
  • Experience supporting production environments and managing incidents.
  • Hands-on experience with AWS cloud infrastructure.
  • Experience with Docker and Kubernetes.
  • Scripting or programming experience with Python, Bash, Go, or similar.
  • Solid understanding of networking fundamentals, including DNS, TCP/IP, and load balancing.
  • Experience working in a 24/7 operations or NOC environment.
  • Ability to remain calm and effective during high-pressure production incidents.
  • Excellent communication and stakeholder coordination skills.

Desirable

  • Experience working with Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
  • Previous experience helping organisations transition from traditional NOC operations to an SRE model.
  • Infrastructure as Code experience using Terraform, Ansible, or similar tools.
  • Exposure to security, compliance, or regulated environments.

Site Reliability Engineer in Hampshire employer: Spectrum IT Recruitment Limited

At Spectrum IT Recruitment, we pride ourselves on fostering a dynamic and inclusive work culture that empowers our Site Reliability Engineers to thrive. With fully remote opportunities, we offer flexible working arrangements, competitive benefits, and a strong focus on professional development, ensuring that our employees can grow their skills while making a significant impact on service reliability in a collaborative environment.

Contact Details:

Spectrum IT Recruitment Limited Recruitment Team

We think you need these skills to ace Site Reliability Engineer in Hampshire

Incident Management
Operational Runbooks Development
Monitoring and Alerting
Grafana
Prometheus
Datadog
Splunk