Cloud Operations Engineer (AWS or Azure) - Cheltenham
One of the UK’s largest financial services technology providers is seeking an experienced, hands‑on Cloud Operations Engineer to join the team responsible for the availability, security, resilience, and performance of its AWS-hosted infrastructure.
This is a strictly hands‑on, operational role focused on keeping their production environments stable, secure, and performant. You will operate as a core technical practitioner, balancing day‑to‑day BAU operations and infrastructure changes with continuous improvements to their resilience, automation, monitoring, and security posture.
About you:
- AWS Expertise: Proven hands‑on experience managing AWS‑hosted production environments, particularly compute (EC2, ECS), Application Load Balancers (ALB), and networking (VPCs, Security Groups, routing).
- Core Infrastructure Skills: Strong technical competency in Windows Server administration and foundational networking/load‑balancing principles.
- SQL Server Experience: Operational support knowledge of Microsoft SQL Server. This is not a DBA role, but it requires comfort with basic triage and monitoring.
- Automation & CI/CD: Hands‑on experience executing, supporting, and troubleshooting CI/CD pipelines, and resolving infrastructure automation issues.
- Observability Tooling: Proficiency with monitoring tools such as Datadog, Amazon CloudWatch, AWS X‑Ray, or similar.
What will you be doing?
- Infrastructure Management: Operate and support their AWS production and non‑production environments, focusing heavily on compute services (EC2, ECS), VPC networking, and storage.
- Incident Response & Triage: Act as the first and second line of defence for platform and application issues. Participate in an out‑of‑hours on‑call rotation covering P1/P2 incidents and scheduled deployments.
- Change & Release Support: Execute approved infrastructure changes following strict security controls. Support delivery teams with automated deployments and troubleshoot failed or degraded CI/CD releases.
- Database Support: Provide operational support for Microsoft SQL Server (health checks, job monitoring, basic triage, backup verification) and collaborate with DBA teams for deeper optimizations.
- Monitoring & Observability: Own and optimize monitoring tools (Datadog, CloudWatch, AWS X‑Ray) to catch genuine service‑impacting issues while reducing alert fatigue.
- Security & Compliance: Assess and remediate vulnerabilities, assist with compliance audits (ISO, regulatory assurance), and contribute to incident post‑mortems.