Site Reliability Engineer (Contract)
Location: Remote UK
Contract Type: 6-month initial contract (extension likely)
Rate: £500 per day (outside IR35)
About the Role:
We are looking for a highly skilled Site Reliability Engineer (SRE) with deep expertise in Microsoft Azure to join our cloud operations team. This role is central to ensuring the performance, availability, and scalability of our cloud-hosted services. You will be working on enhancing observability, automating infrastructure, and driving operational excellence across mission-critical Azure environments.
Key Responsibilities:
- Ensure 99.99% uptime for critical applications and services hosted on Azure
- Develop and maintain Infrastructure as Code (IaC) for consistent environment provisioning using Terraform or Bicep
- Build and optimize CI/CD pipelines using Azure DevOps, GitHub Actions, or Jenkins
- Implement and improve monitoring, logging, and alerting using Azure Monitor, Log Analytics, Application Insights, and Grafana
- Define and track SLIs, SLOs, and error budgets in collaboration with product and engineering teams
- Automate incident response and remediation with Azure Automation Runbooks, Logic Apps, and PowerShell
- Manage high-availability, auto-scaling, and failover strategies using Azure Load Balancer, Application Gateway, and Traffic Manager
- Support production workloads on Azure Kubernetes Service (AKS) and implement scalable microservice deployments
- Harden security and compliance with Azure Policy, Azure Defender, and Key Vault
- Participate in on-call rotation and incident response workflows
Key Deliverables:
- Automated Azure Infrastructure: Reusable IaC templates for provisioning environments (dev, test, prod)
- Monitoring and Observability Stack: Fully integrated dashboards, alerts, and logs across services
- Operational Runbooks: Step-by-step remediation guides and automation for common incidents
- CI/CD Workflows: Fully documented and functional pipelines supporting zero-downtime deployments
- Post-Incident Review Templates: Frameworks for RCA and continuous improvement
- Availability and Performance Reports: Regular reports on uptime, error rates, and latency
- Security Baseline Configurations: Hardened configurations using Azure-native tools
Required Skills and Experience:
- 5+ years in a DevOps, SRE, or Cloud Engineer role
- Deep hands-on experience with Microsoft Azure, including the following services:
  - Azure Kubernetes Service (AKS)
  - Azure App Services
  - Azure Functions
  - Azure Monitor, Log Analytics, and Application Insights
  - Azure DevOps
  - Azure Load Balancer / Traffic Manager / Application Gateway
  - Azure Key Vault, Azure Policy, Azure Defender
  - Azure Storage and Cosmos DB (nice to have)
- Strong expertise in:
  - IaC tools: Terraform, Bicep, ARM Templates
  - CI/CD tools: Azure DevOps, GitHub Actions, Jenkins
  - Monitoring tools: Prometheus, Grafana, Azure Monitor
  - Scripting: PowerShell, Bash, or Python
  - Containers and orchestration: Docker, AKS, Helm
- Solid understanding of networking, load balancing, and DNS within Azure
Preferred Qualifications:
- Azure certifications such as:
  - Microsoft Certified: Azure Administrator Associate
  - Azure DevOps Engineer Expert
  - Azure Solutions Architect Expert
- Experience with incident management and on-call practices
- Background in performance tuning and cost optimisation in Azure
- Familiarity with Zero Trust architecture and role-based access control (RBAC)
Contact Details:
Innovate Cloud LTD Recruiting Team