Responsibilities

* We will support teams using ConnellsX and respond to incidents in a structured, blameless way.
* We will investigate root causes and drive post-incident actions to completion.
* We will define SLIs, contribute to SLOs, and monitor error budgets.
* We will build dashboards, alerts, and runbooks to improve visibility.
* We will automate repetitive tasks to reduce operational toil.
* We will collaborate with cross-functional teams to enhance reliability and observability.
* We will support performance testing and capacity planning.
* We will proactively identify and prioritise reliability improvements.

Technologies

* Azure
* C#
* Cloud
* Docker
* GitHub
* Support
* Kubernetes
* NextJS
* OpenTelemetry
* PowerShell
* React
* Security
* Terraform
* ASP.NET
* DevOps

Additional Requirements

* We need hands-on experience with Azure Monitoring, including Application Insights, Alerts, and Action Groups.
* We need strong knowledge of OpenTelemetry, including Kubernetes.
* We need scripting and automation experience with PowerShell and/or Azure CLI.
* We need experience with Terraform and GitHub Actions.
* We need the ability to define SLIs and SLOs and manage error budgets.
* We need incident response and post-incident review experience.
* We need familiarity with Docker and Kubernetes.
* We need strong communication and documentation skills.
* Desirable: working knowledge of .NET/C# and React/NextJS.
* Desirable: experience with cloud cost optimisation.
* Desirable: knowledge of Azure networking, including DNS, VNets, and Firewalls.
* Desirable: understanding of security frameworks such as ISO 27002 and NIST CSF.
* Desirable: Azure certifications.
* We need applicants to have the right to work in the UK, as we are unable to provide visa sponsorship.