Role: Senior Site Reliability Engineer (SRE)
Location: London (fully onsite, 5 days a week)
Salary: Up to 80K gross annually
Experience: Minimum 12 years' experience required
Core Competencies:
- Experience with monitoring and chaos-engineering tools such as Datadog, Splunk, Dynatrace, Grafana, Prometheus, ThousandEyes, and Gremlin
- Ability to create dashboards for Infrastructure, Application Performance Monitoring (APM), and End-to-End workflows
- Monitoring, logging, alerting, and error budgeting against availability targets (e.g., 99.9%, 99.99%, 99.999%) for software, operations, and business
- Defining Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) with business, operations, and engineering teams
- Automation and auto-healing skills using Python, shell scripting, JavaScript, or similar languages; developing custom monitoring services
- Experience with logging, monitoring, and event detection on cloud or distributed platforms
- ITIL practices including incident management, change management, problem management, blameless postmortems, documentation, and lessons learned
- Technical operations support focusing on stability, reliability, and resiliency
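
For candidates gauging the error-budgeting requirement, the availability targets listed above translate into allowed downtime roughly as follows. This is a minimal sketch, not part of the role's tooling: the function name `error_budget_minutes` and the 30-day window are assumptions chosen for illustration.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the allowed downtime in minutes for an availability SLO.

    Hypothetical helper: the 30-day default window is an assumption,
    not something specified in this posting.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    total_minutes = window_days * 24 * 60
    # The error budget is the fraction of the window that may be unavailable.
    return total_minutes * (100 - slo_percent) / 100


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        budget = error_budget_minutes(target)
        print(f"{target}% -> {budget:.2f} minutes per 30 days")
```

Under these assumptions, each additional "nine" cuts the budget by a factor of ten, from about 43 minutes per 30 days at 99.9% to under half a minute at 99.999%.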
Contact Details:
TN United Kingdom Recruiting Team