At a Glance
- Tasks: Enhance system reliability and performance in a dynamic cloud environment.
- Company: Join Experian, a leader in innovation and diversity.
- Benefits: Enjoy hybrid working, competitive pay, and generous leave policies.
- Other info: Embrace a culture of inclusivity and continuous improvement.
- Why this job: Make a real impact on critical systems while leading a talented team.
- Qualifications: Expertise in AWS and strong leadership skills required.
The predicted salary is between £60,000 and £80,000 per year.
We are looking for a Site Reliability Engineer to improve the reliability and performance of business-critical systems. Reporting into our Head of SRE, you will focus on AWS cloud infrastructure, DevOps tooling, and core SRE practices within a distributed, production environment.
Main Responsibilities:
- Define and implement SRE best practices across the organization.
- Apply proven expertise in production support, engineering, disaster recovery (DR), automation, and cloud operations.
- Mentor and guide a team of SREs, fostering growth.
- Collaborate with senior stakeholders to align reliability goals with business objectives.
Reliability & Performance:
- Establish SLIs, SLOs, and SLAs for critical services and ensure adherence.
- Drive initiatives to improve system resilience and reduce operational toil.
- Design systems that detect and remediate issues without manual intervention, such as self-healing systems and runbook automation.
- Use tools such as Gremlin, Chaos Monkey, and AWS FIS to simulate outages and improve fault tolerance.
- Act as the primary point of escalation for critical production issues and lead major incident response, root cause analysis, and postmortems.
- Perform detailed post-incident investigations to identify underlying causes.
- Document findings and share learnings to prevent recurrence.
- Implement preventive measures and continuous improvement processes.
Observability:
- Champion monitoring, logging, and alerting strategies using tools like Prometheus, Grafana, ELK, and AWS CloudWatch.
- Build real-time dashboards to visualize system health and reliability metrics.
- Configure intelligent alerting based on anomaly detection and thresholds.
- Combine metrics, logs, and traces to enable root cause analysis and reduce Mean Time to Resolution (MTTR).
- Apply knowledge of AIOps or ML-based anomaly detection for proactive reliability management.
- Work closely with development teams to integrate reliability into application design and deployment.
- Promote a culture of shared responsibility for uptime and performance across engineering teams.
Qualifications:
- Deep expertise with various AWS services.
- Advanced knowledge of monitoring and observability tools.
- Strong leadership capabilities with a focus on setting clear direction, aligning team efforts with organizational goals, and maintaining high levels of motivation and engagement across the team.
- Excellent communication skills, with the ability to articulate complex ideas, solutions, and feedback clearly to both technical and non-technical stakeholders.
- Adept at managing conflict constructively and facilitating consensus.
- Proven track record of building secure, mission-critical, high-volume transactional web-based systems, preferably in regulated environments such as the finance and insurance industries.
- Hands-on technologist with a background in software development, including experience leading an SRE team.
Additional Information:
- Hybrid working, with 2 days a week in our Nottingham office.
- Great compensation package and discretionary bonus.
- Core benefits include pension, Bupa healthcare, sharesave scheme, and more.
- 25 days annual leave with 8 bank holidays and 3 volunteering days. You can purchase additional annual leave.
Experian is proud to be an Equal Opportunity employer. Innovation is an important part of Experian’s DNA and practices, and our diverse workforce drives our success. Everyone can succeed at Experian and bring their whole self to work, irrespective of their gender, ethnicity, religion, colour, sexuality, physical ability, or age. If you have a disability or special need that requires accommodation, please let us know at the earliest opportunity.
Senior Site Reliability Engineer in Nottingham. Employer: Experian Health
Experian is an exceptional employer that fosters a culture of innovation and inclusivity, making it an ideal place for a Senior Site Reliability Engineer to thrive. With a hybrid working model based in Nottingham, employees benefit from a competitive compensation package, generous annual leave, and opportunities for professional growth through mentorship and collaboration with senior stakeholders. The company's commitment to diversity and employee well-being ensures that everyone can bring their whole self to work, contributing to a dynamic and supportive environment.