At a Glance
- Tasks: Lead a high-performing SRE team to ensure system stability and performance.
- Company: Join the London Stock Exchange Group, a leader in financial markets infrastructure.
- Benefits: Competitive salary, career growth, and a culture of integrity and excellence.
- Other info: Opportunity to work in a collaborative culture focused on continuous improvement.
- Why this job: Make a real impact in a dynamic environment while driving innovation in tech.
- Qualifications: Proven leadership in SRE roles with deep technical expertise in Oracle databases.
The predicted salary is £44,567 per year.
We are looking for a Manager - Site Reliability Engineering to strengthen the Production Management leadership team of Clearing Technology Service. This role demands a proactive and hands-on leader with deep technical expertise and strong critical thinking.
Role summary
You will be responsible for ensuring stability, resilience, and performance of our production systems while driving continuous improvement and SRE best practices across the platform.
What you'll be doing
- Service Ownership: Assume end-to-end accountability for the Clearing production environment, ensuring high availability, optimal performance, and robust resilience of business-critical systems.
- Incident Management & Crisis Leadership: Act as Incident Commander during major incidents, leading resolution efforts, managing stakeholder communications, and driving root cause analysis and remediation.
- Team Leadership & Talent Development: Build and mentor a high-performing SRE team. Promote a culture of accountability, continuous improvement, and blameless postmortems to enhance operational excellence.
- Operational Excellence & SLA Compliance: Ensure adherence to response and resolution SLAs. Oversee efficient ticket management and escalation processes through ServiceNow, removing blockers promptly.
- Stakeholder Engagement & Relationship Management: Develop strong partnerships across LCH and LSEG teams. Ensure timely delivery of business-critical activities and transparent communication of risks and challenges.
- Process Optimisation & Continuous Improvement: Monitor and analyse technical processes to identify improvement opportunities. Implement enhancements to minimise business disruption and improve operational efficiency.
- Risk Management & Compliance: Ensure compliance with regulatory standards and internal governance. Proactively identify and mitigate operational risks.
- Metrics & Observability: Establish and maintain robust observability practices, employing metrics, logging, and tracing to drive data-driven decisions and improve system health.
- Out of hours support / On-call support: Be available for overnight support of production services to ensure successful completion of processing. Respond to overnight calls and deal with issues.
- Disaster Recovery: Participate in Disaster Recovery exercises.
What you'll bring
- Degree educated or equivalent work experience.
- Number of years in Production Support / SRE roles with at least 3 years in a leadership capacity.
- Deep technical expertise in Oracle databases: troubleshooting, scalability, performance tuning and optimisation.
- Demonstrated experience implementing SRE frameworks, including SLOs, SLIs, incident management, and chaos engineering.
- Experience leading teams supporting systems deployed across mixed infrastructure (Cloud and On-Premise, AWS preferred).
- Solid understanding of change management, risk posture, and production readiness.
- Strong track record of delivering automation at scale, reducing toil, and eliminating manual operational tasks.
- Excellent communication and stakeholder management skills, particularly under pressure.
- Expertise in automation (Python, Shell, PowerShell etc.).
- Familiarity with observability tools and practices (metrics, logging, tracing).
- Ability to lead capacity planning and scalability strategies to support growth.
- Knowledge of clearing and settlement processes in financial markets.
- Familiarity with regulatory requirements and governance frameworks in financial services.
- Demonstrated ability to build, mentor, and retain high-performing SRE teams.
Person Specification
- Demonstrable experience managing SRE or Production Support teams in a critically important financial services environment.
- Experience managing teams located across multiple locations and time zones.
- Excellent analytical skills, attention to detail and problem-solving abilities.
- Solid technical background in the core technologies with several years of experience.
- Ability to communicate clearly and concisely to IT and business teams and to senior management.
- Ability to break down complex technical issues into an easy-to-digest format.
- Familiarity with financial products and terminology.
About LSEG
London Stock Exchange Group (LSEG) is a leading global financial markets infrastructure and data provider. Our purpose is driving financial stability, empowering economies and enabling customers to create sustainable growth. We value integrity, partnership, excellence and change. We are an equal opportunities employer and promote sustainability and community involvement.
Manager - Site Reliability Engineering | London, UK employer: London Stock Exchange Group
At London Stock Exchange Group, we pride ourselves on being an exceptional employer, offering a dynamic work culture that fosters innovation and collaboration. As a Manager in Site Reliability Engineering, you will not only lead a talented team but also have access to extensive professional development opportunities, ensuring your growth in the fast-paced financial services sector. Our London location provides a vibrant environment, with a commitment to sustainability and community involvement, making it a rewarding place to advance your career while contributing to global financial stability.
Contact Details:
London Stock Exchange Group Recruitment Team