At a Glance
- Tasks: Lead a global team to ensure high availability of trading platforms and tackle complex technical challenges.
- Company: Join Cboe, a leading financial exchange with a dynamic and innovative culture.
- Benefits: Enjoy competitive salary, flexible working options, and opportunities for professional growth.
- Other info: Be part of a collaborative environment with excellent career advancement opportunities.
- Why this job: Make a real impact in the fast-paced world of trading technology and operations.
- Qualifications: Experience in technical leadership, software engineering, and strong communication skills required.
The predicted salary is between £60,000 and £80,000 per year.
The Senior Manager, Site Reliability Engineering (London) is an experienced leader responsible for overseeing a globally distributed team of SRE technologists whose skills range from software development to systems, network, application, and/or database management, with deep subject matter expertise in one or more of these disciplines. This role sits at the heart of Cboe’s follow-the-sun support model for its US Global Trading Hours (GTH) markets. Based in London, the Senior SRE Manager provides direct platform support for Cboe’s European operations while also holding oversight responsibility for SRE staff across both the European and Asia-Pacific time zones, ensuring seamless, continuous coverage of Cboe’s real-time low-latency trading platforms around the clock.
The Senior SRE Manager will play a key role supporting and providing guidance throughout the full project lifecycle to deliver operational requirements on schedule, drive strategy across multiple areas of the organization, and tackle complex problems that may lack clear or full strategic definition.
- Technical Leadership & System Availability: Provide technical leadership, support, and operational oversight to sustain resiliency and high availability of critical business operations across European and GTH market sessions. Monitor Cboe production, disaster recovery, and certification systems for issues. Troubleshoot and drive resolution of issues. Analyze and optimize performance of real-time trading platforms. Oversee daily system checks and ensure Cboe platforms and systems are operating as expected. Take direct action to resolve known issues as needed. Assist the build team to resolve build/deployment issues.
- People Leadership & Team Development: Lead, mentor, and provide guidance to direct reports across the European and APAC time zones responsible for platform support. Delegate assignments to direct reports. Create and execute agile-based processes such as Kanban and Scrum to actively manage the workload of the team, ensuring task completion in support of business projects and internal customer timelines. Actively and intentionally connect direct reports to others within their team, department, and across the organization. Support training and development needs to create a best-in-class SRE team. Establish operational objectives, policies, and procedures. Interact regularly with management on matters concerning multiple functional areas, departments, and/or customers. Liaise with business associates, infrastructure engineers, software engineers, and Cboe management.
- Platform Configuration Management & Project Oversight: Develop and manage operational initiatives to deliver tactical results. Translate functional plans into operational processes and guide execution, providing project management support for all updates applicable to platforms of responsibility. Provide configuration management for new and existing trading platforms and support implementation of new features and functionality based on new business requirements. While the primary focus of this role is support of bare-metal on-premises infrastructure, experience with cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes) is desirable. Monitor development activities and change management tickets, and evaluate their impact on Cboe Operations. Approve and execute daily change tickets assigned to Site Reliability Engineering. Organize testing of changes prior to deployment and work with software engineering to resolve systemic issues. Demonstrate knowledge of Compliance obligations impacting regulated platforms and work closely with Compliance staff to ensure incident triage, reporting, and remediation obligations are met.
- Incident Response & Escalation Management: Serve as the senior escalation point for production incidents across European and GTH market hours. Coordinate incident triage, root cause analysis, and resolution across globally distributed engineering and operations teams. Provide timely, precise communication to stakeholders during active incidents and drive post-incident reviews and remediation tracking to deliver long-term platform stability.
- Subject Matter Expertise & Stakeholder Engagement: Develop and provide subject matter expertise on all aspects of the platforms of responsibility. Advise teams on moderately complex matters. Liaise with vendors and operators of facilities supporting critical financial infrastructure.
- Reporting & Data Analysis: Create and improve upon existing reports related to Operations management. Analyze technical data sets (e.g., order entry, market data, matching engine logs) to troubleshoot or explain perceived issues. Utilize SQL to conduct data mining against databases, UNIX shell to examine log files, or other tools as necessary to gather and analyze data in service of the Operations team.
- Capacity Planning: Drive capacity planning decisions for Cboe Exchanges, especially within the European region, and support capacity planning needs of various Cboe business units. Provide an active voice within Capacity Planning meetings with engineering and technical operations management staff.
- Automation & Process Improvement: Provide thought leadership to identify and lead task automation opportunities, including automation of system health monitors, alerts, and remediations. Support automation efforts through development, testing, and maintenance of Python tools. Leverage AI to maximize efficiency.
- On-Call & Weekend Testing: Lead and participate in weekend testing (e.g., capacity testing and failover) and provide follow-the-sun on-call technical support as part of Cboe’s global Operations team.
This individual must be capable of applying their knowledge in a manner that builds stakeholder confidence and achieves desirable outcomes while preserving relationships. The ideal candidate has:
- A Bachelor’s Degree in Computer Science, Computer Engineering, Software Engineering, Business, Financial Services, Communications, or a related discipline (Master’s preferred).
- Experience in a technical operations leadership or senior role.
- Experience with UNIX shell, SQL, and software engineering or systems, network, or database administration.
- Proficiency in Python, C++, or another programming language.
- Knowledge of networking (both TCP/IP and multicast) and NIC configuration.
- Fluency in English, both written and spoken (required). The role demands clear, precise, and unambiguous communication at all times.
- Familiarity with European markets as well as US equities, options, and/or futures market structures (strongly preferred).
This role directly supports both Cboe’s European exchange operations and its US GTH market coverage, requiring situational awareness across multiple market sessions and regulatory environments.
Senior Site Reliability Engineering Manager in London. Employer: Cboe
Cboe is an exceptional employer that fosters a dynamic and inclusive work culture, where innovation and collaboration thrive. Based in London, the Senior Site Reliability Engineering Manager role offers unique opportunities for professional growth, with access to cutting-edge technology and a globally distributed team. Employees benefit from a supportive environment that prioritises continuous learning and development, ensuring they are well-equipped to tackle complex challenges in the fast-paced world of trading.