At a Glance
- Tasks: Lead a high-impact SRE team, ensuring system availability and operational excellence.
- Company: Join a global leader in trading platforms, committed to innovation and collaboration.
- Benefits: Enjoy flexible work options, competitive salary, and opportunities for professional growth.
- Why this job: Be part of a dynamic team that values mentorship, innovation, and continuous improvement.
- Qualifications: Bachelor’s degree in Computer Science or related field; 5+ years in SRE/DevOps; leadership experience preferred.
- Other info: Experience with Python, SQL, AWS, and automation tools is a plus.
The predicted salary is between £43,200 and £72,000 per year.
This role plays a key part in the global follow-the-sun support model, working closely with the Global SRE Leader to support platforms worldwide. We are looking for SRE talent with experience in an On-Prem / Datacenter environment. The ideal candidate will bring strong technical leadership and a passion for operational excellence to a high-impact team. You'll collaborate with Engineering, Infrastructure, and Operations teams to maintain high availability and resilient service delivery, while also mentoring an SRE team focused on continuous improvement and innovation.
Key Responsibilities:
- Technical Leadership: Develop deep expertise in the Titanium trading platform to lead and support critical business operations. Oversee team workload, ensuring priorities align with business goals and resource capacity.
- Operational Excellence: Champion initiatives that enhance system availability, scalability, and performance. Collaborate with the Global SRE Leader to refine and enforce operational policies (e.g., Capacity Planning, Change Management, Disaster Recovery).
- Cross-Functional Collaboration: Partner with Software Engineering, Infrastructure, Operations, Security, and Business teams to deliver secure and reliable platforms.
- Team Development: Build, lead, and mentor a high-performing SRE team in Europe, fostering a culture of ownership, collaboration, and innovation.
- Incident Response & Postmortems: Lead response efforts for critical incidents, ensuring swift resolution and comprehensive root cause analysis. Drive long-term improvements based on lessons learned from Learning Reviews, and maintain accurate incident documentation and compliance reporting.
- Automation & Efficiency: Lead automation initiatives to streamline workflows and increase uptime. Use Jira to manage tasks and projects, and align global SRE practices for seamless support.
- Capacity Planning: Drive timely capacity planning to prevent last-minute issues. Support budget planning to align infrastructure investments with growth and performance targets. Participate in quarterly capacity reviews and follow up on outcomes.
- Monitoring & Analytics: Oversee the implementation of monitoring and alerting systems to detect and resolve issues proactively—before customer or compliance impacts occur.
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or related field (Master’s preferred)
- 5+ years in a technical SRE or DevOps position
- 2+ years in a leadership or senior engineering capacity
Preferred Skills:
- Strong Python programming skills
- Proficiency in SQL and data analytics tools (e.g., Sigma, Snowflake)
- Experience in AWS, monitoring tools (Datadog, Prometheus, Grafana), and automation frameworks (Terraform, Ansible, Pulumi)
For more information, please apply with a relevant CV.
Lead Site Reliability Engineer employer: Signify Technology
Contact Detail:
Signify Technology Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Lead Site Reliability Engineer role
✨Tip Number 1
Familiarise yourself with the Titanium trading platform and its operational requirements. Understanding the specific technologies and challenges associated with this platform will help you demonstrate your technical leadership during interviews.
✨Tip Number 2
Highlight your experience in On-Prem / Datacenter environments. Be prepared to discuss specific projects where you've successfully maintained high availability and resilience, as this is crucial for the role.
✨Tip Number 3
Showcase your ability to lead and mentor teams. Prepare examples of how you've fostered a culture of collaboration and innovation within your previous teams, as this aligns with our focus on team development.
✨Tip Number 4
Be ready to discuss your experience with automation tools and practices. Highlight any initiatives you've led that improved efficiency and uptime, as this is a key responsibility of the role.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights your experience in On-Prem / Datacenter environments and showcases your technical leadership skills. Use specific examples that demonstrate your operational excellence and collaboration with cross-functional teams.
Craft a Compelling Cover Letter: Write a cover letter that reflects your passion for the role and the company. Mention your experience with automation initiatives, incident response, and mentoring teams. Be sure to connect your skills to the key responsibilities outlined in the job description.
Highlight Relevant Skills: In your application, emphasise your proficiency in Python, SQL, and any relevant monitoring tools or automation frameworks. This will help you stand out as a candidate who meets the preferred skills for the position.
Showcase Continuous Improvement: Include examples of how you've driven continuous improvement in previous roles. Discuss any initiatives you've led that enhanced system availability or performance, as this aligns with the operational excellence aspect of the job.
How to prepare for a job interview at Signify Technology
✨Showcase Your Technical Expertise
Be prepared to discuss your experience in On-Prem and Datacenter environments. Highlight specific projects where you demonstrated technical leadership, especially in relation to the Titanium trading platform or similar systems.
✨Demonstrate Operational Excellence
Prepare examples of how you've championed initiatives that improved system availability and performance. Discuss your approach to capacity planning, change management, and disaster recovery to show your commitment to operational excellence.
✨Emphasise Cross-Functional Collaboration
Illustrate your experience working with diverse teams such as Software Engineering, Infrastructure, and Operations. Share stories that highlight your ability to foster collaboration and deliver secure, reliable platforms.
✨Prepare for Incident Response Scenarios
Expect questions about your experience leading incident response efforts. Be ready to discuss how you've handled critical incidents, conducted root cause analyses, and implemented improvements based on lessons learned.