Lead Site Reliability Engineer

Full-Time | £80,000 - £100,000 / year (est.) | No working from home possible
London Stock Exchange

At a Glance

  • Tasks: Lead reliability engineering efforts and collaborate with teams to enhance system performance.
  • Company: Join a forward-thinking company focused on innovation and operational excellence.
  • Benefits: Attractive salary, flexible working options, and opportunities for professional growth.
  • Other info: Dynamic role with opportunities to mentor and influence engineering standards.
  • Why this job: Shape the future of reliability in tech and make a significant impact.
  • Qualifications: 10+ years in SRE or related fields with strong technical expertise.

The predicted salary is between £80,000 and £100,000 per year.

This position requires a highly proactive, hard-working expert with strong leadership presence and ownership of platform reliability outcomes. We are looking for a person who is passionate about reliability engineering and who brings a continuous improvement approach to everything they do!

Requirements:

  • Bachelor’s Degree in Computer Science or related field
  • 10+ years of hands-on technical experience in SRE, Platform Engineering, Infrastructure, or related roles
  • Strong experience with AWS, including services such as EKS, ECS, EC2, networking, IAM, and managed services
  • Deep hands-on experience with Kubernetes and containerised platforms
  • Strong background in Linux systems administration
  • Proven experience designing and operating observability platforms, including monitoring, logging, and alerting
  • Hands-on experience with Datadog for metrics, logs, APM, and alerting
  • Strong understanding of SRE principles, including SLOs, error budgets, incident management, and reliability engineering
  • Experience working closely with architecture and engineering teams on system design and delivery
  • Solid understanding of cloud security principles and experience collaborating with security teams
  • Experience with cloud cost optimisation strategies and tooling
  • Hands-on experience integrating AI with observability stacks (Prometheus, Grafana, ELK, OpenTelemetry) for proactive issue detection
  • (Desirable) Experience or working knowledge of Microsoft Azure
  • (Desirable) Experience supporting multi-cloud or hybrid environments
  • (Desirable) Exposure to Infrastructure as Code (e.g., Terraform, CloudFormation)
  • (Desirable) Experience in large-scale, complex, or regulated environments
  • (Desirable) Knowledge of vector databases and RAG architectures for building internal SRE knowledge assistants
  • (Desirable) Knowledge of Generative AI and LLM platforms (e.g., Claude, Amazon Bedrock)
  • Strong technical authority with the ability to influence design and operational decisions
  • Highly collaborative, comfortable working across architecture, engineering, security, and operations teams
  • Calm and methodical under pressure, especially during incidents and critical issues
  • Pragmatic problem-solver who balances reliability, security, cost, and delivery speed
  • Clear communicator, able to explain complex technical concepts to diverse audiences

What the job involves:

  • We are evolving our Site Reliability Engineering capabilities to strengthen reliability, observability, security, and operational excellence across our Markets and Risk Intelligence division.
  • As a Technical Lead SRE, you will be a senior hands-on technical person helping shape the foundations of reliability across both new and existing platforms.
  • You will collaborate with Architecture, Engineering, Security, and Platform teams to ensure reliability is built into systems from day one.
  • While this is not a people-management or shift-based role, you will work closely with global teams and may occasionally be called upon for major incidents or critical issues.
  • Lead the establishment of SRE foundations for new projects: building environments, monitoring, and alerting, and ensuring operational readiness from day one.
  • Collaborate with Architecture and Engineering teams to embed reliability, scalability, security, and observability into system design.
  • Define, implement, and champion observability standards, tooling, and guidelines across metrics, logs, traces, and SLIs/SLOs.
  • Design and evolve monitoring and alerting solutions that improve visibility, reduce toil, and strengthen system health.
  • Continuously drive reliability improvements across our environments through incident reduction, performance tuning, and building resilient patterns.
  • Partner with Security teams to ensure our platforms meet compliance, security, and risk-management expectations.
  • Lead seamless handovers from project delivery into BAU SRE operations by ensuring documentation, readiness, and strong operational practices.
  • Influence architectural and design decisions through data-driven cloud cost optimisation and efficiency initiatives.
  • Be a technical leader and mentor, supporting engineers, shaping engineering standards, and fostering a culture of learning and development.

Lead Site Reliability Engineer employer: London Stock Exchange

As a Lead Site Reliability Engineer, you will join a forward-thinking company that prioritises innovation and excellence in technology. Our collaborative work culture fosters continuous improvement and professional growth, providing you with the opportunity to influence key architectural decisions while working alongside talented teams across various disciplines. Located in a vibrant tech hub, we offer competitive benefits and a commitment to employee development, making us an exceptional employer for those passionate about reliability engineering.

Contact Details:

London Stock Exchange Recruitment Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Lead Site Reliability Engineer role

Tip Number 1

Network like a pro! Get out there and connect with folks in the industry. Attend meetups, webinars, or conferences related to Site Reliability Engineering. You never know who might have the inside scoop on job openings or can refer you directly!

Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those involving AWS, Kubernetes, or observability tools. This gives potential employers a taste of what you can bring to the table.

Tip Number 3

Prepare for interviews by brushing up on SRE principles and incident management. Be ready to discuss how you've tackled reliability challenges in the past. Practice explaining complex concepts in simple terms – it shows you can communicate effectively with diverse teams.

Tip Number 4

Don’t forget to apply through our website! We’re always on the lookout for passionate individuals who want to make a difference in reliability engineering. Your next big opportunity could be just a click away!

We think you need these skills to ace the Lead Site Reliability Engineer role

Site Reliability Engineering (SRE)
Platform Engineering
AWS (EKS, ECS, EC2, IAM)
Kubernetes
Linux Systems Administration
Observability Platforms (Monitoring, Logging, Alerting)
Datadog

Some tips for your application 🫡

Show Your Passion for Reliability: When you're writing your application, let your enthusiasm for reliability engineering shine through! We want to see how you've embraced a continuous improvement mindset in your past roles. Share specific examples that highlight your proactive approach and dedication to platform reliability.

Tailor Your Experience: Make sure to align your experience with the requirements listed in the job description. Highlight your hands-on technical skills in SRE, AWS, and Kubernetes. We love seeing how your background fits into our needs, so don’t hold back on showcasing your relevant achievements!

Communicate Clearly: Remember, we’re looking for clear communicators who can explain complex concepts. Use straightforward language in your application and avoid jargon where possible. This will help us understand your thought process and how you can bridge the gap between technical and non-technical teams.

Apply Through Our Website: We encourage you to apply directly through our website! It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it gives you a chance to explore more about StudySmarter and what we stand for!

How to prepare for a job interview at London Stock Exchange

Know Your Stuff

Make sure you brush up on your technical knowledge, especially around AWS services like EKS, ECS, and EC2. Be ready to discuss your hands-on experience with Kubernetes and Linux systems, as well as any observability platforms you've designed or operated.

Show Your Passion for Reliability

Demonstrate your enthusiasm for reliability engineering during the interview. Share examples of how you've implemented continuous improvement in your previous roles and how you approach problem-solving under pressure.

Collaborate Like a Pro

Since this role involves working closely with various teams, be prepared to talk about your collaborative experiences. Highlight instances where you've partnered with architecture, engineering, or security teams to enhance system reliability and observability.

Communicate Clearly

Practice explaining complex technical concepts in simple terms. You might be asked to describe your approach to incident management or cloud cost optimisation, so make sure you can convey your ideas clearly to diverse audiences.