Site Reliability Engineer / Production Support in London

London Full-Time No working from home possible
Monument Bank
Description
Site Reliability Engineer / Production Support
Location: London (Oxford Circus) | Hybrid: 2 days per week | Reports to: Head of Cloud Operations

ABOUT MONUMENT

We're building something genuinely rare: a financial brand designed for the mass affluent, the professionals, entrepreneurs and ambitious savers that traditional banks have systematically underserved for decades.
We exist to make managing wealth simpler, smarter and more human, treating every client's wealth with the same care as if it were our own.
We hold over £7 billion in client savings, serve more than 100,000 clients, and were named the UK's fastest-growing fintech in 2025. The momentum is real.
THE OPPORTUNITY

Monument’s production environment is the heartbeat of a licensed bank, and the SRE role is the single point of ownership when incidents occur. You will directly oversee the offshore Production Support team, run on-call and incident response, and ensure fast detection, triage and restoration of services.
This is not a passive monitoring role. You are expected to understand, at a working level, all of Monument’s key system flows, including the services, partners and teams in play, and to actively debug incidents, escalate effectively and drive permanent fixes. You will also be a builder, using AI tools for automated alert correlation, root cause analysis and runbook generation.
For the right person, this is a rare opportunity to own production reliability at a pre-IPO challenger bank, operating at the intersection of deep engineering and real commercial consequence.

WHAT YOU'LL DO

  • Directly oversee the offshore Production Support team and be the single point person when incidents occur, escalating to the Head of Cloud Operations only when required.
  • Run on-call and incident response; ensure fast detection, triage, and restoration.
  • Maintain observability standards (logs, metrics, traces) and alert quality (low noise, high signal).
  • Understand at a working level all key system flows, the services, partners, and teams in play, and how to actively debug an incident.
  • Lead reliability engineering: resilience patterns, performance tuning, capacity planning.
  • Facilitate post-incident reviews and track actions to completion.
  • Use AI tools for automated alert correlation, root cause analysis, and runbook generation.
  • Hunt for routine, repetitive tasks, plan how to automate them, and then carry out those plans.

THE MINDSET
You’ll thrive here if you live by the same principles that define all Monument builders:
  • Ownership under pressure - when things go wrong, you are calm, decisive, and effective. You own the incident until it's resolved.
  • Builder - you don't just respond to incidents; you build the automation that prevents them or resolves them faster next time.
  • Deeply curious - you understand the full system landscape and how services interact. You can debug across layers.
  • Automation-first - every manual task is a candidate for automation. You actively hunt for toil and eliminate it.
  • Quality-driven - you care about alert quality, observability standards, and reliability patterns that prevent problems at source.

WHAT YOU BRING
  • Strong SRE or production support experience with accountability for incident response in a production environment.
  • Deep understanding of observability tools, alerting, logging, and distributed systems debugging.
  • Experience managing and working with offshore support teams.
  • Hands-on experience with reliability engineering: resilience patterns, performance tuning, capacity planning.
  • Active use of AI tools for incident triage, automation, and runbook generation.
  • Ability to understand complex system flows across multiple services and third-party integrations.
  • Experience in financial services or similarly regulated environments is a strong advantage.

WHAT'S IN IT FOR YOU
  • Be the person who keeps Monument running - your work directly protects clients and the business.
  • Build automation that genuinely matters: every runbook you automate and every alert you tune makes the system more resilient.
  • Work with modern AI tools as a core part of your daily workflow, not as a novelty.
  • Own production reliability at a critical stage of Monument’s growth, with real responsibility and real impact.
OUR VALUES
At Monument, our values shape how we make decisions, how we treat each other when things get hard, and how we show up for clients who expect more than standard banking.
We set ambitious goals and hold ourselves to them, not because it looks good, but because our clients' outcomes depend on it. When something isn't working, we say so early, learn from it, and move. We don't wait for perfect conditions, and we don't protect egos over progress.
We work as a genuine team, which means real collaboration, honest conversations when we disagree, and shared accountability when things go wrong. We know better decisions come from different perspectives, so we actively value the range of experiences and backgrounds our people bring.
We're always asking whether there's a smarter way to do what we do, not for the sake of change, but because standing still isn't an option in the market we're in.
If that sounds like how you like to work, we'd like to hear from you.

Monument Bank

Contact Details:

Monument Bank Recruitment Team