Site Reliability Engineer in London

London | Full-Time | £70,000 - £90,000 / year (est.) | No working from home possible

At a Glance

Tasks: Automate and optimise processes for the Market Risk Platform using Python and AI.
Company: Join a leading financial institution focused on innovation and reliability.
Benefits: Competitive pay, hands-on experience, and opportunities for professional growth.
Other info: Dynamic work environment with a focus on collaboration and continuous improvement.
Why this job: Make a real impact by reducing operational toil and enhancing automation.
Qualifications: 8+ years SRE experience, strong Python skills, and a passion for process optimisation.

NOTE: VISA SPONSORSHIP IS NOT PROVIDED

Location: London, UK (5 Days / Week Onsite)

Type: Contract Inside IR35 / Permanent

Experience: Minimum 8+ Years

Skills:

SRE experience with Python-based applications (not Java)
Exposure to cloud technologies
Familiarity with Athena ecosystem or similar (SecDB, Quartz)
Banking and risk domain exposure

SRE Role description

We need an experienced SRE to focus predominantly on automation, optimization, and process re-engineering using AI for the Market Risk Platform. Success is measured by capacity created (toil eliminated, fewer manual steps, faster recovery, safer/faster changes), not by acting as the primary BAU support resource. Strong Python skills and demonstrable agentic AI delivery experience are required.

Primary Objectives:

Eliminate operational toil and recurring manual work through durable automation
Re-engineer support/change processes to reduce handoffs, approval friction, and rerun complexity
Industrialize reliability operations so existing SREs spend less time firefighting and more time engineering

Key Responsibilities (Automation & Process first):

Automation Engineering (Core)

Build production-grade automation in Python (tools, services, workflows) to remove repetitive work: environment checks, dependency validation, automated reruns/reprocessing, safe restarts, drift detection, remediation actions, and standardized operational tasks
Create self-service capabilities for common requests (guard-railed, auditable, repeatable)
Implement automation with safety built in: idempotency, dry-run modes, approval gates where needed, rollback/undo strategies, and clear audit trails
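
To illustrate the kind of safety controls listed above, here is a minimal Python sketch of a restart action with a dry-run mode, an idempotency check, an approval gate, and an audit trail. The class name `SafeRestart`, the in-memory `state` dictionary, and the service name are hypothetical and stand in for whatever tooling the platform actually uses; rollback is omitted for brevity.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SafeRestart:
    """Hypothetical restart action: dry-run, idempotency, approval gate, audit trail."""

    dry_run: bool = True
    # In-memory stand-in for real service health (assumption for illustration).
    state: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def _audit(self, service, outcome):
        # Every decision is recorded, including skipped and blocked attempts.
        self.audit_log.append(
            {"ts": time.time(), "action": "restart", "service": service, "outcome": outcome}
        )
        return outcome

    def restart(self, service, approved=False):
        # Idempotency: a service that is already healthy is left alone.
        if self.state.get(service) == "healthy":
            return self._audit(service, "skipped")
        # Dry-run: report what would happen without changing anything.
        if self.dry_run:
            return self._audit(service, "dry_run")
        # Approval gate: refuse to act without explicit sign-off.
        if not approved:
            return self._audit(service, "blocked")
        self.state[service] = "healthy"  # placeholder for the real restart call
        return self._audit(service, "done")
```

Defaulting `dry_run` to `True` means a caller has to opt in to making changes, which is one common way to keep a new automation safe while it is being trialled.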

Process Re-engineering (Core)

Map current operational processes (incident/problem/change, release readiness, rerun/recovery, access/entitlements, environment onboarding) and redesign them to remove waste and reduce cycle time
Standardize runbooks/playbooks into executable workflows and reduce tribal knowledge via templates, checklists, and automated pre-flight controls
Define and track operational KPIs (toil hours removed, alert volume reduction, MTTR improvements, change failure rate reduction, rerun time reduction)
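
As a rough illustration of two of the KPIs named above, the sketch below computes MTTR and change failure rate from simple records. The function names and input shapes (start/end timestamp pairs, a list of failure flags) are assumptions made for this example, not a description of the platform's actual data.

```python
from datetime import datetime, timedelta


def mttr(incidents):
    """Mean time to recovery over (start, end) datetime pairs."""
    if not incidents:
        return timedelta(0)
    total = sum((end - start for start, end in incidents), timedelta(0))
    return total / len(incidents)


def change_failure_rate(changes):
    """Fraction of changes that failed; `changes` is a list of booleans (True = failed)."""
    return sum(changes) / len(changes) if changes else 0.0
```

Tracking the same two numbers before and after an automation lands is one straightforward way to show measurable impact.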

Agentic AI

Design and implement agentic workflows that take action using tools/runbooks (e.g., diagnostics, evidence gathering, correlation, guided remediation, change-risk checks, automated rerun orchestration)
Put strong controls in place: scoped permissions, deterministic fallbacks, human-in-the-loop approvals for risky actions, evaluation harnesses and measurable outcomes
Productionize with monitoring, logging, and post-incident learnings feeding back into the agent/tooling
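
The controls described above (scoped permissions, deterministic fallbacks, human-in-the-loop approval for risky actions) could look something like the following Python sketch. Everything here is hypothetical: the `Tool` type, the `execute` function, the tool names, and the `approver` callback are illustrative assumptions, and no particular agent framework is implied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    """A hypothetical action an agent may propose."""

    name: str
    run: Callable[[], str]
    risky: bool = False


def execute(proposal, tools, allowed, approver):
    """Run an agent-proposed tool under scoped permissions and approval."""
    tool = tools.get(proposal)
    # Scoped permissions plus deterministic fallback: anything unknown or
    # outside the allow-list is handed to a human instead of being attempted.
    if tool is None or proposal not in allowed:
        return "escalated_to_human"
    # Human-in-the-loop: risky tools run only if the approver says yes.
    if tool.risky and not approver(tool.name):
        return "rejected_by_approver"
    return tool.run()
```

Keeping the gate outside the agent means the same checks apply regardless of what the model proposes, which makes the behaviour easier to evaluate.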

Observability (enablement for automation)

Required skills & Experience:

Senior SRE experience on distributed systems and batch/intraday workloads in a production environment
Strong Python
Demonstrable agentic AI experience, showing tool integration, guardrails, and an evaluation approach
Measurable impact (toil reduction, MTTR reduction, alert reduction, etc.)
Demonstrated process optimization ability (removing steps/handoffs, standardizing workflows, implementing lightweight controls with metrics)
Strong Linux and troubleshooting fundamentals across application/system/network layers
Experience working across mixed estates (on-prem VMs + cloud, with some Kubernetes exposure for operational monitoring/reruns)

Differentiators:

Exposure to Banking/Finance Market Risk Domains
Familiarity with the Athena ecosystem or similar (SecDB, Quartz)

Site Reliability Engineer in London employer: Neev Limited

As a Site Reliability Engineer in London, you will join a dynamic team dedicated to innovation and excellence in the banking and risk domain. Our company fosters a collaborative work culture that prioritises employee growth through continuous learning and development opportunities, while also offering competitive benefits and a focus on automation and process optimisation. With a commitment to reducing operational toil and enhancing efficiency, we provide a unique environment where your contributions directly impact the success of our Market Risk Platform.

Contact Details:

Neev Limited Recruitment Team

We think you need these skills to ace Site Reliability Engineer in London

Site Reliability Engineering (SRE)

Python

Cloud Technologies

Athena Ecosystem

Automation Engineering

Process Re-engineering

Agentic AI

Production Grade Automation

Monitoring and Logging

Linux

Troubleshooting

Distributed Systems

Batch Workloads

Kubernetes Exposure

Banking and Finance Domain Knowledge
