Site Reliability Engineer in London

London | Full-Time | 70000 - 90000 € / year (est.) | No home office possible

At a Glance

  • Tasks: Automate and optimise processes using AI for the Market Risk Platform.
  • Company: Join a leading tech firm in London focused on innovation.
  • Benefits: Competitive salary, flexible working, and opportunities for professional growth.
  • Other info: Dynamic team environment with a focus on cutting-edge technology.
  • Why this job: Make a real impact by eliminating operational toil and enhancing efficiency.
  • Qualifications: Senior SRE experience, strong Python skills, and a knack for process optimisation.

The predicted salary is between 70000 - 90000 € per year.

We need an experienced SRE to focus predominantly on automation, optimization, and process re-engineering using AI for the Market Risk Platform. Success is measured by capacity created (toil eliminated, fewer manual steps, faster recovery, safer/faster changes), not by acting as the primary BAU support resource.

Primary Objectives:

  • Eliminate operational toil and recurring manual work through durable automation
  • Re-engineer support/change processes to reduce handoffs, approval friction and rerun complexity
  • Industrialize reliability operations so existing SREs spend less time firefighting and more time engineering

Key Responsibilities (Automation & Process first):

  • Automation Engineering (Core)
    • Build production-grade automation in Python (tools, services, workflows) to remove repetitive work: environment checks, dependency validation, automated reruns/reprocessing, safe restarts, drift detection, remediation actions, and standardized operational tasks
    • Create self-service capabilities for common requests (guardrailed, auditable, repeatable)
    • Implement “automation with safety”: idempotency, dry-run modes, approval gates where needed, rollback/undo strategies, and clear audit trails
  • Process Re-engineering (Core)
    • Map current operational processes (incident/problem/change, release readiness, rerun/recovery, access/entitlements, environment onboarding) and redesign them to remove waste and reduce cycle time.
    • Standardize runbooks/playbooks into executable workflows; reduce tribal knowledge via templates, checklists, and automated pre-flight controls
    • Define and track operational KPIs (toil hours removed, alert volume reduction, MTTR improvements, change failure rate reduction, rerun time reduction).
  • Agentic AI
    • Design and implement agentic workflows that take action using tools/runbooks (e.g., diagnostics, evidence gathering, correlation, guided remediation, change-risk checks, automated rerun orchestration)
    • Put strong controls in place: scoped permissions, deterministic fallbacks, human-in-the-loop approvals for risky actions, evaluation harnesses and measurable outcomes.
    • Productionize with monitoring, logging and post-incident learnings feeding back into the agent/tooling
  • Observability (enablement for automation)
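
The “automation with safety” properties listed under Automation Engineering (idempotency, dry-run modes, approval gates, audit trails) can be illustrated with a minimal sketch. Everything here is hypothetical: the `SafeAction` class, the `feed-loader` service name and the in-memory `state` dictionary are illustrative stand-ins, not part of the Market Risk Platform or any real library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class SafeAction:
    """Hypothetical wrapper adding safety controls around one operational action."""
    name: str
    is_done: Callable[[], bool]   # idempotency check: is the desired state already reached?
    apply: Callable[[], None]     # the change itself
    needs_approval: bool = False
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def _audit(self, outcome: str, actor: str) -> str:
        # Record every decision, including skips and blocks, for the audit trail.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": self.name,
            "actor": actor,
            "outcome": outcome,
        })
        return outcome

    def run(self, actor: str, dry_run: bool = True, approved: bool = False) -> str:
        if self.is_done():
            return self._audit("skipped: already in desired state", actor)
        if dry_run:
            return self._audit("dry-run: would apply", actor)
        if self.needs_approval and not approved:
            return self._audit("blocked: approval required", actor)
        self.apply()
        return self._audit("applied", actor)


# Illustrative use with an in-memory stand-in for a service.
state = {"feed-loader": "stopped"}
restart = SafeAction(
    name="start feed-loader",
    is_done=lambda: state["feed-loader"] == "running",
    apply=lambda: state.update({"feed-loader": "running"}),
    needs_approval=True,
)

print(restart.run(actor="sre-bot"))                                # dry-run is the default
print(restart.run(actor="sre-bot", dry_run=False))                 # blocked without approval
print(restart.run(actor="sre-bot", dry_run=False, approved=True))  # change is applied
print(restart.run(actor="sre-bot", dry_run=False, approved=True))  # second run is a no-op
```

Defaulting `dry_run` to `True` is a deliberate choice in this sketch: a caller has to opt in to making a change, which is one common way to keep self-service tooling guardrailed.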

Required skills & Experience:

  • Senior SRE experience with distributed systems and batch/intraday workloads in a production environment
  • Strong Python skills
  • Provable agentic AI experience showing:
    • Tool integration, guardrails, evaluation approach
    • Measurable impact (toil reduction, MTTR reduction, alert reduction, etc.)
  • Demonstrated process optimization ability (removing steps/handoffs, standardizing workflows, implementing lightweight controls with metrics)
  • Strong Linux and troubleshooting fundamentals across application/system/network layers
  • Experience working across mixed estates (on-prem VMs + cloud, with some Kubernetes exposure for operational monitoring/reruns)

Differentiators:

  • Exposure to Banking/Finance Market Risk Domains
  • Experience with or knowledge of the Athena ecosystem or similar (SecDB, Quartz)

Site Reliability Engineer in London employer: Cubestech Ltd

Join a forward-thinking company in London as a Site Reliability Engineer, where innovation meets collaboration. We prioritise employee growth through continuous learning opportunities and a supportive work culture that values automation and process optimisation. Enjoy the unique advantage of working in the dynamic banking and finance sector, contributing to impactful projects while benefiting from a flexible environment that fosters creativity and efficiency.

Contact Details:

Cubestech Ltd Recruiting Team

StudySmarter Expert Advice🤫

We think this is how you could land the Site Reliability Engineer role in London

Tip Number 1

Network like a pro! Attend meetups, conferences, or online webinars related to Site Reliability Engineering. Engaging with industry professionals can open doors and give you insider info on job openings that might not be advertised.

Tip Number 2

Show off your skills! Create a portfolio showcasing your automation projects in Python or any agentic AI work you've done. This gives potential employers a tangible look at what you can bring to the table.

Tip Number 3

Prepare for interviews by practising common SRE scenarios. Think about how you would tackle operational toil or process re-engineering. You want to demonstrate your problem-solving skills and how you can optimise processes effectively.

Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are proactive about their job search!

We think you need these skills to ace the Site Reliability Engineer role in London

Automation Engineering
Python
Agentic AI
Process Re-engineering
Operational Toil Reduction
KPI Tracking
Observability

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the SRE role. Highlight your experience with automation, Python, and any relevant AI projects. We want to see how your skills align with our needs!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about the SRE role and how you can help us eliminate operational toil. Keep it concise but impactful.

Showcase Your Achievements: When detailing your experience, focus on measurable outcomes. Did you reduce MTTR or improve efficiency? We love numbers that demonstrate your impact, so don’t hold back!

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy!
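
If you want to back an MTTR claim with a number, a rough calculation along these lines may help. This is only a sketch: the incident timestamps are made-up sample data, and `mttr_minutes` is a hypothetical helper that assumes you have a detection time and a recovery time for each incident.

```python
from datetime import datetime
from statistics import mean


def mttr_minutes(incidents):
    """Mean time to recovery, in minutes, for (detected, recovered) ISO timestamp pairs."""
    durations = [
        (datetime.fromisoformat(recovered) - datetime.fromisoformat(detected)).total_seconds() / 60
        for detected, recovered in incidents
    ]
    return mean(durations)


# Made-up sample data: incidents before and after an automation was introduced.
before = [("2024-01-03T09:00", "2024-01-03T10:30"), ("2024-01-10T14:00", "2024-01-10T14:50")]
after = [("2024-03-05T09:00", "2024-03-05T09:40"), ("2024-03-12T16:00", "2024-03-12T16:30")]

mttr_before = mttr_minutes(before)
mttr_after = mttr_minutes(after)
reduction_pct = 100 * (mttr_before - mttr_after) / mttr_before

print(f"MTTR before: {mttr_before:.0f} min, after: {mttr_after:.0f} min, "
      f"reduction: {reduction_pct:.0f}%")
```

Stating the sample size and time window alongside the percentage makes a figure like this more credible on a CV.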

How to prepare for a job interview at Cubestech Ltd

Know Your Automation Inside Out

Make sure you can discuss your experience with automation in detail. Be ready to share specific examples of how you've built production-grade automation in Python, and how it has eliminated operational toil. Highlight any measurable impacts you've achieved, like reduced MTTR or alert volume.

Showcase Your Process Re-engineering Skills

Prepare to talk about your approach to process optimisation. Think of instances where you've mapped out current operations and redesigned them for efficiency. Bring along examples of standardised workflows or runbooks you've created that have made a real difference.

Demonstrate Your Agentic AI Experience

Be ready to explain your experience with agentic AI and how you've implemented workflows that take action autonomously. Discuss the tools you've integrated and the controls you've put in place, such as human-in-the-loop approvals and measurable outcomes.
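
To make the discussion of controls concrete, it can help to have a small example in mind. The sketch below shows one possible shape for scoped permissions and a human-in-the-loop gate around agent tool calls. All names (`TOOLS`, `dispatch`, `gather_logs`, `rerun_batch`, the `eod-var` job) are hypothetical, and no particular agent framework is assumed.

```python
from typing import Callable, Dict, Optional, Set


# Hypothetical tools an agent might call; real ones would talk to actual systems.
def gather_logs(target: str) -> str:
    return f"collected logs for {target}"


def rerun_batch(target: str) -> str:
    return f"rerun triggered for {target}"


# Each tool is registered with a risk flag that decides whether a human must approve it.
TOOLS: Dict[str, dict] = {
    "gather_logs": {"fn": gather_logs, "risky": False},
    "rerun_batch": {"fn": rerun_batch, "risky": True},
}


def dispatch(
    tool: str,
    target: str,
    allowed: Set[str],
    approver: Optional[Callable[[str, str], bool]] = None,
) -> str:
    """Run a tool requested by the agent, enforcing scope and approval controls."""
    # Scoped permissions: unknown tools and tools outside the allow-list are refused.
    if tool not in TOOLS or tool not in allowed:
        return f"denied: {tool} is outside the agent's scope"
    spec = TOOLS[tool]
    # Human-in-the-loop: risky tools only run when an approver explicitly says yes.
    if spec["risky"] and (approver is None or not approver(tool, target)):
        return f"pending: {tool} on {target} needs human approval"
    return spec["fn"](target)


scope = {"gather_logs", "rerun_batch"}
print(dispatch("gather_logs", "eod-var", scope))                    # low risk, runs directly
print(dispatch("rerun_batch", "eod-var", scope))                    # held for approval
print(dispatch("rerun_batch", "eod-var", scope, lambda t, j: True)) # approved, runs
print(dispatch("delete_data", "eod-var", scope))                    # not registered, denied
```

The point to draw out in an interview is that the controls sit outside the model: the agent can request any action, but the dispatcher decides what actually runs.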

Familiarise Yourself with the Banking/Finance Sector

If you have experience in the banking or finance market risk domains, make sure to highlight it. If not, do some research on the sector and be prepared to discuss how your skills can translate into this environment, especially regarding distributed systems and mixed estates.