Infrastructure Reliability Engineering, Senior Manager in London

London | Full-Time | £72,000 - £108,000 / year (est.) | No home office possible
London Metal Exchange

At a Glance

  • Tasks: Lead the Infrastructure Reliability Engineering team to ensure system resilience and operational excellence.
  • Company: Join the London Metal Exchange, a global leader in industrial metals trading.
  • Benefits: Enjoy a competitive salary, flexible working hours, and opportunities for professional growth.
  • Why this job: Make a real impact on critical trading operations while shaping the future of technology.
  • Qualifications: 10+ years in infrastructure or reliability engineering with strong leadership skills.
  • Other info: Be part of a diverse team committed to innovation and excellence.

The predicted salary is between £72,000 and £108,000 per year.

The London Metal Exchange is the world centre for industrial metals trading. Most of the world's non-ferrous futures business is conducted on the LME's three trading platforms, which in 2024 handled trading totalling $18 trillion, 178 million lots and 4 billion tonnes, with market open interest reaching a high of 1.8 million lots. All trades are cleared and settled by LME Clear.

Overall Purpose of Role

This role is accountable for establishing and then maturing a best-of-breed Infrastructure Reliability Engineering (IRE) function, embedding reliability engineering as a core discipline across the technology lifecycle, from design through live operation, in support of trading-critical and regulatory-significant services.

The role provides senior leadership across Infrastructure Reliability Engineering and is accountable for the resilience, availability, and operational readiness of the LME Group technology estate. It leads the design and delivery of complex infrastructure transformation, platform modernisation, and re‐architecture initiatives, ensuring secure, compliant, and highly reliable services that support trading-critical operations and regulatory obligations.

Responsibilities

  • Establish, mature, and continuously evolve the Infrastructure Reliability Engineering function, defining the IRE operating model, engagement patterns, and service boundaries across infrastructure, architecture, operations, security, and application teams.
  • Set, maintain, and enforce consistent reliability engineering standards, patterns, and tooling across the infrastructure estate, balancing resilience, regulatory assurance, and operational efficiency.
  • Act as senior Infrastructure Reliability Engineering SME across major programmes end‐to‐end (discovery, dependency mapping, design, planning, build, cutover, fall‐back), with direct accountability for service stability and risk reduction for trading‐critical platforms.
  • Drive a proactive reliability and failure engineering culture, including structured risk identification, resilience testing, failover validation, and scenario-based exercises for trading-critical and systemically important services.
  • Act as the accountable owner for Infrastructure Operational Readiness, ensuring platforms and services do not transition into live operation without meeting mandated readiness, observability, recoverability, and supportability criteria.
  • Define and embed a consistent reliability measurement framework across infrastructure platforms, including service level indicators, objectives, and leading indicators of operational risk, enabling data-driven prioritisation and informed investment decisions.
  • Build, lead, and develop a high-performing Infrastructure Reliability Engineering team, defining clear role expectations, capability standards, and development pathways.
  • Foster a culture of engineering excellence, shared ownership, and continuous improvement, ensuring operational knowledge and resilience capability are institutionalised and not dependent on individuals.
  • Act as a senior authority on infrastructure resilience and operational risk, influencing strategic decisions, architectural direction, and investment priorities to ensure reliability is designed in, not retrofitted.
  • Own measurable infrastructure reliability outcomes, including availability, resilience, recovery performance, and operational risk reduction, with regular executive-level reporting against agreed targets.
  • Own and enforce reliability governance, including stage gates, design authorities, risk and issue management, CAB/change control, and auditable documentation aligned to ITSM, IBS, and regulatory expectations.
  • Lead platform modernisation and resilience engineering initiatives, including containerisation and cloud‐adjacent platforms (e.g. Kubernetes, OpenShift), working closely with Architecture, InfoSec, and application teams to embed reliability, security, and observability by design.
  • Define and drive the LME Infrastructure Reliability posture, including fault tolerance, redundancy, capacity planning, disaster recovery, and failover strategies across on‐prem and hybrid environments.
  • Lead senior‐level technical discovery and design workshops to shape scope, delivery approach, and resourcing for reliability‐critical initiatives, ensuring alignment with IOE priorities and business outcomes.
  • Establish and assure Operational Readiness Review (ORR) standards: runbooks, monitoring and alerting, SLIs/SLOs, performance and capacity baselines, service transition, and operational handover.
  • Ensure infrastructure platforms meet security and compliance requirements (e.g. CIS, ISO 27001, NIST), covering identity and access management, encryption, auditability, and regulatory evidence.
  • Engage at senior stakeholder level across Technology and the business, providing clear communication on delivery status, operational risk, dependencies, cost forecasts, and resource demand.

Academic and Professional Qualifications Required

  • Bachelor's degree in Computer Science, Engineering, Information Technology, or a closely related discipline.
  • Demonstrable track record of continuous professional development in infrastructure, solutions engineering, or technology transformation.

Required Knowledge and Level of Experience

  • 10+ years of experience leading large-scale Infrastructure or Reliability Engineering functions, with demonstrable accountability for the availability, resilience, and operational performance of mission‐critical systems.
  • Proven experience establishing, scaling, or materially maturing an Infrastructure Reliability, Platform Reliability, or equivalent function within a complex, regulated, or high‐availability environment.
  • Significant experience operating in regulated or high‐assurance environments (e.g. financial services, exchanges, clearing, or equivalent).
  • Experience influencing senior leadership and steering complex transformation initiatives across multiple technology domains.
  • Significant experience leading or assuring large‐scale, enterprise Linux estates (e.g. RHEL‐based), including responsibility for reliability, resilience, and operational risk in regulated or high‐availability environments.

Skill Set and Core Competencies Required for the Role

  • Deep expertise in infrastructure reliability engineering, resilience patterns, and operational risk management.
  • Strong governance, assurance, and regulatory mindset.
  • Excellent stakeholder engagement and senior communication skills.
  • Ability to lead multi‐disciplinary technical teams through complex change.
  • Data‐driven approach to reliability, performance, and continuous improvement.
  • Data‐driven analysis and decision-making.
  • Senior stakeholder influence and technical authority.
  • Team leadership and capability development.

Technical Skills – Infrastructure Reliability Engineering

  • Enterprise Linux / RHEL mastery.
  • Linux reliability, performance, and capacity engineering.
  • Automation, standardised builds, configuration management.
  • Observability, diagnostics, and root‐cause analysis.
  • Linux host reliability for container / OpenShift platforms.
  • Linux security, hardening, and compliance.
  • Linux‐level failure engineering and resilience patterns.
  • Senior Linux technical authority.

Personal Qualities

  • High integrity, ownership, and accountability in all aspects of work.
  • Structured, pragmatic, and calm under pressure. Able to manage competing priorities and deliver in high‐stakes environments.
  • Collaborative and inclusive, building strong cross‐functional relationships and fostering a culture of open communication.
  • Curious and improvement‐oriented, always seeking to challenge the status quo and drive innovation with data‐driven insights.
  • Adaptable and resilient, able to navigate ambiguity and lead teams through complex change.
  • Commitment to diversity, equity, and inclusion, respecting and valuing the unique contributions of all colleagues.
  • Comfortable holding the line on operational risk and readiness in high‐pressure, time‐sensitive delivery environments.

The LME is committed to creating a diverse environment and is proud to be an equal opportunity employer. In recruiting for our teams, we welcome the unique contributions that you can bring in terms of education, ethnicity, race, sex, gender identity, expression and reassignment, nation of origin, age, languages spoken, colour, religion, disability, sexual orientation and beliefs. In doing so, we want every LME employee to feel our commitment to showing respect for all and encouraging open collaboration and communication.

Employer: London Metal Exchange

The London Metal Exchange offers a dynamic and inclusive work environment in the heart of London, where employees are empowered to drive innovation in infrastructure reliability engineering. With a strong commitment to professional development, team collaboration, and a culture that values diversity, LME provides unique opportunities for growth and leadership in a high-stakes, regulated industry. Join us to be part of a forward-thinking organisation that prioritises operational excellence and resilience in critical trading operations.

Contact Details:

London Metal Exchange Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Infrastructure Reliability Engineering, Senior Manager role in London

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can put in a good word for you.

✨Tip Number 2

Prepare for interviews by researching the company and its culture. Understand their values and how your experience aligns with their goals. This will help you stand out and show that you're genuinely interested in the role.

✨Tip Number 3

Practice your responses to common interview questions, but keep it natural. Use the STAR method (Situation, Task, Action, Result) to structure your answers and highlight your achievements effectively.

✨Tip Number 4

Don’t forget to follow up after your interviews! A simple thank-you email can go a long way in keeping you top of mind. Plus, it shows your enthusiasm for the position. And remember, apply through our website for the best chance!

We think you need these skills to succeed as Infrastructure Reliability Engineering, Senior Manager in London

Infrastructure Reliability Engineering
Resilience Engineering
Operational Risk Management
Governance and Assurance
Stakeholder Engagement
Data-Driven Decision Making
Team Leadership
Linux Mastery (RHEL)
Automation and Configuration Management
Observability and Diagnostics
Root-Cause Analysis
Containerisation (OpenShift, Kubernetes)
Security and Compliance Standards (CIS, ISO 27001, NIST)
Capacity Planning
Disaster Recovery Strategies

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the Infrastructure Reliability Engineering role. Highlight your experience in leading large-scale infrastructure projects and any relevant achievements that showcase your expertise in resilience and operational performance.

Craft a Compelling Cover Letter: Your cover letter should tell us why you're the perfect fit for this role. Share specific examples of how you've established or matured reliability engineering functions in the past, and don’t forget to mention your leadership style and how you foster team collaboration.

Showcase Your Technical Skills: We want to see your technical prowess! Be sure to include your experience with enterprise Linux, automation, and any cloud technologies you've worked with. Mention specific tools or frameworks you've used to enhance reliability and performance.

Apply Through Our Website: Don’t forget to apply through our website! It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re keen on joining our team at the London Metal Exchange!

How to prepare for a job interview at London Metal Exchange

✨Know Your Stuff

Make sure you brush up on your knowledge of infrastructure reliability engineering. Understand the key concepts, tools, and methodologies that are relevant to the role. Be ready to discuss your experience with Linux systems, resilience patterns, and operational risk management.

✨Showcase Leadership Skills

As a Senior Manager, you'll need to demonstrate your ability to lead multi-disciplinary teams. Prepare examples of how you've successfully managed teams through complex changes or transformations in the past. Highlight your experience in influencing senior leadership and driving initiatives.

✨Prepare for Scenario Questions

Expect scenario-based questions that assess your problem-solving skills and decision-making under pressure. Think about past experiences where you had to manage operational risks or ensure service stability, and be ready to explain your thought process and actions taken.

✨Engage with Stakeholders

Communication is key in this role. Be prepared to discuss how you've engaged with stakeholders at various levels. Share examples of how you've communicated complex technical information clearly and effectively, ensuring alignment with business outcomes.
