Site Reliability Engineering Manager in Cardiff

Cardiff | Full-Time | £70,000 - £90,000 / year (est.) | No working from home possible
LexisNexis Risk Solutions

At a Glance

  • Tasks: Lead a team to enhance platform reliability and incident response for critical services.
  • Company: Join LexisNexis Risk Solutions, a leader in risk assessment solutions.
  • Benefits: Competitive salary, career development, and a supportive work culture.
  • Other info: Dynamic role with opportunities for growth and innovation.
  • Why this job: Make a real impact on service reliability while fostering a collaborative environment.
  • Qualifications: Experience in leading engineers and strong knowledge of SRE practices required.

The predicted salary is between £70,000 and £90,000 per year.

Are you excited to lead a team that improves platform reliability, resilience, and incident response for critical services? Do you enjoy building a supportive on-call culture while driving automation and secure‑by‑default operations?

About the Business

LexisNexis Risk Solutions is the essential partner in the assessment of risk. Within our Business Services vertical, we offer a multitude of solutions focused on helping businesses of all sizes drive higher revenue growth, maximize operational efficiencies, and improve customer experience. Our solutions help our customers solve difficult problems in the areas of Anti‑Money Laundering/Counter Terrorist Financing, Identity Authentication & Verification, Fraud and Credit Risk mitigation and Customer Data Management.

About our Team

As a Site Reliability Engineering Lead, you’ll lead and partner with cross‑functional teams to keep our platforms reliable, resilient, secure, and continuously improving. If you’re passionate about operational excellence and helping others succeed, we’d love to hear from you.

About the Role

In this role, you will lead a team of Site Reliability Engineers focused on improving the reliability, resilience, and operational readiness of the platforms your team supports. You’ll partner closely with engineering, product, and security teams to reduce operational risk, strengthen incident response, and drive meaningful automation that improves service health and customer outcomes.

Responsibilities

  • Lead and develop a team of SREs: set direction, manage conflicting priorities and trade‑offs, remove blockers, support on‑call wellbeing, and keep work focused on the highest reliability risks and opportunities.
  • People management: hire and onboard talent, provide regular coaching and feedback, support career development, and contribute to performance and progression processes.
  • Own service reliability for the platforms your team supports: define and evolve operational metrics, and uphold standards for observability, monitoring, alerting, and operational readiness.
  • Work closely with Security and Engineering to embed secure‑by‑default operations (e.g., patching, access controls, secrets management) and support audit and compliance needs.
  • Participate in the on‑call rota (including escalation/incident leadership as needed) and continuously improve runbooks, alerts, and operational readiness.
  • Act as a senior escalation point during incidents, providing calm, structured coordination to restore service quickly and safely, and ensuring clear stakeholder communications.
  • Lead blameless post‑incident reviews and Root Cause Analyses (RCAs), ensuring actions are prioritised, tracked, and shared across teams.
  • Partner with product and engineering teams to design for resilience, capacity, and recovery: systems that fail gracefully, recover quickly, and meet customer reliability expectations.
  • Drive automation and reduce toil by improving platform tooling, CI/CD, standards, and self‑service capabilities.

Requirements

  • Experience leading, mentoring, or managing engineers.
  • Strong grasp of SRE/platform engineering practices, including infrastructure‑as‑code, observability, incident management, on‑call operations, and post‑incident reviews.
  • Confidence working with cloud platforms, and a pragmatic approach to automation and reducing operational toil.
  • Clear, structured communication with both engineers and stakeholders, especially when handling operational risk or coordinating incidents.
  • A collaborative, learning‑focused approach that builds psychological safety and values curiosity over blame.

Employer: LexisNexis Risk Solutions

At LexisNexis Risk Solutions, we pride ourselves on fostering a dynamic and inclusive work environment where innovation thrives. As a Site Reliability Engineering Manager, you will not only lead a talented team dedicated to enhancing platform reliability but also benefit from our commitment to employee growth through continuous learning opportunities and a supportive culture that prioritises wellbeing. Located in a vibrant area, our company offers unique advantages such as flexible working arrangements and a collaborative atmosphere that empowers you to make a meaningful impact in the risk assessment industry.

Contact Details:

LexisNexis Risk Solutions Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land the Site Reliability Engineering Manager role in Cardiff

Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with current employees at LexisNexis Risk Solutions. A friendly chat can open doors that a CV just can't.

Tip Number 2

Prepare for those interviews by brushing up on your SRE knowledge. Be ready to discuss your experience with incident management and automation. Show us how you’ve made platforms more reliable in the past!

Tip Number 3

Don’t forget to showcase your leadership skills! Talk about how you've mentored teams or improved on-call cultures. We want to see how you can inspire others to achieve operational excellence.

Tip Number 4

Apply through our website! It’s the best way to ensure your application gets seen. Plus, it shows us you’re genuinely interested in being part of our team at LexisNexis Risk Solutions.

We think you need these skills to ace the Site Reliability Engineering Manager role in Cardiff

Team Leadership
Site Reliability Engineering (SRE)
Incident Management
Operational Readiness
Automation
Infrastructure-as-Code
Observability

Some tips for your application 🫡

Tailor Your CV: Make sure your CV reflects the skills and experiences that align with the Site Reliability Engineering Manager role. Highlight your leadership experience, technical skills, and any relevant projects that showcase your ability to improve platform reliability and resilience.

Craft a Compelling Cover Letter: Use your cover letter to tell us why you're passionate about operational excellence and how you can contribute to our team. Share specific examples of how you've led teams or improved processes in previous roles, and don’t forget to mention your approach to building a supportive on-call culture.

Showcase Your Communication Skills: Since clear communication is key in this role, make sure your application materials reflect your ability to communicate effectively with both technical and non-technical stakeholders. Use structured language and be concise, especially when discussing complex topics like incident management or automation.

Apply Through Our Website: We encourage you to apply directly through our website for the best chance of getting noticed. It’s the easiest way for us to track your application and ensure it reaches the right people. Plus, you’ll get to see more about our company culture and values!

How to prepare for a job interview at LexisNexis Risk Solutions

Know Your SRE Fundamentals

Make sure you brush up on your Site Reliability Engineering principles. Understand key concepts like Infrastructure-as-Code, observability, and incident management. Being able to discuss these topics confidently will show that you're not just familiar with the role but also passionate about it.

Showcase Your Leadership Skills

As a Site Reliability Engineering Manager, you'll be leading a team. Prepare examples of how you've successfully managed teams in the past, focusing on conflict resolution, mentoring, and fostering a supportive culture. Highlighting your people management skills will set you apart.

Prepare for Incident Scenarios

Expect questions around incident response and post-incident reviews. Think of specific incidents you've handled, what actions you took, and how you communicated with stakeholders. This will demonstrate your calmness under pressure and structured approach to problem-solving.

Emphasise Collaboration and Communication

The role requires working closely with cross-functional teams. Be ready to discuss how you've collaborated with engineering, product, and security teams in the past. Clear communication is key, so share examples that highlight your ability to convey complex information effectively.