Site Reliability Engineering Lead

Full-Time No home office possible

Are you passionate about building resilient systems and empowering teams to deliver reliable cloud solutions?

Do you thrive in designing and managing scalable platforms that keep services running smoothly?

About our team

The LexisNexis Intellectual Property (IP) division (lexisnexisip.com) provides international patent content and a suite of online and analytic tools that meet the evolving needs of the intellectual property market. We deliver data to support LexisNexis IP search and analytics applications, empowering our customers with actionable insights and metrics for critical business decisions.

Our corporate culture thrives on excellence, innovation, and a strong dedication to our customers, employees, and communities. Working here means joining a vibrant, diverse, and collaborative team where you are free to grow and contribute actively.

About the role

We are seeking a highly skilled and motivated SRE and Platform/Cloud Engineering Lead to head a team responsible for the reliability, scalability, and resilience of mission‑critical systems for our IP business. This role is pivotal in managing a small team of senior engineers, driving operational excellence, and fostering a culture of continuous improvement.

You will work closely with the central SRE organisation and with IP product, development, architecture, and security teams to implement best practices in site reliability engineering, cloud platform management, and environment support for internal development and customer systems. You will lead initiatives around incident response, disaster recovery, automation, monitoring, FinOps cost optimisation, and customer support escalations.

Skills & Experience

  • Cloud Platforms & Services: Azure and AWS (EKS, EC2, S3, RDS, Lambda, Azure VMs, Functions).
  • Infrastructure as Code: Terraform, ARM/BICEP.
  • Monitoring & Observability: Datadog, Splunk, Coralogix, CloudWatch, Azure Monitor, along with an understanding of baseline metrics.
  • Programming Knowledge: Java, .NET/C#, SQL, React (for integration with supported products).
  • Systems & Networking: Linux/UNIX/Windows administration, networking, and security best practices.
  • Specialised Knowledge: Databricks, FinOps cost management, disaster recovery planning.
  • Core Competencies: Incident management, troubleshooting, IT service management frameworks, and GitOps/DevOps practices.

Soft Skills

  • Solid understanding of Site Reliability Engineering (SRE) principles and practices, including hands‑on experience with DevOps and Cloud Engineering.
  • Strong understanding of incident management, monitoring tools, IT service management frameworks and automation processes.
  • Previous experience in customer‑facing roles or managing customer support escalations.
  • Excellent technical problem‑solving and troubleshooting abilities.
  • Strong communication and interpersonal skills, with the ability to collaborate across teams.
  • Leadership skills with a track record of mentoring and guiding technical teams.
  • Strong collaboration and advanced communication skills at peer and senior management level.
  • Strong skills in setting, communicating, implementing, and achieving business objectives and goals through indirect leadership of and collaboration with others.
  • Strong organization, project planning, time management, and change management skills across multiple functional groups and departments, and strong delegation skills involving prioritizing and reprioritizing projects and managing projects of various sizes and complexity.
  • Advanced problem‑solving experience involving leading teams in identifying, researching, and coordinating the resources necessary to effectively troubleshoot/diagnose complex project issues; prior success extracting/translating findings into alternatives/solutions; and identifying risks/impacts and schedule adjustments to facilitate management decision‑making.
  • Ability to manage multiple priorities and work effectively in a fast‑paced environment.
  • Passion for continuous learning and staying up‑to‑date with industry trends and best practices.

Responsibilities

  • Building & Leading the SRE Organisation
  • Hire, mentor, and lead a team of SRE and platform engineers to ensure the timely and accurate performance of all team activities.
  • Foster a culture of reliability, blameless post‑mortems, and proactive incident prevention.
  • Define and implement SRE best practices for reliability, scalability, and performance.
  • Customer & Incident Management
  • Manage intake, prioritization, and resolution of critical customer‑reported issues.
  • Act as an escalation point for high‑severity incidents and outages.
  • In collaboration with Product Support and Development Managers, ensure SLAs, performance benchmarks, and response protocols are met.
  • Live System Monitoring & Support
  • Design and maintain robust monitoring, alerting, and incident response systems.
  • In collaboration with the Product Support Manager, lead incident management from detection to resolution and post‑incident analysis.
  • Ensure system high‑availability goals are met.
  • Oversee disaster recovery and business continuity planning within the IP Technology organisation.
  • Provide support for cloud resources management and workload capacity planning.
  • Drive automation to reduce manual intervention and improve efficiency.
  • Support product development teams with infrastructure, non‑functional requirements, and environment stability.
  • Manage Kubernetes deployments, Databricks environments, and other critical platforms.
  • Collaborate with cross‑functional teams to deliver secure, reliable, and cost‑effective platform and cloud solutions.
  • Ensure all systems comply with security patching and vulnerability management requirements.
  • In collaboration with architects, provide support for FinOps practices to monitor, optimise, and control cloud costs.
  • Provide clear direction, performance evaluations, and career growth for team members.
  • Ensure proper documentation, reporting, and compliance with security and regulatory standards.
  • Promote continuous learning, knowledge sharing, and operational excellence.
  • Write and review documentation for the management, improvement, and support of platforms/assets.
  • Complete complex bug fixes and root‑cause investigations.
  • Work closely with development and platform teams to understand requirements and translate them into high‑quality solutions.
  • Implement infrastructure management and deployment best practices, including code/solution reviews.
  • Operate in various development environments (Agile, Waterfall, etc.) while collaborating with key stakeholders.

Why Join Us?

Join our team and contribute to a culture of innovation, collaboration, and excellence. If you are ready to advance your career and make a significant impact, we encourage you to apply.

Work in a way that works for you

We promote a healthy work/life balance across the organisation and aim to offer an appealing working environment for our people. With numerous wellbeing initiatives, shared parental leave, study assistance and sabbaticals, we will help you meet your immediate responsibilities and your long‑term goals.

  • Working flexible hours – flexing the times when you work in the day to help you fit everything in and work when you are the most productive.

Working for you

We know that your well‑being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:

  • Dutch Share Purchase Plan
  • Annual Profit Share Bonus
  • Home, office or commuting allowance
  • Generous vacation entitlement and option for sabbatical leave
  • Maternity, Paternity, Adoption and Family Care leave
  • Personal Choice budget
  • Variety of online training courses and career roadshows
  • Wellbeing programs and gym facility in the office
  • Internal communities and networks
  • Recruitment introduction reward
  • Work from anywhere
  • Employee Assistance Program (global)
  • Annual Event

About the business

A global leader in information and analytics, we help researchers and healthcare professionals advance science and improve health outcomes for the benefit of society. Building on our publishing heritage, we combine quality information and vast data sets with analytics to support visionary science and research, health education and interactive learning, as well as exceptional healthcare and clinical practice. At Elsevier, your work contributes to the world’s grand challenges and a more sustainable future. We harness innovative technologies to support science and healthcare to partner for a better world.

Seniority level

Mid‑Senior level

Employment type

Full‑time

Job function

Other, General Business, and Information Technology

Industries

Software Development and Information Services

Contact Detail:

LexisNexis Intellectual Property Solutions Recruiting Team