The Boston Consulting Group GmbH

Principal Site Reliability Engineering Expert Director

London | Full-Time | £100,000 - £150,000 / year (est.) | Home office (partial)

At a Glance

Tasks: Lead the design of reliable, automated systems across diverse tech domains.
Company: Join Boston Consulting Group, a pioneer in business strategy and transformation.
Benefits: Enjoy a hybrid work model, competitive salary, and opportunities for professional growth.
Other info: Be part of a diverse team driving innovation and operational excellence.
Why this job: Make a real impact by shaping automation and reliability in a collaborative environment.
Qualifications: 10+ years in Site Reliability Engineering with strong technical skills.

The predicted salary is between £100,000 and £150,000 per year.

Who We Are

Boston Consulting Group partners with leaders in business and society to tackle their most important challenges and capture their greatest opportunities. BCG was the pioneer in business strategy when it was founded in 1963. Today, we help clients with total transformation: inspiring complex change, enabling organizations to grow, building competitive advantage, and driving bottom-line impact. To succeed, organizations must blend digital and human capabilities. Our diverse, global teams bring deep industry and functional expertise and a range of perspectives to spark change. BCG delivers solutions through leading-edge management consulting along with technology and design, corporate and digital ventures, and business purpose. We work in a uniquely collaborative model across the firm and throughout all levels of the client organization, generating results that allow our clients to thrive.

What You'll Do

The Principal Site Reliability Engineer (SRE) is a senior technical leader responsible for shaping how reliability, automation, and operational excellence are engineered across the organisation. Operating across domains including traditional infrastructure, cloud engineering, network operations, identity, observability, security, AI-driven operations, and automated data workflows, the role focuses on designing scalable systems, reusable engineering patterns, and standardised controls that reduce operational toil, improve resilience, and embed reliability, governance, and compliance directly into delivery pipelines and operational platforms. This role will drive organisational change towards automation-first, measurable, and repeatable practices.

A key part of the role is building and evolving reusable CI/CD and Terraform modules, engineering guardrails, observability patterns, and automation frameworks that can be adopted across multiple teams and domains without requiring each team to solve the same problems independently. The Principal SRE also plays an important enablement role beyond deeply technical teams, helping less technical areas of the business adopt structured, governed, and scalable ways of working. This includes translating complex engineering practices into practical standards, improving how governance is implemented through engineering controls rather than manual oversight, and driving operational maturity across a broad and diverse technology landscape.

The ideal candidate is a systems thinker who understands how services, networks, identity, data flows, and operational processes fail in real-world conditions, and can apply that understanding to build automation-first, reliability-focused operating models that scale across both technical and non-technical functions.

Key Responsibilities

Cross-Domain Reliability Engineering: Design and evolve reliability patterns across cloud, network, identity, and security domains. Identify systemic risks and failure modes across platforms and services, and define engineering solutions to mitigate them. Ensure operational activities are embedded into delivery models through automation, CI/CD integration, and event-driven workflows.
Automation & Toil Reduction at Scale: Lead the design of automation frameworks that eliminate manual operational tasks across multiple domains. Translate incident learnings and operational inefficiencies into scalable automation and preventative controls. Drive adoption of automation-first principles, reducing dependency on human-driven processes. Contribute to AI-driven operational use cases, including event correlation, anomaly detection, noise reduction, operational insights, and automated remediation. Ensure AIOps capabilities are grounded in reliable telemetry, clear control boundaries, and measurable operational outcomes.
Observability & 24/7 Operational Excellence: Define standards for telemetry, monitoring, alerting, and operational visibility across all critical systems. Ensure services are observable, measurable, and support proactive detection of issues. Improve operational readiness, incident response effectiveness, and time-to-recovery through engineering solutions.
CI/CD & Platform Integration: Contribute to the design of CI/CD patterns that embed reliability, security, and operational controls into pipelines. Ensure infrastructure, network, identity, and security configurations are managed through code and validated automatically. Support integration of platform services into delivery pipelines to enable consistent, repeatable deployments.
Security & Identity Integration: Contribute to secure‑by‑design patterns, including least privilege, identity‑based access, and short‑lived credentials. Support integration of security controls (e.g. secrets management, authentication, policy enforcement) into engineering workflows. Ensure security and compliance requirements are met through engineering controls rather than manual processes.
Network & Infrastructure Reliability: Support the design of resilient network architectures and segmentation aligned with Zero Trust principles. Ensure network configurations and controls are automated, validated, and observable. Contribute to infrastructure design patterns that improve availability, scalability, and fault tolerance. Design and improve operational patterns for network reliability, segmentation, visibility, and change validation. Support automation and standardisation of network controls and operational procedures to reduce manual intervention and configuration drift.
Technical Leadership & Enablement: Provide technical leadership across teams, influencing standards, architecture, and engineering practices. Mentor engineers on reliability engineering, automation, and systems thinking. Drive consistency through reusable patterns, frameworks, and documentation.
Strategic Influence & Continuous Improvement: Contribute to reliability engineering strategy and roadmap across the organisation. Communicate technical concepts, risks, and recommendations to senior stakeholders and leadership. Lead initiatives that improve reliability maturity, engineering efficiency, and operational scalability. Support less technical teams and functions in adopting structured, automated, and measurable operational practices. Act as a bridge between engineering capability and organisational change, helping scale good practice beyond core platform teams.
Automated Data Workflows: Design and improve automated data workflows that support operational reporting, observability, governance, and decision‑making. Ensure operational data pipelines are reliable, timely, and aligned to engineering and business needs.
Reusable Engineering Frameworks: Build and evolve reusable modules, patterns, and frameworks for CI/CD, Terraform, and operational automation. Embed governance, validation, and reliability controls into these shared engineering assets by default.
Governance by Engineering: Translate governance requirements into practical engineering controls, automated checks, and repeatable standards. Help teams adopt compliant and supportable operating models without relying on manual policing or process‑heavy interventions.

What You'll Bring

Required Qualifications: 10+ years of experience in Site Reliability Engineering, Platform Engineering, or related fields. Strong hands‑on experience across multiple domains, including:

Cloud platforms (AWS, Azure)
CI/CD and Infrastructure‑as‑Code (e.g. Terraform)
Observability tools (e.g. Datadog, Splunk)
Automation and scripting (e.g. Python)

Experience designing and implementing scalable automation and reliability solutions. Deep understanding of distributed systems, failure modes, and resilience patterns. Experience integrating operational and security controls into engineering workflows. Strong stakeholder engagement and technical communication skills.

Preferred Qualifications: Experience with identity and access management systems (e.g. Entra ID, Vault). Experience with network architecture and security controls (e.g. firewalls, segmentation). Familiarity with Zero Trust principles and security engineering practices. Experience working in large, federated organisations with diverse technology stacks. Exposure to compliance and regulatory requirements (e.g. PCI, HIPAA, SOX).

Additional Info

Hybrid or on‑site work model. Operates as a senior individual contributor with broad cross‑organisational influence. Expected to balance hands‑on technical leadership with strategic direction. Occasional travel may be required for team or stakeholder engagement.

Principal Site Reliability Engineering Expert Director in London employer: The Boston Consulting Group GmbH

At Boston Consulting Group, we pride ourselves on fostering a dynamic and inclusive work environment that empowers our employees to thrive. As a Principal Site Reliability Engineering Expert Director, you will benefit from our commitment to professional growth through mentorship, innovative projects, and a collaborative culture that values diverse perspectives. Located in a vibrant city, BCG offers a hybrid work model, competitive benefits, and the opportunity to make a significant impact on both our clients and the broader community.

Contact Detail:

The Boston Consulting Group GmbH Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Principal Site Reliability Engineering Expert Director role in London

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with BCG employees on LinkedIn. A personal touch can make all the difference when it comes to landing that interview.

✨Tip Number 2

Show off your skills! Prepare a portfolio or case studies that highlight your experience in Site Reliability Engineering. When you get the chance to chat with recruiters or hiring managers, let your work speak for itself.

✨Tip Number 3

Practice makes perfect! Get ready for those technical interviews by brushing up on your knowledge of cloud platforms, CI/CD, and automation tools. Mock interviews with friends or mentors can help you feel more confident.

✨Tip Number 4

Apply through our website! It’s the best way to ensure your application gets seen. Plus, it shows you're genuinely interested in joining the BCG team. Don’t miss out on this opportunity!

We think you need these skills to land the Principal Site Reliability Engineering Expert Director role in London

Site Reliability Engineering

Platform Engineering

Cloud Platforms (AWS, Azure)

CI/CD and Infrastructure-as-Code (Terraform)

Observability Tools (Datadog, Splunk)

Automation and Scripting (Python)

Distributed Systems

Failure Modes and Resilience Patterns

Operational and Security Controls Integration

Stakeholder Engagement

Technical Communication

Identity and Access Management Systems (Entra ID, Vault)

Network Architecture and Security Controls

Zero Trust Principles

Compliance and Regulatory Requirements (PCI, HIPAA, SOX)

Some tips for your application 🫡

Tailor Your Application: Make sure to customise your CV and cover letter to highlight your experience in Site Reliability Engineering. Use keywords from the job description to show that you understand what we're looking for.

Showcase Your Technical Skills: Don’t hold back on detailing your hands-on experience with cloud platforms, CI/CD, and automation tools. We want to see how you've applied these skills in real-world scenarios, so give us some juicy examples!

Communicate Clearly: When writing your application, keep it clear and concise. We appreciate straightforward communication, especially when it comes to complex technical concepts. Make it easy for us to see your thought process.

Apply Through Our Website: We encourage you to submit your application through our website. It’s the best way to ensure it gets into the right hands and helps us track your application efficiently. Plus, it’s super easy!

How to prepare for a job interview at The Boston Consulting Group GmbH

✨Know Your Stuff

Make sure you have a solid grasp of Site Reliability Engineering principles, especially around automation and operational excellence. Brush up on your knowledge of cloud platforms like AWS or Azure, CI/CD practices, and observability tools. Being able to discuss these topics confidently will show that you're the right fit for the role.

✨Showcase Your Experience

Prepare specific examples from your past work that demonstrate your ability to design scalable systems and implement automation frameworks. Highlight any projects where you've reduced operational toil or improved resilience. This will help interviewers see how your experience aligns with their needs.

✨Communicate Clearly

Since this role involves influencing various teams, practice explaining complex technical concepts in simple terms. Be ready to discuss how you've helped less technical areas adopt structured and scalable ways of working. Clear communication can set you apart from other candidates.

✨Ask Insightful Questions

Prepare thoughtful questions about the company's approach to reliability engineering and how they envision the role evolving. This shows your genuine interest in the position and helps you gauge if the company culture aligns with your values and work style.
