Site Reliability Engineer in London

London | Full-Time | £55,000 - £65,000 / year (est.) | No home office possible
CGI

At a Glance

  • Tasks: Support and enhance critical data platforms, ensuring reliability and operational excellence.
  • Company: Join CGI, a leading IT and business consulting firm with a collaborative culture.
  • Benefits: Competitive salary, health benefits, and opportunities for professional growth.
  • Other info: Dynamic environment with a focus on teamwork, ownership, and continuous improvement.
  • Why this job: Make a real impact on data-driven platforms while working with cutting-edge technologies.
  • Qualifications: Experience in Site Reliability Engineering and strong skills in Kubernetes and the ELK stack.

The predicted salary is between £55,000 and £65,000 per year.

We are seeking an experienced and proactive Site Reliability Engineer (SRE) to join a team supporting multiple data product and platform groups. This role is focused on improving the reliability, scalability, observability, and operational performance of critical data-driven platforms and services across complex production environments. The successful candidate will work closely with engineering, platform, and support teams to strengthen monitoring and alerting capabilities, improve logging and traceability, troubleshoot production incidents, support deployments, and automate operational processes wherever possible. The environment includes Kubernetes, Helm, the ELK stack, and a strong focus on modern Site Reliability Engineering practices across cloud and platform services.

This is a hands-on technical role suited to someone who thrives in fast-paced operational environments and is passionate about reliability engineering, automation, and continuous improvement. The role requires strong collaboration with both client stakeholders and engineering teams to ensure platform stability, operational excellence, and high service availability.

Your future duties and responsibilities:

  • Support, maintain, and improve highly available production platforms and services across cloud and containerised environments.
  • Manage and support Kubernetes clusters and Helm-based deployments across multiple environments.
  • Implement and enhance monitoring, alerting, logging, and observability solutions to improve platform reliability and operational visibility.
  • Investigate incidents, analyse logs, identify root causes, and drive timely resolution of production issues.
  • Participate in incident response, post-incident reviews, and continuous operational improvement initiatives.
  • Automate operational tasks and repetitive support activities to reduce manual effort and improve platform efficiency.
  • Work closely with engineering and data platform teams to improve system resilience, scalability, deployment reliability, and operational maturity.
  • Develop and maintain operational documentation, support procedures, runbooks, and troubleshooting guides.
  • Contribute to reliability engineering practices including proactive monitoring, service health management, and operational readiness.
  • Support deployment activities, release processes, and production change management activities.

Required qualifications to be successful in this role:

  • Strong commercial experience in Site Reliability Engineering, Platform Engineering, DevOps, or Production Support environments.
  • Strong hands-on experience with Kubernetes and Helm in enterprise or production environments.
  • Proven experience supporting mission-critical production platforms and operational support functions.
  • Strong hands-on experience with the ELK stack (Elasticsearch, Logstash, Kibana) for logging, monitoring, troubleshooting, and operational analysis.
  • Demonstrated capability in log analysis, incident investigation, troubleshooting, and root cause analysis.
  • Strong understanding of, and practical experience with, core SRE practices, including monitoring and alerting, incident management and response, root cause analysis and post-incident reviews, automation and operational improvement, and production support and reliability engineering.
  • Experience working with data platforms, analytics platforms, or data product teams would be highly advantageous.
  • Experience with scripting and automation tools such as Bash, Python, or similar technologies is desirable.
  • Exposure to CI/CD pipelines, Infrastructure as Code, and cloud-native environments would be beneficial.
  • Strong communication, stakeholder engagement, and collaboration skills.
  • Ability to work effectively in fast-paced support environments and manage competing priorities under pressure.

Security Clearance: The candidate must be willing and able to work onsite at the client location five days per week. The candidate must already hold current HLC clearance (mandatory requirement). Previous experience working within secure, government, defence, or highly regulated environments will be highly regarded. Due to client security requirements, only candidates meeting the required clearance criteria will be considered.

What you can expect from us:

Together, as owners, let’s turn meaningful insights into action. Life at CGI is rooted in ownership, teamwork, respect and belonging. Here, you’ll reach your full potential because you are invited to be an owner from day 1 as we work together to bring our Dream to life. That’s why we call ourselves CGI Partners rather than employees. We benefit from our collective success and actively shape our company’s strategy and direction.

Your work creates value. You’ll develop innovative solutions and build relationships with teammates and clients while accessing global capabilities to scale your ideas, embrace new opportunities, and benefit from expansive industry and technology expertise.

You’ll shape your career by joining a company built to grow and last. You’ll be supported by leaders who care about your health and well-being and provide you with opportunities to deepen your skills and broaden your horizons. Come join our team—one of the largest IT and business consulting services firms in the world.

Site Reliability Engineer in London employer: CGI

At CGI, we pride ourselves on fostering a culture of ownership, collaboration, and continuous improvement, making us an exceptional employer for Site Reliability Engineers. Our commitment to employee growth is evident through our supportive leadership and access to global resources, allowing you to develop innovative solutions while working in a dynamic environment. Working in a secure setting, you'll thrive alongside passionate professionals dedicated to operational excellence and high service availability.

Contact Details:

CGI Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer role in London

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with potential colleagues on LinkedIn. You never know who might have the inside scoop on job openings or can put in a good word for you.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those involving Kubernetes, Helm, or the ELK stack. This gives employers a tangible look at what you can do and sets you apart from the crowd.

✨Tip Number 3

Prepare for interviews by brushing up on common SRE scenarios. Think about how you’d handle incidents, improve monitoring, or automate processes. Practising these responses will help you feel more confident and ready to impress.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team and contributing to our mission.

We think you need these skills to ace the Site Reliability Engineer role in London

Site Reliability Engineering
Kubernetes
Helm
ELK stack
Monitoring and Alerting
Incident Management
Root Cause Analysis
Automation
Operational Improvement
Scripting (Bash, Python)
CI/CD Pipelines
Infrastructure as Code
Cloud-native Environments
Communication Skills
Collaboration Skills

Some tips for your application 🫡

Tailor Your Application: Make sure to customise your CV and cover letter to highlight your experience with Kubernetes, Helm, and the ELK stack. We want to see how your skills align with our needs, so don’t hold back on showcasing your relevant projects!

Showcase Your Problem-Solving Skills: In your application, share specific examples of how you've tackled production incidents or improved operational processes. We love seeing candidates who can demonstrate their hands-on experience and proactive approach to reliability engineering.

Be Clear and Concise: When writing your application, keep it straightforward and to the point. Use bullet points for key achievements and make sure your passion for Site Reliability Engineering shines through. We appreciate clarity and enthusiasm!

Apply Through Our Website: We encourage you to submit your application directly through our website. It’s the best way for us to receive your details and ensures you’re considered for the role. Plus, it shows you’re keen to join our team!

How to prepare for a job interview at CGI

✨Know Your Tech Stack

Make sure you’re well-versed in Kubernetes, Helm, and the ELK stack. Brush up on your hands-on experience with these tools, as they’ll likely come up during technical questions. Be ready to discuss specific scenarios where you've used them to improve platform reliability.

✨Showcase Your Problem-Solving Skills

Prepare to share examples of how you've tackled production incidents in the past. Highlight your approach to root cause analysis and how you’ve implemented solutions to prevent future issues. This will demonstrate your proactive mindset and ability to thrive in fast-paced environments.

✨Emphasise Collaboration

Since this role involves working closely with engineering and support teams, be prepared to discuss your experience in collaborative settings. Share examples of how you’ve engaged with stakeholders to enhance operational performance and ensure service availability.

✨Automate and Innovate

Talk about your experience with automation tools and scripting languages like Bash or Python. Discuss any projects where you’ve automated operational tasks to improve efficiency. This shows your commitment to continuous improvement and aligns with the role's focus on operational excellence.
