Site Reliability Engineer in City of London

City of London · Full-Time · £30,000 - £50,000 / year (est.) · No home office possible

At a Glance

  • Tasks: Solve complex reliability challenges and ensure our SaaS products are always available.
  • Company: Join Tribal, a leading EdTech company transforming education with innovative software solutions.
  • Benefits: Enjoy a fully remote role with a competitive salary and opportunities for professional growth.
  • Why this job: Make a real impact in the education sector while working with cutting-edge cloud technologies.
  • Qualifications: Experience with AWS or Azure, strong Linux knowledge, and automation skills are essential.
  • Other info: Be part of a diverse team committed to inclusivity and innovation.

The predicted salary is between £30,000 and £50,000 per year.

Are you an engineer who thrives on solving complex reliability challenges across cloud platforms? We’re looking for a Site Reliability Engineer who can combine strong technical capability with a pragmatic approach to automation, monitoring, and service delivery. You’ll help keep Tribal’s education-driven SaaS products highly available, scalable, and performant.

At Tribal, we are a leading EdTech business providing market-leading software solutions to the global education market. We research, develop, and deliver the products, services, and solutions that education institutions worldwide rely on to support their core mission: educating students, delivering exceptional learning experiences, and achieving successful outcomes.

Our Platform Engineering function is at the heart of this, ensuring our systems are designed and maintained to the highest standards of reliability and security. As part of the SRE & Operations team, you’ll play a key role in delivering Tribal’s products through the public cloud as SaaS services across AWS and Azure.

The Role:

As a Site Reliability Engineer, you’ll design, build, and operate large-scale systems with an emphasis on reliability, efficiency, and automation. You’ll work across deployment, monitoring, and incident response to ensure our platforms stay healthy and our customers experience uninterrupted service.

You’ll be involved in:

  • Maintaining and improving production systems for availability, latency, and scalability
  • Supporting application deployment and configuration to production environments
  • Building or enhancing automation tools (Ansible, scripts, utilities)
  • Implementing and managing observability tools such as Datadog or New Relic
  • Analyzing logs and metrics to identify trends and improve reliability
  • Supporting incident response and performing root-cause analysis
  • Collaborating closely with engineering and customer teams to deliver proactive, preventative support
  • Participating in on-call and out-of-hours rotations in line with Tribal’s On-Call Policy

This is a full-time, fully remote UK-based role, with occasional national travel for team collaboration or customer engagements.

What you’ll bring:

  • Strong experience with AWS (or Azure) environments
  • Solid knowledge of Linux, Apache, and PHP in a production context
  • Familiarity with automation/configuration tools such as Ansible
  • Experience with monitoring and logging platforms (e.g. Datadog, New Relic, Azure Monitor)
  • Good understanding of database fundamentals (SQL Server / Oracle)
  • Hands-on troubleshooting and problem-solving skills
  • Customer-facing experience with incident or service management tools (Remedyforce, ServiceNow)
  • Strong written and verbal communication skills, able to translate technical details clearly

Nice-to-have:

  • Experience coding or scripting (Python, PowerShell, or Bash)
  • Understanding of CI/CD pipelines (Azure DevOps or similar)
  • ITIL Foundation or cloud certifications (AWS SysOps Administrator, AWS Solutions Architect)

Note to applicants: We welcome applications from individuals who already have the right to work in the UK. As an equal opportunity employer, Tribal celebrates diversity and is committed to creating an inclusive environment for all employees. We make sure that our recruitment and selection processes never discriminate based upon any protected characteristics and actively welcome applications from all groups, not least those underrepresented in the tech sector.

Note to all applicants: Tribal reserve the right to close an advertisement to applications ahead of the advertised closure date. For this reason, shortlisting may take place prior to the closing date on some occasions. With this in mind, please do not hesitate to apply early.

Site Reliability Engineer in City of London employer: Tribal Group

At Tribal, we pride ourselves on being an exceptional employer that fosters a collaborative and inclusive work culture, where innovation thrives. As this is a fully remote UK-based role, we offer our Site Reliability Engineers the flexibility to work from anywhere in the UK while providing opportunities for professional growth through continuous learning and development. Join us in making a meaningful impact in the EdTech sector, where your contributions will help shape the future of education for institutions worldwide.

Contact Detail:

Tribal Group Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer role in City of London

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, especially those at Tribal. A friendly chat can open doors and give you insights that job descriptions just can't.

✨Tip Number 2

Show off your skills! If you've got a project or a GitHub repo that highlights your SRE expertise, share it during interviews. It’s a great way to demonstrate your hands-on experience.

✨Tip Number 3

Prepare for technical interviews by brushing up on your troubleshooting skills. Be ready to tackle real-world scenarios that might come up in the role, especially around AWS or Azure.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are proactive!

We think you need these skills to ace the Site Reliability Engineer role in City of London

AWS
Azure
Linux
Apache
PHP
Ansible
Datadog
New Relic
SQL Server
Oracle
Troubleshooting
Problem-Solving Skills
Incident Management
Communication Skills
CI/CD Pipelines

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience with AWS or Azure, Linux, and automation tools. We want to see how your skills align with the role of a Site Reliability Engineer, so don’t hold back on showcasing relevant projects!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you’re passionate about reliability challenges and how your background makes you a great fit for our team. Keep it concise but impactful!

Showcase Your Problem-Solving Skills: In your application, mention specific instances where you've tackled complex issues in production systems. We love seeing real-world examples of your troubleshooting and analytical skills!

Apply Through Our Website: We encourage you to apply directly through our website for the best chance of getting noticed. It’s the quickest way for us to receive your application and start the conversation!

How to prepare for a job interview at Tribal Group

✨Know Your Tech Inside Out

Make sure you brush up on your knowledge of AWS or Azure, as well as Linux, Apache, and PHP. Be ready to discuss how you've used these technologies in real-world scenarios, especially in terms of reliability and automation.

✨Showcase Your Problem-Solving Skills

Prepare to share specific examples of how you've tackled complex reliability challenges. Think about incidents you've managed, the tools you used for monitoring, and how you approached root-cause analysis.

✨Familiarise Yourself with Automation Tools

Since automation is key for this role, be ready to talk about your experience with tools like Ansible or any scripting languages you know. Highlight any projects where you've built or enhanced automation tools.

✨Communicate Clearly and Confidently

Strong communication skills are essential, especially when translating technical details to non-technical stakeholders. Practice explaining your past projects and experiences in a way that's easy to understand, focusing on the impact of your work.
