At a Glance
- Tasks: Solve complex reliability challenges and ensure our SaaS products are always available.
- Company: Join Tribal, a leading EdTech company transforming education with innovative software solutions.
- Benefits: Enjoy a fully remote role with competitive salary and opportunities for professional growth.
- Why this job: Make a real impact in the education sector while working with cutting-edge cloud technologies.
- Qualifications: Experience with AWS or Azure, strong Linux knowledge, and automation skills are essential.
- Other info: Be part of a diverse team committed to inclusivity and innovation.
The predicted salary is between £30,000 and £50,000 per year.
Are you an engineer who thrives on solving complex reliability challenges across cloud platforms? We’re looking for a Site Reliability Engineer who can combine strong technical capability with a pragmatic approach to automation, monitoring, and service delivery. You’ll help keep Tribal’s education-driven SaaS products highly available, scalable, and performant.
At Tribal, we are a leading EdTech business providing market-leading software solutions to the global education market. We research, develop, and deliver the products, services, and solutions that education institutions worldwide rely on to support their core mission: educating students, delivering exceptional learning experiences, and achieving successful outcomes.
Our Platform Engineering function is at the heart of this, ensuring our systems are designed and maintained to the highest standards of reliability and security. As part of the SRE & Operations team, you’ll play a key role in delivering Tribal’s products through the public cloud as SaaS services across AWS and Azure.
The Role:
As a Site Reliability Engineer, you’ll design, build, and operate large-scale systems with an emphasis on reliability, efficiency, and automation. You’ll work across deployment, monitoring, and incident response to ensure our platforms stay healthy and our customers experience uninterrupted service.
You’ll be involved in:
- Maintaining and improving production systems for availability, latency, and scalability
- Supporting application deployment and configuration to production environments
- Building or enhancing automation tools (Ansible, scripts, utilities)
- Implementing and managing observability tools such as Datadog or New Relic
- Analysing logs and metrics to identify trends and improve reliability
- Supporting incident response and performing root-cause analysis
- Collaborating closely with engineering and customer teams to deliver proactive, preventative support
- Participating in on-call and out-of-hours rotations in line with Tribal’s On-Call Policy
This is a full-time, fully remote UK-based role, with occasional national travel for team collaboration or customer engagements.
What you’ll bring:
- Strong experience with AWS (or Azure) environments
- Solid knowledge of Linux, Apache, and PHP in a production context
- Familiarity with automation/configuration tools such as Ansible
- Experience with monitoring and logging platforms (e.g. Datadog, New Relic, Azure Monitor)
- Good understanding of database fundamentals (SQL Server / Oracle)
- Hands-on troubleshooting and problem-solving skills
- Customer-facing experience, including use of incident or service management tools (Remedyforce, ServiceNow)
- Strong written and verbal communication skills, able to translate technical details clearly
Nice-to-have:
- Experience coding or scripting (Python, PowerShell, or Bash)
- Understanding of CI/CD pipelines (Azure DevOps or similar)
- ITIL Foundation or cloud certifications (AWS SysOps Administrator, AWS Solutions Architect)
Note to applicants: We welcome applications from individuals who already have the right to work in the UK. As an equal opportunity employer, Tribal celebrates diversity and is committed to creating an inclusive environment for all employees. We make sure that our recruitment and selection processes never discriminate based upon any protected characteristics and actively welcome applications from all groups, not least those underrepresented in the tech sector.
Note to all applicants: Tribal reserves the right to close an advertisement to applications ahead of the advertised closure date, so shortlisting may sometimes take place before the closing date. With this in mind, please do not hesitate to apply early.
Site Reliability Engineer in City of London, employer: Tribal Group
Contact Details:
Tribal Group Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Site Reliability Engineer role in City of London
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, especially those at Tribal. A friendly chat can open doors and give you insights that job descriptions just can't.
✨Tip Number 2
Show off your skills! If you've got a project or a GitHub repo that highlights your SRE expertise, share it during interviews. It’s a great way to demonstrate your hands-on experience.
✨Tip Number 3
Prepare for technical interviews by brushing up on your troubleshooting skills. Be ready to tackle real-world scenarios that might come up in the role, especially around AWS or Azure.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are proactive!
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights your experience with AWS or Azure, Linux, and automation tools. We want to see how your skills align with the role of a Site Reliability Engineer, so don’t hold back on showcasing relevant projects!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you’re passionate about reliability challenges and how your background makes you a great fit for our team. Keep it concise but impactful!
Showcase Your Problem-Solving Skills: In your application, mention specific instances where you've tackled complex issues in production systems. We love seeing real-world examples of your troubleshooting and analytical skills!
Apply Through Our Website: We encourage you to apply directly through our website for the best chance of getting noticed. It’s the quickest way for us to receive your application and start the conversation!
How to prepare for a job interview at Tribal Group
✨Know Your Tech Inside Out
Make sure you brush up on your knowledge of AWS or Azure, as well as Linux, Apache, and PHP. Be ready to discuss how you've used these technologies in real-world scenarios, especially in terms of reliability and automation.
✨Showcase Your Problem-Solving Skills
Prepare to share specific examples of how you've tackled complex reliability challenges. Think about incidents you've managed, the tools you used for monitoring, and how you approached root-cause analysis.
✨Familiarise Yourself with Automation Tools
Since automation is key for this role, be ready to talk about your experience with tools like Ansible or any scripting languages you know. Highlight any projects where you've built or enhanced automation tools.
✨Communicate Clearly and Confidently
Strong communication skills are essential, especially when translating technical details to non-technical stakeholders. Practice explaining your past projects and experiences in a way that's easy to understand, focusing on the impact of your work.