At a Glance
- Tasks: Lead incident management and ensure system reliability in a hands-on SRE role.
- Company: Join a forward-thinking tech company that values SRE principles.
- Benefits: Competitive salary up to £120,000, fully remote work, and career growth opportunities.
- Other info: Work in a dynamic environment focused on operational excellence and innovation.
- Why this job: Make a real impact on system reliability and lead a small engineering team.
- Qualifications: Strong AWS and Linux skills, with experience in incident management and team leadership.
Senior Site Reliability Engineer (SRE) | UK Remote | Permanent | Up to £120,000 | Fully Remote (UK Only)
This is NOT a DevOps Role. Real SRE Work Only.
We are looking for a true Senior Site Reliability Engineer with deep incident management experience, strong operational ownership, and expert Linux/AWS troubleshooting skills.
This role is focused entirely on reliability, availability, incident response, and systems engineering, not building CI/CD pipelines or acting as DevOps by another name.
Leadership Requirement: Small Team Technical Lead
You must have experience leading a small engineering team (2-5 people), defining technical direction, improving on-call processes, and owning reliability strategy. This is a hands-on role with real SRE leadership, not people management.
About the Role
As a Senior SRE, you will own the reliability, resilience, and operational health of large-scale AWS/Linux systems. You will join an engineering organisation where SRE principles are fully embedded, respected, and treated as a distinct discipline.
Key Responsibilities
- Lead major incident response, mitigation, root cause analysis (RCA), and preventative improvements
- Own and refine SLIs, SLOs, and error budgets
- Reduce operational toil through automation
- Deep-dive Linux debugging, performance tuning, and systems analysis
- Strengthen observability, monitoring, and alerting
- Provide technical leadership to a small SRE/engineering group
- Improve and manage on-call processes (PagerDuty, OpsGenie, etc.)
- Collaborate with development teams to build reliability into system design
Requirements
- Strong AWS experience (EC2, networking, autoscaling, IAM, load balancing)
- Deep Linux troubleshooting skills (performance, networking, debugging)
- Real 24/7 production on-call experience
- Hands-on incident management and postmortems
- Experience mentoring or leading a small technical team
- Scripting/automation with Python, Go, or Bash
- Strong observability skills (Datadog, Prometheus, Grafana, CloudWatch)
You will be solving actual SRE problems: reliability, incidents, resilience, uptime. You will guide a small team through complex engineering challenges.
Site Reliability Engineering Manager - Data in Cardiff | Employer: Halian Technology Limited
Join a forward-thinking company that prioritises the principles of Site Reliability Engineering, offering a fully remote work environment across the UK. With a strong focus on employee growth and technical leadership, you will have the opportunity to lead a small team while tackling real SRE challenges in a supportive culture that values innovation and operational excellence. Enjoy competitive compensation and the flexibility to balance your professional and personal life, making this an ideal workplace for those seeking meaningful and rewarding employment.
StudySmarter Expert Advice🤫
We think this is how you could land Site Reliability Engineering Manager - Data in Cardiff
✨Tip Number 1
Network like a pro! Reach out to your connections in the SRE community, attend meetups, and engage in online forums. You never know who might have the inside scoop on job openings or can refer you directly.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your incident management experience and Linux troubleshooting projects. This gives potential employers a taste of what you can bring to the table.
✨Tip Number 3
Prepare for those interviews! Brush up on your technical knowledge, especially around AWS and Linux systems. Practice common SRE scenarios and be ready to discuss how you've handled incidents in the past.
✨Tip Number 4
Apply through our website! We love seeing candidates who are genuinely interested in joining us. Tailor your application to highlight your leadership experience and operational ownership in SRE roles.
Some tips for your application 🫡
Show Your SRE Skills: Make sure to highlight your deep incident management experience and Linux/AWS troubleshooting skills in your application. We want to see how you've tackled real SRE challenges, so don’t hold back!
Be Clear About Leadership Experience: Since this role involves leading a small engineering team, it’s crucial to detail your experience in defining technical direction and improving on-call processes. We’re looking for someone who can take charge, so let us know how you’ve done that before!
Focus on Reliability and Resilience: Your application should reflect your commitment to reliability and operational health. Share examples of how you've owned SLIs, SLOs, and error budgets in past roles. We love seeing candidates who are passionate about these principles!
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for this exciting opportunity. Don’t miss out!
How to prepare for a job interview at Halian Technology Limited
✨Know Your SRE Fundamentals
Make sure you brush up on your Site Reliability Engineering principles. Understand the key concepts of SLIs, SLOs, and error budgets, as these will likely come up in conversation. Be ready to discuss how you've applied these in past roles.
✨Showcase Your Incident Management Skills
Prepare specific examples of major incidents you've managed. Talk about your role in leading the incident response, the steps you took for mitigation, and how you conducted postmortems. This will demonstrate your hands-on experience and leadership capabilities.
✨Demonstrate Technical Leadership
Since this role involves leading a small team, be prepared to discuss your experience in mentoring or guiding engineers. Share how you've defined technical direction and improved on-call processes in previous positions to show you're ready for this responsibility.
✨Get Familiar with Tools and Technologies
Make sure you know your way around AWS and Linux troubleshooting tools. Be ready to discuss your experience with observability tools like Datadog or Grafana, and how you've used them to enhance system reliability. This will show that you're not just familiar with the tech but can also leverage it effectively.