At a Glance
- Tasks: Lead incident management and ensure system reliability in a hands-on SRE role.
- Company: Join a forward-thinking tech company that values true SRE principles.
- Benefits: Competitive salary, fully remote work, and opportunities for professional growth.
- Why this job: Make a real impact on system reliability and lead a small, dynamic team.
- Qualifications: Expertise in AWS, Linux troubleshooting, and experience leading technical teams.
- Other info: Work in a supportive environment focused on SRE excellence.
The predicted salary is £120,000 per year.
UK Remote Permanent | Up to £120,000 | Fully Remote (UK Only)
This is NOT a DevOps role. Real SRE work only.
We are looking for a true Senior Site Reliability Engineer with deep incident management experience, strong operational ownership, and expert Linux/AWS troubleshooting skills. This role is focused entirely on reliability, availability, incident response, and systems engineering, not building CI/CD pipelines or acting as DevOps by another name.
Leadership Requirement: Small Team Technical Lead
You must have experience leading a small engineering team (2-5 people), defining technical direction, improving on-call processes, and owning reliability strategy. This is a hands-on role with real SRE leadership, not people management.
About the Role
As a Senior SRE, you will own the reliability, resilience, and operational health of large-scale AWS/Linux systems. You will join an engineering organisation where SRE principles are fully embedded, respected, and treated as a distinct discipline.
Key Responsibilities
- Lead major incidents, mitigation, RCA, and preventative improvements
- Own and refine SLIs, SLOs, and error budgets
- Reduce operational toil through automation
- Deep-dive Linux debugging, performance tuning, and systems analysis
- Strengthen observability, monitoring, and alerting
- Provide technical leadership to a small SRE/engineering group
- Improve and manage on-call processes (PagerDuty, OpsGenie, etc.)
- Collaborate with development teams to build reliability into system design
Required Skills and Experience
- Strong AWS experience (EC2, networking, autoscaling, IAM, load balancing)
- Deep Linux troubleshooting skills (performance, networking, debugging)
- Real 24/7 production on-call experience
- Hands-on incident management and postmortems
- Experience mentoring or leading a small technical team
- Scripting/automation with Python, Go, or Bash
- Strong observability skills (Datadog, Prometheus, Grafana, CloudWatch)
You will be solving actual SRE problems: reliability, incidents, resilience, uptime. You will guide a small team through complex engineering challenges.
Site Reliability Engineering (SRE) Manager in Doncaster | Employer: Halian Technology Limited
Contact Detail:
Halian Technology Limited Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Site Reliability Engineering (SRE) Manager role in Doncaster
✨Tip Number 1
Network, network, network! Reach out to your connections in the SRE community. Attend meetups or webinars, and don’t be shy about asking for introductions. The more people you know, the better your chances of landing that dream role.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your incident management experience and any automation projects you've worked on. This gives potential employers a tangible look at what you can bring to the table.
✨Tip Number 3
Prepare for technical interviews by brushing up on your Linux and AWS troubleshooting skills. Practice common incident scenarios and how you would handle them. Being able to demonstrate your hands-on experience will set you apart from the competition.
✨Tip Number 4
Don’t forget to apply through our website! We’re always on the lookout for talented individuals who are passionate about SRE. Your next big opportunity could be just a click away!
Some tips for your application 🫡
Show Your SRE Skills: Make sure to highlight your deep incident management experience and strong operational ownership in your application. We want to see how you've tackled real SRE challenges, so don’t hold back on those examples!
Be Clear About Leadership Experience: Since this role involves leading a small engineering team, it’s crucial to detail your experience in defining technical direction and improving on-call processes. We’re looking for someone who can take charge, so let us know how you’ve done that before!
Tailor Your Application: Don’t just send a generic CV! Tailor your application to reflect the specific skills and experiences mentioned in the job description. We appreciate when candidates take the time to connect their background with what we’re looking for.
Apply Through Our Website: We encourage you to apply through our website for a smoother process. It helps us keep track of applications better and ensures you get all the updates directly from us. Plus, it shows you’re keen on joining the StudySmarter team!
How to prepare for a job interview at Halian Technology Limited
✨Know Your SRE Fundamentals
Make sure you brush up on your SRE principles. Understand key concepts such as SLIs, SLOs, and error budgets, and be ready to discuss how you've applied them in real-world scenarios. This will show both your depth of knowledge and your practical experience.
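As a quick refresher on the arithmetic behind error budgets, here is a minimal Python sketch. It is not taken from the job description: the function names, the 99.9% target, and the 30-day window are illustrative assumptions only.

```python
def downtime_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO allows over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent.

    A negative result means the budget has been overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The budget is the number of failures the SLO tolerates in the window.
    allowed_failures = total_requests * (1.0 - slo_target)
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical figures: a 99.9% SLO, 1,000,000 requests, 250 failures.
    print(f"{downtime_budget_minutes(0.999):.1f} minutes of downtime allowed")
    print(f"{budget_remaining(0.999, 1_000_000, 250):.0%} of the error budget left")
```

Under these assumptions, a 99.9% target over 30 days allows roughly 43 minutes of downtime, and 250 failures out of 1,000,000 requests would use about a quarter of the budget. Being able to walk through numbers like these is a simple way to show you understand how SLOs translate into operational decisions.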
✨Showcase Incident Management Skills
Prepare to share specific examples of major incidents you've led. Talk about your role in mitigation, root cause analysis, and any preventative measures you implemented. This will demonstrate your hands-on experience and ability to handle high-pressure situations.
✨Demonstrate Technical Leadership
Since this role involves leading a small team, be ready to discuss your leadership style. Share experiences where you've defined technical direction or improved on-call processes. Highlight how you’ve mentored others and fostered a culture of reliability within your team.
✨Get Familiar with Tools and Technologies
Make sure you're well-versed in the tools and technologies mentioned in the job description, such as AWS, Linux, and observability platforms (Datadog, Prometheus, Grafana, CloudWatch). Be prepared to discuss how you've used them in past roles to improve system reliability and performance.