At a Glance
- Tasks: Manage incident processes and ensure system reliability across a massive infrastructure.
- Company: Join a leading IT Service Management company focused on building a world-class SRE team.
- Benefits: Enjoy flexible working options and the chance to work with cutting-edge technology.
- Why this job: Be part of a mission-critical platform impacting millions while collaborating with top engineers.
- Qualifications: 7–12 years of experience, with strong Linux and AWS skills required.
- Other info: Ideal for hands-on engineers passionate about incident response and scalability.
The predicted salary is between £48,000 and £72,000 per year.
Are you among the top 1% of Site Reliability Engineers in the UK? Our client, an IT Service Management company, is building a world-class SRE team to support a mission-critical Java-based platform used by millions. If you’re a hands-on engineer with a background in Linux systems, deep AWS expertise, and a passion for incident response, reliability, and scale, we want to hear from you.
What You’ll Be Doing:
- Own and evolve our incident management and on-call processes
- Ensure uptime, scalability, and security across a massive infrastructure footprint
- Work with EKS, EC2, load balancers, VPC, CDK, Terraform, and CloudFormation
- Write and maintain YAML, Python scripts, and internal tooling
- Define and track SLAs, SLOs, and SLIs to drive reliability
- Collaborate with platform engineers and developers to support a Java-based product
- Operate in a manual, tool-light environment while helping us scale and automate
What We’re Looking For:
- 7–12 years of experience, with 5+ years in SRE roles
- Strong Linux/system administration foundation
- Proven experience in live incident troubleshooting and root cause analysis
- Deep AWS knowledge: you can speak to how you’ve used services such as EKS, EC2, and load balancers in production
- Experience with monitoring, alerting, capacity planning, and security best practices
- Comfortable working in large-scale environments with thousands of endpoints
- Clear communicator who can document and share knowledge across teams
- Able to work independently and thrive in a globally distributed team
Site Reliability Engineer employer: Halian
Contact Detail:
Halian Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Site Reliability Engineer role
✨Tip Number 1
Make sure to showcase your hands-on experience with AWS services like EKS and EC2. During networking opportunities or interviews, be ready to discuss specific projects where you implemented these technologies, as this will demonstrate your practical knowledge.
✨Tip Number 2
Familiarise yourself with incident management processes and be prepared to share examples of how you've handled live incidents in the past. This will highlight your ability to troubleshoot effectively and your understanding of reliability in large-scale environments.
✨Tip Number 3
Connect with current or former employees of the company on platforms like LinkedIn. Engaging with them can provide insights into the company culture and expectations, which can be invaluable during your application process.
✨Tip Number 4
Stay updated on the latest trends and best practices in Site Reliability Engineering. Being knowledgeable about new tools and methodologies can set you apart from other candidates and show your commitment to continuous learning.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights your experience in Site Reliability Engineering, particularly your hands-on work with Linux systems and AWS. Use specific examples to demonstrate your skills in incident management and troubleshooting.
Craft a Compelling Cover Letter: In your cover letter, express your passion for reliability and incident response. Mention how your background aligns with the company's mission and the specific technologies they use, such as EKS and Terraform.
Showcase Relevant Projects: If you have worked on projects that involved large-scale environments or Java-based platforms, be sure to include these in your application. Detail your role and the impact of your contributions.
Highlight Communication Skills: Since clear communication is essential for this role, provide examples of how you've documented processes or shared knowledge with teams. This will demonstrate your ability to collaborate effectively in a distributed environment.
How to prepare for a job interview at Halian
✨Showcase Your Technical Skills
Be prepared to discuss your hands-on experience with Linux systems and AWS services like EKS and EC2. Highlight specific projects where you've implemented these technologies, focusing on your role in incident management and troubleshooting.
✨Demonstrate Problem-Solving Abilities
Expect scenario-based questions that assess your incident response skills. Prepare examples of past incidents you've managed, detailing your approach to root cause analysis and how you ensured system reliability.
✨Communicate Clearly
Clear communication is key for a Site Reliability Engineer. Practise explaining complex technical concepts in simple terms, as you'll need to collaborate with platform engineers and developers who may not share your technical background.
✨Understand the Company’s Infrastructure
Research the company's infrastructure and the tools it uses. Familiarise yourself with its incident management processes and be ready to suggest improvements based on your experience with similar environments.