At a Glance
- Tasks: Ensure system reliability and performance while tackling live production issues.
- Company: Join a growing tech team focused on resilient platforms and services.
- Benefits: Smart casual dress code, gym access, and financial wellbeing support.
- Why this job: Work with modern cloud tech and make a real impact on critical systems.
- Qualifications: Experience in SRE or DevOps, strong Linux skills, and knowledge of observability tools.
- Other info: Dynamic environment with excellent career growth opportunities and a supportive team culture.
The predicted salary is between £30,000 and £50,000 per year.
We have a fantastic new job opportunity for a Site Reliability Engineer / Systems Engineer to join a growing technology team focused on delivering reliable, scalable and resilient platforms and services.
As a Site Reliability Engineer / Systems Engineer, you will act as the vital link between operations, end users and backend development teams, ensuring system availability, performance optimisation and effective incident management across live environments. This role offers the chance to work with modern cloud technologies, containerisation, observability tools and automation practices, while influencing long-term reliability improvements across business-critical systems.
DUTIES
- Incident Triage and Ownership: Acting as the first point of technical escalation for live production issues and owning them through to resolution or handover
- System Monitoring and Availability: Maintaining high availability, performance and scalability of production platforms and services
- Observability Implementation: Managing logging, monitoring, alerting and metrics to proactively identify and resolve issues
- Reliability Improvements: Collaborating with development teams to translate operational insights into long-term platform resilience
- Automation and Resilience: Supporting automation, incident response and continuous improvement practices
- New Service Support: Ensuring new products and features are operable, reliable and scalable from day one
- Cross-Team Collaboration: Working with network engineering, operations and support teams to diagnose service issues
- Documentation and Reporting: Creating and maintaining runbooks, escalation guides and incident reports
- Incident Prioritisation: Balancing customer impact with long-term system health and stability
- Security and Compliance: Supporting compliance with security, availability and regulatory frameworks
CANDIDATE REQUIREMENTS
ESSENTIAL
- Previous experience in a Site Reliability Engineer, DevOps Engineer, Systems Engineer or Operations Engineer role
- Experience supporting production services at scale within a DevOps or SRE environment
- Strong working knowledge of ISP-related networking concepts including DNS, DHCP, PPPoE, RADIUS and IPv4/IPv6
- Experience with observability tools such as Prometheus, Grafana, ELK or Splunk
- Hands-on experience with containerisation and orchestration using Docker and Kubernetes
- Cloud platform experience, ideally Google Cloud Platform, including automation and scaling practices
- Strong Linux administration skills with scripting capability in Bash, Python or similar
- Familiarity with CI/CD pipelines, such as GitHub Actions, and with source control tools
- Understanding of security frameworks and operational resilience best practices
DESIRABLE
- Experience within ISP, MSP or telecommunications environments
- Familiarity with enterprise IT architectures including OSS and BSS systems
- Knowledge of information security and data protection frameworks such as ISO 27001, NIST or GDPR
- Experience with infrastructure automation tools such as Terraform or Ansible
BENEFITS
- Smart casual dress code
- Free access to gym facilities
- Access to a financial wellbeing platform (on successful completion of probationary period)
- Access to an employee assistance programme, Virtual GP and Elderly Care support (on successful completion of probationary period)
- Access to cycle to work, childcare, and electric vehicle schemes after six months
- Brand new office with excellent transport links
- Supportive team culture with opportunities for growth and career progression
HOW TO APPLY
To be considered for this job vacancy, please submit your CV to our Recruitment Team, who will review your details. CVs of applicants meeting these requirements will be submitted to our Client for consideration. By submitting your job application to us, you are giving us your express consent to submit your details to our Client for this purpose.
Site Reliability Engineer / SRE / Systems Engineer in London. Employer: AWD online
Contact Detail:
AWD online Recruiting Team
StudySmarter Expert Advice
We think this is how you could land the Site Reliability Engineer / SRE / Systems Engineer role in London
Tip Number 1
Network like a pro! Reach out to folks in the industry, attend meetups, and connect with current employees at companies you're eyeing. A friendly chat can sometimes lead to job opportunities that aren't even advertised!
Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those involving cloud technologies, containerisation, and automation. This gives potential employers a taste of what you can bring to the table.
Tip Number 3
Prepare for interviews by brushing up on incident management scenarios and system reliability challenges. Be ready to discuss how you've tackled similar issues in the past, as this will demonstrate your hands-on experience and problem-solving skills.
Tip Number 4
Don't forget to apply through our website! It's the best way to ensure your application gets seen by the right people. Plus, it shows you're genuinely interested in joining our team and contributing to our mission.
Some tips for your application
Tailor Your CV: Make sure your CV reflects the skills and experiences that match the Site Reliability Engineer role. Highlight your experience with cloud technologies, observability tools, and any relevant projects you've worked on.
Showcase Your Problem-Solving Skills: In your application, give examples of how you've tackled production issues or improved system reliability in the past. We love to see candidates who can demonstrate their ability to think critically and act decisively.
Keep It Clear and Concise: When writing your application, be clear and to the point. Use bullet points for key achievements and avoid jargon unless it's relevant to the role. We want to see your qualifications without wading through unnecessary fluff!
Apply Through Our Website: Don't forget to submit your application through our website! This ensures it gets to the right people quickly and helps us keep track of all applications efficiently. We can't wait to hear from you!
How to prepare for a job interview at AWD online
Know Your Tech Stack
Make sure you're familiar with the technologies mentioned in the job description, like Docker, Kubernetes, and observability tools. Brush up on your knowledge of cloud platforms, especially Google Cloud Platform, as well as networking concepts like DNS and DHCP. This will show that you're not just a fit for the role but also genuinely interested in the tech.
Demonstrate Problem-Solving Skills
Prepare to discuss past incidents you've managed or resolved. Think about specific examples where you triaged issues or improved system reliability. Use the STAR method (Situation, Task, Action, Result) to structure your answers, making it easy for the interviewer to see your thought process and impact.
Show Your Collaborative Spirit
As a Site Reliability Engineer, you'll be working closely with various teams. Be ready to talk about how you've collaborated with developers, network engineers, or support teams in the past. Highlight any experiences where you translated operational insights into actionable improvements, as this is key to the role.
Ask Insightful Questions
At the end of the interview, don't shy away from asking questions. Inquire about the team's current challenges, the tools they use for incident management, or how they measure system reliability. This shows your enthusiasm for the role and helps you gauge if the company culture aligns with your values.