At a Glance
- Tasks: Ensure system reliability and performance while collaborating with diverse teams.
- Company: Join a growing tech team focused on innovative platforms and services.
- Benefits: Enjoy smart casual dress, gym access, and career growth opportunities.
- Why this job: Make a real impact on critical systems using modern cloud technologies.
- Qualifications: Experience in SRE or DevOps roles with strong technical skills.
- Other info: Dynamic environment with excellent support and progression potential.
The predicted salary is between £30,000 and £50,000 per year.
We have a fantastic new job opportunity for a Site Reliability Engineer / Systems Engineer to join a growing technology team focused on delivering reliable, scalable and resilient platforms and services. As a Site Reliability Engineer / Systems Engineer you will act as the vital link between operations, end users and backend development teams, ensuring system availability, performance optimisation and effective incident management across live environments. This role offers the chance to work with modern cloud technologies, containerisation, observability tools and automation practices, while influencing long-term reliability improvements across business-critical systems.
DUTIES
- Incident Triage and Ownership: Acting as first-line technical escalation for live production issues through to resolution or handover
- System Monitoring and Availability: Maintaining high availability, performance and scalability of production platforms and services
- Observability Implementation: Managing logging, monitoring, alerting and metrics to proactively identify and resolve issues
- Reliability Improvements: Collaborating with development teams to translate operational insights into long-term platform resilience
- Automation and Resilience: Supporting automation, incident response and continuous improvement practices
- New Service Support: Ensuring new products and features are operable, reliable and scalable from day one
- Cross-Team Collaboration: Working with network engineering, operations and support teams to diagnose service issues
- Documentation and Reporting: Creating and maintaining runbooks, escalation guides and incident reports
- Incident Prioritisation: Balancing customer impact with long-term system health and stability
- Security and Compliance: Supporting compliance with security, availability and regulatory frameworks
CANDIDATE REQUIREMENTS
ESSENTIAL
- Previous experience in a Site Reliability Engineer, DevOps Engineer, Systems Engineer or Operations Engineer role
- Experience supporting production services at scale within a DevOps or SRE environment
- Strong working knowledge of ISP-related networking concepts including DNS, DHCP, PPPoE, RADIUS and IPv4/IPv6
- Experience with observability tools such as Prometheus, Grafana, ELK or Splunk
- Hands-on experience with containerisation and orchestration using Docker and Kubernetes
- Cloud platform experience, ideally Google Cloud Platform, including automation and scaling practices
- Strong Linux administration skills with scripting capability in Bash, Python or similar
- Familiarity with source control and CI/CD pipelines, using tools such as GitHub Actions
- Understanding of security frameworks and operational resilience best practices
DESIRABLE
- Experience within ISP, MSP or telecommunications environments
- Familiarity with enterprise IT architectures including OSS and BSS systems
- Knowledge of information security frameworks and regulations such as ISO 27001, NIST or GDPR
- Experience with infrastructure automation tools such as Terraform or Ansible
BENEFITS
- Smart casual dress code
- Free access to gym facilities
- Access to a financial wellbeing platform (on successful completion of probationary period)
- Access to an employee assistance programme, Virtual GP and Elderly Care support (on successful completion of probationary period)
- Access to cycle to work, childcare, and electric vehicle schemes after six months
- Brand new office with excellent transport links
- Supportive team culture, growth and career progression
HOW TO APPLY
To be considered for this job vacancy, please submit your CV to our Recruitment Team, who will review your details. CVs of job applicants who meet the requirements above will be submitted to our Client for consideration. By submitting your job application to us, you give us your express consent to submit your details to our Client for this purpose.
Site Reliability Engineer / SRE / Systems Engineer employer: AWD online
Contact Details:
AWD online Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Site Reliability Engineer / SRE / Systems Engineer role
✨Tip Number 1
Network like a pro! Reach out to folks in the industry on LinkedIn or at tech meetups. A friendly chat can lead to opportunities that aren’t even advertised yet.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repo showcasing your projects, especially those involving cloud tech and automation. It’s a great way to demonstrate your expertise beyond just a CV.
✨Tip Number 3
Prepare for interviews by brushing up on common SRE scenarios. Think about how you’d handle incidents or improve system reliability. We want to see your problem-solving skills in action!
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love hearing from passionate candidates like you!
Some tips for your application 🫡
Tailor Your CV: Make sure your CV reflects the skills and experiences that match the Site Reliability Engineer role. Highlight your experience with cloud technologies, observability tools, and any relevant projects you've worked on.
Showcase Your Problem-Solving Skills: In your application, give examples of how you've tackled production issues or improved system reliability. We want to see your thought process and how you approach challenges in a live environment.
Keep It Clear and Concise: When writing your application, be straightforward. Use clear language and avoid jargon unless it's relevant. We appreciate a well-structured CV that’s easy to read and gets straight to the point.
Apply Through Our Website: Don’t forget to submit your application through our website! This helps us keep everything organised and ensures your details reach the right people quickly.
How to prepare for a job interview at AWD online
✨Know Your Tech Stack
Make sure you’re well-versed in the technologies mentioned in the job description, like Docker, Kubernetes, and observability tools. Brush up on your knowledge of cloud platforms, especially Google Cloud Platform, as this will show your potential employer that you’re ready to hit the ground running.
✨Prepare for Incident Scenarios
Since incident management is a key part of the role, be prepared to discuss how you’ve handled production issues in the past. Think of specific examples where you triaged incidents or improved system reliability, and be ready to explain your thought process and actions taken.
✨Showcase Your Collaboration Skills
This role requires working closely with various teams, so highlight your experience in cross-team collaboration. Be ready to share examples of how you’ve worked with development, operations, or network engineering teams to solve problems or improve processes.
✨Ask Insightful Questions
At the end of the interview, don’t forget to ask questions that show your interest in the company’s long-term goals and challenges. Inquire about their current projects related to automation or resilience improvements, which can demonstrate your enthusiasm for contributing to their success.