At a Glance
- Tasks: Support scalable production systems and ensure high availability across cloud platforms.
- Company: Join a fast-growing tech company with a focus on innovation and collaboration.
- Benefits: Enjoy remote work options, competitive salary, gym access, and career growth opportunities.
- Why this job: Make a real impact by optimising system performance and enhancing operational resilience.
- Qualifications: Experience in SRE, DevOps, or Systems Engineering with strong cloud and networking skills.
- Other info: Dynamic team culture with excellent support for personal and professional development.
The predicted salary is between £42,000 and £84,000 per year.
A fantastic opportunity for a Site Reliability Engineer / Systems Engineer to support highly available, scalable production systems within a fast-growing technology environment, working across cloud platforms, DevOps, networking and operational resilience.
LOCATION: Remote and hybrid working options available. You can work fully remotely or, if you prefer, on a hybrid basis split between home and the office in Altrincham, Greater Manchester, North West England.
JOB TYPE: Full-Time, Permanent
JOB OVERVIEW: We have a fantastic new job opportunity for a Site Reliability Engineer / Systems Engineer to join a growing technology team focused on delivering reliable, scalable and resilient platforms and services. As a Site Reliability Engineer / Systems Engineer, you will act as the vital link between operations, end users and backend development teams, ensuring system availability, performance optimisation and effective incident management across live environments.
This role offers the chance to work with modern cloud technologies, containerisation, observability tools and automation practices, while influencing long-term reliability improvements across business-critical systems.
DUTIES
- Incident Triage and Ownership: Acting as the first-line technical escalation point for live production issues, seeing them through to resolution or handover.
- System Monitoring and Availability: Maintaining high availability, performance and scalability of production platforms and services.
- Observability Implementation: Managing logging, monitoring, alerting and metrics to proactively identify and resolve issues.
- Reliability Improvements: Collaborating with development teams to translate operational insights into long-term platform resilience.
- Automation and Resilience: Supporting automation, incident response and continuous improvement practices.
- New Service Support: Ensuring new products and features are operable, reliable and scalable from day one.
- Cross-Team Collaboration: Working with network engineering, operations and support teams to diagnose service issues.
- Documentation and Reporting: Creating and maintaining runbooks, escalation guides and incident reports.
- Incident Prioritisation: Balancing customer impact with long-term system health and stability.
- Security and Compliance: Supporting compliance with security, availability and regulatory frameworks.
CANDIDATE REQUIREMENTS
ESSENTIAL
- Previous experience in a Site Reliability Engineer, DevOps Engineer, Systems Engineer or Operations Engineer role.
- Experience supporting production services at scale within a DevOps or SRE environment.
- Strong working knowledge of ISP-related networking concepts including DNS, DHCP, PPPoE, RADIUS and IPv4/IPv6.
- Experience with observability tools such as Prometheus, Grafana, ELK or Splunk.
- Hands-on experience with containerisation and orchestration using Docker and Kubernetes.
- Cloud platform experience, ideally Google Cloud Platform, including automation and scaling practices.
- Strong Linux administration skills with scripting capability in Bash, Python or similar.
- Familiarity with source control and CI/CD pipelines, using tools such as GitHub Actions.
- Understanding of security frameworks and operational resilience best practices.
DESIRABLE
- Experience within ISP, MSP or telecommunications environments.
- Familiarity with enterprise IT architectures including OSS and BSS systems.
- Knowledge of information security and data protection frameworks such as ISO/IEC 27001, NIST or GDPR.
- Experience with infrastructure automation tools such as Terraform or Ansible.
BENEFITS
- Smart casual dress code.
- Free access to gym facilities.
- Access to a financial wellbeing platform (on successful completion of probationary period).
- Access to an employee assistance programme, Virtual GP and Elderly Care support (on successful completion of probationary period).
- Access to cycle to work, childcare, and electric vehicle schemes after six months.
- Brand new office with excellent transport links.
- Supportive team culture with opportunities for growth and career progression.
HOW TO APPLY
To be considered for this job vacancy, please submit your CV to our Recruitment Team who will review your details.
Site Reliability Engineer / SRE / Systems Engineer employer: AWD RECRUITMENT LTD
Contact Details:
AWD RECRUITMENT LTD Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Site Reliability Engineer / SRE / Systems Engineer role
✨Tip Number 1
Network like a pro! Reach out to folks in the industry on LinkedIn or at tech meetups. You never know who might have the inside scoop on job openings or can put in a good word for you.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to cloud platforms and automation. This gives potential employers a taste of what you can do.
✨Tip Number 3
Prepare for interviews by practising common SRE scenarios. Think about how you'd handle incidents or improve system reliability. Being ready to discuss real-world examples will set you apart.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love hearing from passionate candidates like you!
Some tips for your application 🫡
Tailor Your CV: Make sure your CV reflects the skills and experiences that match the Site Reliability Engineer role. Highlight your experience with cloud platforms, DevOps practices, and any relevant tools you've used.
Showcase Your Projects: Include specific projects where you've implemented observability tools or automated processes. This gives us a clear picture of your hands-on experience and problem-solving abilities.
Keep It Clear and Concise: We love a well-structured application! Use bullet points for easy reading and keep your descriptions focused on your achievements and impact in previous roles.
Apply Through Our Website: Don’t forget to apply through our website! It’s the best way for us to receive your application and ensures you’re considered for this exciting opportunity.
How to prepare for a job interview at AWD RECRUITMENT LTD
✨Know Your Tech Inside Out
Make sure you brush up on your knowledge of cloud platforms, containerisation, and observability tools. Be ready to discuss your hands-on experience with technologies like Docker, Kubernetes, Prometheus, and Grafana. This will show that you’re not just familiar with the concepts but can also apply them in real-world scenarios.
✨Demonstrate Problem-Solving Skills
Prepare to share specific examples of how you've triaged incidents or resolved production issues in the past. Use the STAR method (Situation, Task, Action, Result) to structure your answers. This will help interviewers see your thought process and how you handle pressure in a live environment.
✨Showcase Collaboration Experience
As a Site Reliability Engineer, you'll need to work closely with various teams. Be ready to talk about times when you collaborated with development, operations, or network engineering teams. Highlight how you communicated effectively and contributed to achieving common goals.
✨Ask Insightful Questions
At the end of the interview, don’t forget to ask questions! Inquire about the team culture, ongoing projects, or how they measure success in the role. This shows your genuine interest in the position and helps you determine if it’s the right fit for you.