At a Glance
- Tasks: Support scalable production systems and ensure high availability across cloud platforms.
- Company: Join a fast-growing tech company with a focus on innovation and collaboration.
- Benefits: Enjoy remote work options, competitive salary, gym access, and career growth opportunities.
- Why this job: Make a real impact by optimising system performance and reliability in a dynamic environment.
- Qualifications: Experience in SRE, DevOps, or Systems Engineering with strong cloud and networking knowledge.
- Other info: Be part of a supportive team culture with excellent transport links and modern office facilities.
The predicted salary is between £42,000 and £84,000 per year.
A fantastic opportunity for a Site Reliability Engineer / Systems Engineer to support highly available, scalable production systems within a fast-growing technology environment, working across cloud platforms, DevOps, networking and operational resilience.
We have a fantastic new job opportunity for a Site Reliability Engineer / Systems Engineer to join a growing technology team focused on delivering reliable, scalable and resilient platforms and services.
As a Site Reliability Engineer / Systems Engineer you will act as the vital link between operations, end users and backend development teams, ensuring system availability, performance optimisation and effective incident management across live environments.
This Site Reliability Engineer / Systems Engineer role offers the chance to work with modern cloud technologies, containerisation, observability tools and automation practices, while influencing long-term reliability improvements across business-critical systems.
DUTIES
- Incident Triage and Ownership: Acting as first-line technical escalation for live production issues through to resolution or handover
- System Monitoring and Availability: Maintaining high availability, performance and scalability of production platforms and services
- Observability Implementation: Managing logging, monitoring, alerting and metrics to proactively identify and resolve issues
- Reliability Improvements: Collaborating with development teams to translate operational insights into long-term platform resilience
- Automation and Resilience: Supporting automation, incident response and continuous improvement practices
- New Service Support: Ensuring new products and features are operable, reliable and scalable from day one
- Cross-Team Collaboration: Working with network engineering, operations and support teams to diagnose service issues
- Documentation and Reporting: Creating and maintaining runbooks, escalation guides and incident reports
- Incident Prioritisation: Balancing customer impact with long-term system health and stability
- Security and Compliance: Supporting compliance with security, availability and regulatory frameworks
CANDIDATE REQUIREMENTS
- Previous experience in a Site Reliability Engineer, DevOps Engineer, Systems Engineer or Operations Engineer role
- Experience supporting production services at scale within a DevOps or SRE environment
- Strong working knowledge of ISP-related networking concepts including DNS, DHCP, PPPoE, RADIUS and IPv4/IPv6
- Experience with observability tools such as Prometheus, Grafana, ELK or Splunk
- Hands-on experience with containerisation and orchestration using Docker and Kubernetes
- Cloud platform experience, ideally Google Cloud Platform, including automation and scaling practices
- Strong Linux administration skills with scripting capability in Bash, Python or similar
- Familiarity with source control and CI/CD pipelines, using tools such as GitHub Actions
- Understanding of security frameworks and operational resilience best practices
DESIRABLE
- Experience within ISP, MSP or telecommunications environments
- Familiarity with enterprise IT architectures including OSS and BSS systems
- Knowledge of information security frameworks and regulations such as ISO 27001, NIST or GDPR
- Experience with infrastructure automation tools such as Terraform or Ansible
BENEFITS
- Smart casual dress code
- Free access to gym facilities
- Access to a financial wellbeing platform (on successful completion of probationary period)
- Access to an employee assistance programme, Virtual GP and Elderly Care support (on successful completion of probationary period)
- Access to cycle to work, childcare, and electric vehicle schemes after six months
- Brand new office with excellent transport links
- Supportive team culture, growth and career progression
HOW TO APPLY
To be considered for this job vacancy, please submit your CV to our Recruitment Team, who will review your details. CVs of applicants who meet the requirements above will be submitted to our Client for consideration.
Site Reliability Engineer / SRE / Systems Engineer in Manchester
Employer: AWD online
Contact Details:
AWD online Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Site Reliability Engineer / SRE / Systems Engineer role in Manchester
✨Tip Number 1
Network like a pro! Reach out to folks in the industry on LinkedIn or at tech meetups. You never know who might have the inside scoop on job openings or can put in a good word for you.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to cloud platforms and automation. This gives potential employers a taste of what you can do.
✨Tip Number 3
Prepare for interviews by brushing up on common SRE scenarios and incident management techniques. Practise explaining your thought process clearly; it’s all about demonstrating your problem-solving skills.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love hearing from passionate candidates like you!
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the Site Reliability Engineer role. Highlight your experience with cloud platforms, DevOps practices, and any relevant tools you've used. We want to see how your skills match what we're looking for!
Showcase Your Projects: If you've worked on any projects that demonstrate your ability to maintain high availability and performance, be sure to include them. We love seeing real-world examples of your work and how you’ve tackled challenges in production environments.
Be Clear and Concise: When writing your application, keep it clear and to the point. Use bullet points where possible to make it easy for us to read through your experience and skills. We appreciate a well-structured application!
Apply Through Our Website: Don’t forget to apply through our website! It’s the best way for us to receive your application and ensures you’re considered for the role. We can’t wait to hear from you!
How to prepare for a job interview at AWD online
✨Know Your Tech Stack
Make sure you’re familiar with the technologies mentioned in the job description, like Docker, Kubernetes, and observability tools. Brush up on your knowledge of cloud platforms, especially Google Cloud Platform, as this will show that you're ready to hit the ground running.
✨Demonstrate Problem-Solving Skills
Prepare to discuss specific incidents where you triaged issues or improved system reliability. Use the STAR method (Situation, Task, Action, Result) to structure your answers, showcasing your ability to handle real-world challenges effectively.
✨Showcase Collaboration Experience
As a Site Reliability Engineer, you'll need to work closely with various teams. Be ready to share examples of how you've collaborated with developers, network engineers, or operations teams to resolve issues or implement improvements. This will highlight your teamwork skills.
✨Ask Insightful Questions
At the end of the interview, don’t forget to ask questions! Inquire about the team culture, ongoing projects, or how they measure success in the role. This shows your genuine interest in the position and helps you assess if it’s the right fit for you.