At a Glance
- Tasks: Enhance system reliability and performance while collaborating on innovative solutions.
- Company: Join a leading global operator with a passion for excellence.
- Benefits: Enjoy hybrid working, eye care, flu vaccinations, and life assurance.
- Why this job: Make a real impact on system reliability and observability in a dynamic environment.
- Qualifications: Strong software engineering skills and knowledge of Site Reliability Engineering principles.
- Other info: Be part of a culture that values continuous improvement and teamwork.
The predicted salary is between £36,000 and £60,000 per year.
As a Site Reliability Engineer, you will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices.
You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and availability of critical systems, directly impacting operational efficiency.
Using your engineering expertise, you will implement solutions that enhance reliability, including instrumenting services with tools such as OpenTelemetry, improving logging practices and developing features for maintainability. You will also help engineer tools and automation for effective service management.
Collaboration is key: you will work across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will ensure our systems meet user demands and enhance overall service performance.
This role is eligible for inclusion in the Company’s hybrid working from home policy.
Preferred Skills and Experience
- Excellent knowledge of Site Reliability Engineering principles, including the creation and management of effective Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for reliability and customer satisfaction.
- Knowledge of contemporary observability tools, techniques and best practice, including Splunk, New Relic, Grafana and PagerDuty.
- Knowledge and experience of modern software development techniques and lifecycles.
- Experience with Infrastructure as Code (IaC) automation and orchestration tools such as Ansible and Terraform.
- Prior experience working in a large-scale, 24/7 enterprise where system uptime and stability are of paramount importance to the Business.
- Keen interest in industry trends, particularly Platform Engineering.
- Proficiency in shell scripting for automation and system management tasks.
Responsibilities
- Writing and contributing to code that enhances the reliability and observability of services, including telemetry, operational APIs and tooling.
- Developing and maintaining tools that facilitate effective management of our systems, ensuring they are operationally efficient and resilient.
- Working with automation and orchestration platforms to automate manual activity and reduce toil.
- Building sophisticated dashboards using a range of telemetry data and dashboarding technologies like Grafana, Splunk and New Relic.
- Maintaining and administering existing monitoring and analytic toolsets.
- Mentoring colleagues in the use of new technologies or practices.
- Actively participating in live incident resolution and post-mortem analysis, providing effective remediation strategies to improve overall system health and prevent future issues.
- Driving initiatives to enhance system reliability and observability, contributing to a culture of continuous improvement.
- Collaborating with the central Site Reliability Engineering and Observability teams to establish and uphold standards for reliability and observability, assisting teams in adhering to these practices.
- Working with IT Operations, providing and supporting the use of critical tooling to enable increasing levels of value to the Business.
Benefits
- Eye care and Flu Vaccinations
- Life Assurance
Life at bet365: We are a unique global operator with the passion and drive to be the best in the industry. Our values form the foundation of our culture and shape the unique way that we work. People are our superpower and we support you to be the best you can be.
Site Reliability Engineer in Stoke-on-Trent, employer: bet365 Group
Contact Details:
bet365 Group Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Site Reliability Engineer role in Stoke-on-Trent
✨Network Like a Pro
Get out there and connect with folks in the industry! Attend meetups, webinars, or even just grab a coffee with someone who’s already in the Site Reliability Engineering game. You never know when a casual chat could lead to your next big opportunity.
✨Show Off Your Skills
Don’t just tell us what you can do; show us! Create a portfolio or GitHub repo showcasing your projects, especially those involving observability tools like Grafana or New Relic. This gives us a tangible way to see your expertise in action.
✨Ace the Interview
Prepare for those interviews by brushing up on your technical skills and understanding of SLI and SLO principles. Practice common interview questions and scenarios related to incident resolution and system reliability to really impress us!
✨Apply Through Our Website
Make sure to apply through our website for the best chance at landing that role! We love seeing candidates who take the initiative to engage directly with us. Plus, it shows you’re serious about joining our team!
Some tips for your application 🫡
Tailor Your Application: Make sure to customise your CV and cover letter for the Site Reliability Engineer role. Highlight your software engineering skills and experience with reliability and observability tools, as this will show us you’re a perfect fit for the job!
Show Off Your Skills: Don’t hold back on showcasing your knowledge of SLI, SLO, and contemporary observability tools like Splunk and Grafana. We want to see how your expertise can enhance our systems and contribute to operational efficiency.
Be Clear and Concise: When writing your application, keep it clear and to the point. Use bullet points where possible to make it easy for us to see your key achievements and experiences related to system reliability and automation.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you don’t miss out on any important updates about the process!
How to prepare for a job interview at bet365 Group
✨Know Your SRE Principles
Make sure you brush up on your Site Reliability Engineering principles, especially around Service Level Indicators (SLIs) and Service Level Objectives (SLOs). Be ready to discuss how you've implemented these in past roles and how they can enhance reliability and customer satisfaction.
✨Familiarise Yourself with Observability Tools
Get comfortable with contemporary observability tools like Splunk, New Relic, and Grafana. During the interview, be prepared to share specific examples of how you've used these tools to monitor system performance and improve operational efficiency.
✨Showcase Your Automation Skills
Highlight your experience with Infrastructure as Code (IaC) tools like Ansible and Terraform. Discuss any projects where you've automated processes or reduced manual toil, as this is a key aspect of the role.
✨Emphasise Collaboration
Since collaboration is crucial for this position, think of examples where you've worked across teams to integrate best practices into the software development life cycle. Be ready to explain how you fostered a culture of continuous improvement in your previous roles.