Service Reliability Engineer in London

London | Full-Time | 55000 - 70000 € / year (est.) | No home office possible
Deepstreamtech

At a Glance

  • Tasks: Ensure global services for artists and fans are always on and performing at their best.
  • Company: Join a dynamic team focused on connecting artists with their fans.
  • Benefits: Competitive salary, flexible working hours, and opportunities for professional growth.
  • Other info: Collaborative environment with a focus on innovation and continuous improvement.
  • Why this job: Make a real impact by enhancing service reliability and performance in a creative industry.
  • Qualifications: Experience in systems administration and proficiency in programming languages like Python or Java.

The predicted salary is between 55000 and 70000 € per year.

Requirements

  • A strong background in systems administration (Linux/Windows) in a large-scale environment
  • Proficiency in at least one programming language (e.g., Python, Go, Java)
  • Hands-on experience with a major cloud platform (AWS, GCP, or Azure), with a high preference for AWS
  • Solid understanding of networking, containers (Docker, Kubernetes), and Infrastructure as Code (e.g., Terraform, Ansible)
  • Experience with modern monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, Dynatrace)
  • Proven analytical and problem-solving abilities with experience in a high-pressure environment
  • Excellent communication skills and the ability to foster a collaborative team environment
  • (Desirable) Bachelor's degree in an IT-related field
  • (Desirable) Experience managing large-scale, distributed systems for a global organization
  • (Desirable) Familiarity with IT governance standards like ITIL
  • (Desirable) Direct experience with ServiceNow for IT service management
  • Knowledge of chaos engineering, resilience testing, and advanced capacity planning

What the job involves

  • As a Site Reliability Engineer, you won't just be supporting systems; you'll be ensuring the services that connect artists and fans around the globe are always on
  • System Reliability & Performance: Design, build, and maintain the availability, scalability, and performance of critical services
  • Develop and maintain robust monitoring, alerting, and observability systems (e.g., using AWS CloudWatch, Dynatrace) to ensure rapid issue detection and resolution
  • Monitor infrastructure capacity and performance, providing analysis and suggestions for service delivery improvement
  • Automation & Efficiency: Drive the automation of repetitive operational tasks, including infrastructure provisioning, deployments, and scaling
  • Create and maintain scripts and custom code to support and enhance our operational toolset
  • Support and optimize CI/CD pipelines to improve deployment speed and reliability
  • Incident Management & Collaboration: Participate in an on-call rotation to troubleshoot and mitigate production incidents
  • Lead post-incident reviews and root cause analyses to implement lasting solutions
  • Partner with engineering and IT stakeholders to embed SRE best practices (SLOs, error budgets) into the design and development lifecycle
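
The SLOs and error budgets mentioned in the last point come down to simple arithmetic, which candidates may want to be comfortable with. The sketch below is purely illustrative: the function names, the 30-day window, and the 99.9% target are assumptions chosen for the example, not figures from this posting or from Deepstreamtech.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window.

    slo_target is a fraction, e.g. 0.999 for "99.9% available".
    Both the target and the window length are hypothetical inputs.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    # Hypothetical service with a 99.9% target over a 30-day window.
    print(round(error_budget_minutes(0.999), 1))
    # The same service after 10.8 minutes of downtime in that window.
    print(round(budget_remaining(0.999, 10.8), 2))
```

Under these assumptions, a 99.9% target over 30 days allows about 43.2 minutes of downtime, and 10.8 minutes of incidents would leave roughly 75% of the budget unspent. Teams commonly use the remaining fraction to decide whether to keep shipping changes or to prioritise reliability work.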

Service Reliability Engineer in London employer: Deepstreamtech

Join a dynamic and innovative team as a Service Reliability Engineer, where your expertise will directly contribute to ensuring seamless connections between artists and fans worldwide. Our company fosters a collaborative work culture that prioritises employee growth through continuous learning opportunities and cutting-edge technology. Located in a vibrant area, we offer competitive benefits and a commitment to work-life balance, making us an exceptional employer for those seeking meaningful and rewarding careers.

Contact Details:

Deepstreamtech Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Service Reliability Engineer role in London

Tip Number 1

Network like a pro! Attend industry meetups, webinars, or even local tech events. You never know who might be looking for a Service Reliability Engineer just like you!

Tip Number 2

Show off your skills! Create a GitHub repository showcasing your projects, especially those involving cloud platforms or automation tools. This gives potential employers a taste of what you can do.

Tip Number 3

Prepare for interviews by brushing up on common SRE scenarios and problem-solving questions. Practise explaining your thought process clearly; communication is key in this role!

Tip Number 4

Don't forget to apply through our website! We love seeing candidates who are genuinely interested in joining our team. Plus, it makes tracking your application easier for us!

We think you need these skills to ace the Service Reliability Engineer role in London

Systems Administration (Linux/Windows)
Programming (Python, Go, Java)
Cloud Platform Experience (AWS, GCP, Azure)
Networking Knowledge
Containers (Docker, Kubernetes)
Infrastructure as Code (Terraform, Ansible)
Monitoring and Observability Tools (Prometheus, Grafana, Datadog, Splunk, Dynatrace)

Some tips for your application 🫡

Show Off Your Skills: Make sure to highlight your experience with systems administration and any programming languages you know. We want to see how your skills align with what we need for the Service Reliability Engineer role!

Tailor Your Application: Don’t just send a generic application! Take the time to tailor your CV and cover letter to reflect the specific requirements in the job description. We love seeing candidates who take this extra step.

Be Clear and Concise: When writing your application, keep it clear and to the point. We appreciate well-structured applications that make it easy for us to see your qualifications and experiences without wading through fluff.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy!

How to prepare for a job interview at Deepstreamtech

Know Your Tech Inside Out

Make sure you brush up on your systems administration skills, especially in Linux and Windows. Be ready to discuss your hands-on experience with cloud platforms like AWS, as well as your proficiency in programming languages such as Python or Go. They’ll likely ask you to solve a problem on the spot, so practise coding challenges beforehand!

Showcase Your Monitoring Skills

Familiarise yourself with modern monitoring and observability tools like Prometheus, Grafana, or Datadog. Be prepared to explain how you've used these tools in past roles to ensure system reliability and performance. Sharing specific examples of how you’ve improved service delivery through monitoring will definitely impress them.

Emphasise Collaboration

Since communication is key in this role, think of examples where you’ve successfully collaborated with teams to resolve incidents or improve processes. Highlight your experience in leading post-incident reviews and how you’ve fostered a collaborative environment. This will show that you’re not just a tech whiz but also a team player.

Prepare for Scenario Questions

Expect scenario-based questions that test your analytical and problem-solving abilities under pressure. Practise articulating your thought process when troubleshooting issues or managing incidents. They want to see how you approach problems, so be clear and structured in your responses.