Sr Service Reliability Engineer – Kings Cross, London

London · Full-Time · £48,000 - £72,000 / year (est.) · No home office possible
Universal Music Group

At a Glance

  • Tasks: Design and maintain reliable systems that connect artists and fans globally.
  • Company: Join Universal Music, the world's leading music company with a vibrant culture.
  • Benefits: Competitive salary, inclusive environment, and opportunities for professional growth.
  • Other info: Dynamic team atmosphere with a commitment to diversity and inclusion.
  • Why this job: Be a key player in shaping the future of music technology.
  • Qualifications: Experience in systems administration and programming, with a passion for problem-solving.

The predicted salary is between £48,000 and £72,000 per year.

Music is Universal

It’s the passionate and dedicated team at Universal Music who help make us the world’s leading music company. From A&R to finance, legal to digital, sales to marketing, Universal Music is the place to grow and develop your career within a truly commercial and innovative business that leads in everything it does.

Everyone is welcome to apply for our roles, and we are determined to ensure that no applicant or employee receives less favourable treatment because of gender, race, disability, sexual orientation, religion, belief, age, marital status, background, pregnancy, or caring responsibilities. We also recognise the importance of diversity of thought within our teams and are fully committed to embracing the talents of people with autism, dyslexia, ADHD, and other forms of neurocognitive variation.

We will always seek to make appropriate adjustments to recruitment, workplaces, and work processes to be fully inclusive to people with different needs and working styles. If you need us to make any reasonable adjustments for you from application onwards, including alternatives to the online form or to disclose a neurocognitive condition, please email UniversalMusicCareers@umusic.com.

Job Summary:

We are UMG, the Universal Music Group. We are the world’s leading music company. In everything we do, we are committed to artistry, innovation and entrepreneurship. We own and operate a broad array of businesses engaged in recorded music, music publishing, merchandising, and audiovisual content in more than 60 countries. We identify and develop recording artists and songwriters, and we produce, distribute and promote the most critically acclaimed and commercially successful music to delight and entertain fans around the world.

As a key member of our Global Technical Operations team, you will be the ultimate escalation point and subject matter expert for all SRE operations. This is a senior technical role that requires a strategic mindset and deep-seated expertise in System Reliability Engineering. By blending a software engineering mindset with operational expertise, you will engineer solutions that improve system reliability, automate complex processes, and reduce manual toil. You will not only resolve the most challenging technical issues but also drive the operational strategy for SRE implementation at UMG.

As a Site Reliability Engineer, you won’t just be supporting systems; you’ll be ensuring the services that connect artists and fans around the globe are always on.

Job Functions:

Key Responsibilities:

  • System Reliability & Performance:
      – Design, build, and maintain the availability, scalability, and performance of critical services.
      – Develop and maintain robust monitoring, alerting, and observability systems (e.g., using AWS CloudWatch, Dynatrace) to ensure rapid issue detection and resolution.
      – Monitor infrastructure capacity and performance, providing analysis and suggestions for service delivery improvement.
  • Automation & Efficiency:
      – Drive the automation of repetitive operational tasks, including infrastructure provisioning, deployments, and scaling.
      – Create and maintain scripts and custom code to support and enhance our operational toolset.
      – Support and optimize CI/CD pipelines to improve deployment speed and reliability.
  • Incident Management & Collaboration:
      – Participate in an on-call rotation to troubleshoot and mitigate production incidents.
      – Lead post-incident reviews and root cause analyses to implement lasting solutions.
      – Partner with engineering and IT stakeholders to embed SRE best practices (SLOs, error budgets) into the design and development lifecycle.
  • Act as the Final Escalation Point for SRE operations: Participate in resolving the most complex and critical incidents, which other teams have been unable to solve. Provide leadership during high-severity events, coordinating cross-functional teams to ensure rapid and effective resolution.
  • Develop Escalation Frameworks: Design, implement, and refine the escalation management process for the entire Global Technical Operations Center, ensuring that incidents are triaged, documented, and resolved efficiently.
  • Strategic Troubleshooting & Root Cause Analysis: Move beyond simple fixes to conduct deep-dive root cause analysis (RCA) for recurring, complex problems. Develop long-term solutions, including automation and architectural changes, to prevent future incidents.
  • Mentor & Uplevel the Team: Serve as a technical leader and mentor to junior engineers. Develop and lead training sessions on advanced security concepts, threat landscapes, and internal best practices to elevate the entire team’s capabilities. Foster a culture of continuous learning and operational excellence within the team. Maintain and enhance knowledge of key technologies.
  • Architectural Collaboration: Partner with DevOps and Applications architects to influence and enforce standards. Ensure that new and existing systems are built on the principles of Infrastructure as Code and toil reduction.
  • Automation & Optimization: Identify opportunities for network automation, scripting, and tool development to streamline operational tasks and improve efficiency.
  • Documentation & Standards: Create and maintain comprehensive documentation for configurations, standard operating procedures (SOPs), and incident response protocols.
  • Communication & Stakeholder Management: Communicate effectively with technical and non-technical stakeholders, including senior management, regarding incident status, resolution plans, and identity or security issues. Build partnerships and trust with other information technology areas, vendor technical staff, and customers in the business units.
  • Make UMG the place to be: Mentor and genuinely lead the team in a way that attracts and retains the best talent. UMG is a place where everyone can bring themselves fully to work and thrive; as a leader, you are a key part of this.
  • Work outside standard business hours will occasionally be required.

Job Requirements:

Required Experience & Skills:

  • A strong background in systems administration (Linux/Windows) in a large-scale environment.
  • Proficiency in at least one programming language (e.g., Python, Go, Java).
  • Hands-on experience with a major cloud platform (AWS, GCP, or Azure), with a high preference for AWS.
  • Solid understanding of networking, containers (Docker, Kubernetes), and Infrastructure as Code (e.g., Terraform, Ansible).
  • Experience with modern monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, Dynatrace).
  • Proven analytical and problem-solving abilities with experience in a high-pressure environment.
  • Excellent communication skills and the ability to foster a collaborative team environment.

Preferred Experience & Skills:

  • Bachelor’s degree in an IT-related field.
  • Experience managing large-scale, distributed systems for a global organization.
  • Familiarity with IT governance standards like ITIL.
  • Direct experience with ServiceNow for IT service management.
  • Knowledge of chaos engineering, resilience testing, and advanced capacity planning.

Just So You Know…

The company presents this job description as a guide to the major areas and duties for which the jobholder is accountable. However, the business operates in an environment that demands change and the jobholder’s specific responsibilities and activities will vary and develop. Therefore, the job description should be seen as indicative and not as a permanent, definitive, and exhaustive statement.

Job Category: Universal Music Group

Sr Service Reliability Engineer – Kings Cross, London employer: Universal Music Group

At Universal Music Group, we pride ourselves on being an exceptional employer that fosters a vibrant and inclusive work culture in the heart of Kings Cross, London. Our commitment to employee growth is evident through continuous learning opportunities and mentorship, ensuring that every team member can thrive while contributing to our mission of connecting artists and fans worldwide. With a focus on innovation and collaboration, we offer a dynamic environment where diverse talents are celebrated, making UMG not just a workplace, but a community where everyone can bring their authentic selves to work.

Contact Detail:

Universal Music Group Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land Sr Service Reliability Engineer – Kings Cross, London

Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can put in a good word for you.

Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects and contributions. This is a great way to demonstrate your expertise in System Reliability Engineering and make a lasting impression.

Tip Number 3

Prepare for interviews by practising common technical questions and scenarios related to SRE. Think about how you would handle real-world incidents and be ready to discuss your problem-solving approach.

Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining the Universal Music family.

We think you need these skills to ace Sr Service Reliability Engineer – Kings Cross, London

System Reliability Engineering
AWS Cloud Services
Linux Administration
Windows Administration
Python
Go
Java
Docker
Kubernetes
Infrastructure as Code (Terraform, Ansible)
Monitoring Tools (Prometheus, Grafana, Datadog, Splunk, Dynatrace)
Incident Management
Root Cause Analysis
Communication Skills
Team Leadership

Some tips for your application 🫡

Show Your Passion for Music: When you're writing your application, let your love for music shine through! Mention any relevant experiences or projects that showcase your enthusiasm for the industry. We want to see how you connect with our mission at Universal Music.

Tailor Your CV and Cover Letter: Make sure to customise your CV and cover letter for the Sr Service Reliability Engineer role. Highlight your skills in system reliability and automation, and relate them directly to the responsibilities listed in the job description. This helps us see why you're a perfect fit!

Be Clear and Concise: Keep your application straightforward and to the point. Use bullet points where possible to make it easy for us to read. We appreciate clarity, so avoid jargon unless it's relevant to the role. Remember, less is often more!

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy to do – just follow the prompts and submit your details!

How to prepare for a job interview at Universal Music Group

Know Your Tech Inside Out

Make sure you brush up on your technical skills, especially in systems administration and programming languages like Python or Go. Be ready to discuss your hands-on experience with cloud platforms like AWS and tools such as Docker and Kubernetes.

Showcase Your Problem-Solving Skills

Prepare to share specific examples of how you've tackled complex issues in high-pressure environments. Think about incidents you've managed and the strategies you used for root cause analysis and resolution.

Communicate Clearly and Confidently

Since you'll be working with both technical and non-technical stakeholders, practice explaining your ideas clearly. Use simple language to describe complex concepts, and be ready to demonstrate your collaborative approach.

Emphasise Your Mentorship Experience

If you've mentored junior engineers or led training sessions, highlight these experiences. Discuss how you've fostered a culture of continuous learning and operational excellence within your team.

