Sr Service Reliability Engineer in London

London | Full-Time | £60,000 to £80,000 / year (est.) | No working from home possible

At a Glance

Tasks: Design and maintain reliable systems that keep music services running smoothly.
Company: Join Universal Music, the world's leading music company with a vibrant culture.
Benefits: Enjoy competitive pay, health perks, and opportunities for remote work.
Other info: Be part of a diverse team that values innovation and continuous learning.
Why this job: Make a real impact in the music industry while working with cutting-edge technology.
Qualifications: Experience in systems administration and programming, with a passion for problem-solving.

The predicted salary is between £60,000 and £80,000 per year.

Music is Universal. It’s the passionate and dedicated team at Universal Music who help make us the world’s leading music company. From A&R to finance, legal to digital, sales to marketing, Universal Music is the place to grow and develop your career within a truly commercial and innovative business that leads in everything it does.

Everyone is welcome to apply for our roles, and we are determined to ensure that no applicant or employee receives less favourable treatment because of gender, race, disability, sexual orientation, religion, belief, age, marital status, background, pregnancy, or caring responsibilities. We also recognise the importance of diversity of thought within our teams and are fully committed to embracing the talents of people with autism, dyslexia, ADHD, and other forms of neurocognitive variation. We will always seek to make appropriate adjustments to recruitment, workplaces, and work processes to be fully inclusive to people with different needs and working styles.

Job Summary: We are UMG, the Universal Music Group. We are the world’s leading music company. In everything we do, we are committed to artistry, innovation and entrepreneurship. We own and operate a broad array of businesses engaged in recorded music, music publishing, merchandising, and audiovisual content in more than 60 countries. We identify and develop recording artists and songwriters, and we produce, distribute and promote the most critically acclaimed and commercially successful music to delight and entertain fans around the world.

As a key member of our Global Technical Operations team, you will be the ultimate escalation point and subject matter expert for all SRE operations. This is a senior technical role that requires a strategic mindset and deep‑seated expertise in System Reliability Engineering. By blending a software engineering mindset with operational expertise, you will engineer solutions that improve system reliability, automate complex processes, and reduce manual toil. You will not only resolve the most challenging technical issues but also drive the operational strategy for SRE implementation at UMG. As a Site Reliability Engineer, you won’t just be supporting systems; you’ll be ensuring the services that connect artists and fans around the globe are always on.

Key Responsibilities:

System Reliability & Performance: Design, build, and maintain the availability, scalability, and performance of critical services. Develop and maintain robust monitoring, alerting, and observability systems (e.g., using AWS CloudWatch, Dynatrace) to ensure rapid issue detection and resolution. Monitor infrastructure capacity and performance, providing analysis and suggestions for service delivery improvement.
Automation & Efficiency: Drive the automation of repetitive operational tasks, including infrastructure provisioning, deployments, and scaling. Create and maintain scripts and custom code to support and enhance our operational toolset. Support and optimize CI/CD pipelines to improve deployment speed and reliability.
Incident Management & Collaboration: Participate in an on‑call rotation to troubleshoot and mitigate production incidents. Lead post‑incident reviews and root cause analyses to implement lasting solutions. Partner with engineering and IT stakeholders to embed SRE best practices (SLOs, error budgets) into the design and development lifecycle.
Act as the Final Escalation Point for SRE operations: Participate in resolving the most complex and critical incidents that other teams have been unable to solve. Provide leadership during high‑severity events, coordinating cross‑functional teams to ensure rapid and effective resolution.
Develop Escalation Frameworks: Design, implement, and refine the escalation management process for the entire Global Technical Operations Center, ensuring that incidents are triaged, documented, and resolved efficiently.
Strategic Troubleshooting & Root Cause Analysis: Move beyond simple fixes to conduct deep‑dive root cause analysis (RCA) for recurring, complex problems. Develop long‑term solutions, including automation and architectural changes, to prevent future incidents.
Mentor & Uplevel the Team: Serve as a technical leader and mentor to junior engineers. Develop and lead training sessions on advanced security concepts, threat landscapes, and internal best practices to elevate the entire team’s capabilities. Foster a culture of continuous learning and operational excellence within the team. Maintain and enhance knowledge of key technologies.
Architectural Collaboration: Partner with DevOps and application architects to influence and enforce standards. Ensure that new and existing systems are built on the principles of Infrastructure as Code and toil reduction.
Automation & Optimization: Identify opportunities for network automation, scripting, and tool development to streamline operational tasks and improve efficiency.
Documentation & Standards: Create and maintain comprehensive documentation for configurations, standard operating procedures (SOPs), and incident response protocols.
Communication & Stakeholder Management: Communicate effectively with technical and non‑technical stakeholders, including senior management, regarding incident status, resolution plans, and identity or security issues. Build partnerships and trust with other information technology areas, vendor technical staff, and customers in the business units.
Make UMG the place to be: Mentor and genuinely lead the team in a way that attracts and retains the best talent. UMG is a place where everyone can bring themselves fully to work and thrive; as a leader, you are a key part of this.

Work outside standard business hours will occasionally be required.

Job Requirements:

A strong background in systems administration (Linux/Windows) in a large‑scale environment.
Proficiency in at least one programming language (e.g., Python, Go, Java).
Hands‑on experience with a major cloud platform (AWS, GCP, or Azure), with a high preference for AWS.
Solid understanding of networking, containers (Docker, Kubernetes), and Infrastructure as Code (e.g., Terraform, Ansible).
Experience with modern monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, Dynatrace).
Proven analytical and problem‑solving abilities with experience in a high‑pressure environment.
Excellent communication skills and the ability to foster a collaborative team environment.

Preferred Experience & Skills:

Bachelor’s degree in an IT‑related field.
Experience managing large‑scale, distributed systems for a global organization.
Familiarity with IT governance standards like ITIL.
Direct experience with ServiceNow for IT service management.
Knowledge of chaos engineering, resilience testing, and advanced capacity planning.

Just So You Know… The company presents this job description as a guide to the major areas and duties for which the jobholder is accountable. However, the business operates in an environment that demands change and the jobholder’s specific responsibilities and activities will vary and develop. Therefore, the job description should be seen as indicative and not as a permanent, definitive, and exhaustive statement.

Sr Service Reliability Engineer in London employer: Dormont Manufacturing Co

At Universal Music Group, we pride ourselves on being an exceptional employer that champions creativity and innovation in the music industry. Our inclusive work culture fosters diversity and encourages personal growth, offering employees ample opportunities for professional development and mentorship. Located in a vibrant environment, we provide a dynamic workplace where your contributions directly impact the global music landscape, making it a truly rewarding place to build your career.

Contact Details:

Dormont Manufacturing Co Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land Sr Service Reliability Engineer in London

✨Join the IT Consultancy Buzz

Get involved in local or virtual IT consultancy meetups and forums. This is where we can rub shoulders with industry professionals, get insights into what Dormont Manufacturing Co values, and even spot unadvertised opportunities. Don't miss out on these chances to make a name for ourselves in the IT world!

✨Show Off Your Skills

Create a personal project or case study relevant to the challenges Dormont Manufacturing Co might face. Use platforms like GitHub or Medium to share your findings. This not only demonstrates our consulting skills but shows a proactive attitude, making us stand out from the crowd when applying for that full-time gig.

✨Leverage LinkedIn for Connections

Follow and engage with the relevant thought leaders and influencers in IT consultancy on LinkedIn. Share insightful content and join discussions to gain visibility. A well-placed comment or shared article could catch the attention of someone at Dormont Manufacturing Co!

✨Direct Apply to Dormont Manufacturing Co

Let's not forget to apply directly through the Dormont Manufacturing Co website! Tailor your application to showcase our understanding of their consulting style and how we can contribute to their projects. A personalised approach can make a huge difference in landing that full-time position!

We think you need these skills to ace Sr Service Reliability Engineer in London

System Reliability Engineering

AWS CloudWatch

Dynatrace

Infrastructure as Code

Python

Java

Linux

Windows

Docker

Kubernetes

Terraform

Ansible

Prometheus

Grafana

Datadog

Splunk

Analytical Skills

Problem-Solving Skills

Communication Skills

Collaboration

Some tips for your application 🫡

Showcase Your Problem-Solving Skills: In IT consulting, it's all about problem-solving, so make sure your CV highlights your analytical skills and any relevant projects you've tackled. Mention specific technologies or methodologies you've used to resolve issues or improve processes; this shows you can think critically and deliver results, which is vital for us at Dormont Manufacturing Co.

Highlight Relevant Certifications: Certifications like ITIL, PMP, or even specific tech stack qualifications can really make you stand out. Make sure to include these in your CV, as they not only demonstrate your expertise but also your commitment to staying current in the field. We love seeing candidates who are proactive about their professional development!

Tailor Your Cover Letter: Your cover letter is your chance to connect personally with us at Dormont Manufacturing Co. Share stories about your experiences in IT consulting, and how they shaped your desire to join our team. Mention why you’re excited about this particular role, and how you see yourself contributing to our projects.

Keep It Clear and Concise: We're all busy, so make sure your application is easy to read. Use bullet points for key achievements, and don’t overload us with jargon. A clean, professional layout goes a long way. Remember, the clearer your application, the more likely we are to invite you in for an interview!

How to prepare for a job interview at Dormont Manufacturing Co

✨Brush Up on Your Technical Skills

For an IT consulting role, be ready to demonstrate your technical prowess. You might face questions on systems integration, cloud technologies, or even troubleshooting specific software. If you have experience with tools like AWS, Azure, or even specific programming languages, make sure you can talk about them fluently.

✨Showcase Your Problem-Solving Approach

IT consulting is all about solving problems for clients. Think about how you can illustrate your approach to a past challenge using the STAR method (Situation, Task, Action, Result). It's a great way to show how you tackle complex issues and come up with effective solutions.

✨Know the Business Impact of IT Solutions

When discussing your experiences, focus not just on the tech solutions you implemented, but also on their business impact. Employers want to see that you can connect IT with organisational goals. Prep examples that highlight how your tech contributions improved efficiency or reduced costs for past clients or projects.

✨Prepare for Behavioural Questions

Since IT consulting often involves teamwork and client interactions, expect behavioural questions that assess your interpersonal skills. Be prepared with examples that demonstrate your adaptability, communication skills, and how you handle client feedback. Before the interview, think of situations where you worked closely with clients to create effective IT strategies or changes.
