Senior Site Reliability Engineer - DevOps

London | Full-Time | £48,000 - £72,000 / year (est.) | No home office possible

At a Glance

  • Tasks: Lead operational uptime and enhance LM Edwin AI infrastructure with DevOps principles.
  • Company: Join a vibrant team in London focused on trust, agility, and excellence.
  • Benefits: Enjoy a collaborative workspace, recognition culture, and opportunities for personal growth.
  • Why this job: Be part of a dynamic environment where your contributions directly impact performance and innovation.
  • Qualifications: 5+ years in DevOps/SRE, strong Linux and networking skills, proficient in Python and Terraform.
  • Other info: Located near Waterloo, our office promotes creativity and connection among teams.

The predicted salary is between £48,000 and £72,000 per year.

Senior Site Reliability Engineer – DevOps

Artificial Intelligence, London, UK

About Us:

We love going to work and think you should too. Our team is dedicated to trust, customer obsession, agility, and striving to be better every day. These values serve as the foundation of our culture, guiding our actions and driving us towards excellence. We foster a culture of performance and recognition, allowing us to transform growth as we enable our employees to do the best work of their careers.

This position is located in London, England. Our office is situated in a core location near Waterloo and Blackfriars on the Southbank. Across the globe, our Centres of Energy serve as hubs where we accelerate productivity and collaboration, inspire creativity, and cultivate a culture of connection and celebration. Our teams coordinate their time in Centres of Energy to reflect how they work best.

What You’ll Do:

This role will take a lead in the operational uptime and continued expansion of LM Edwin AI infrastructure by serving as a facilitator of operational excellence. Responsibilities include designing and implementing new production deployments of SOA-based software across cloud datacentres, as well as providing guidance on organizing, securing, and automating existing infrastructure and deployments. This position involves working with developers and providing feedback to drive operational performance improvements within the LM platform and operations infrastructure.

Here’s a closer look at this key role:

  • Maintain uptime of LogicMonitor’s (Edwin AI) SaaS-based service and drive technical/process enhancements to improve uptime.
  • Lead efforts to design and implement resilient IT applications using DevOps and SRE principles.
  • Deploy production applications and drive improvements to the deployment process.
  • Monitor system performance and troubleshoot issues to ensure high availability and reliability.
  • Design and deploy new application components.
  • Design and deploy new infrastructure components and integrations.
  • Ensure security of the production environment.
  • Develop and implement automated disaster recovery processes to minimise system downtime.
  • Identify opportunities for improvement in system performance, deployment speed, and scalability.
  • Write high-quality code to automate various aspects of infrastructure maintenance and deployment.
  • Support engineering and work closely with engineers to drive operational and architectural/design changes.
  • Own, manage, and execute multiple large and technically complex projects across teams.
  • Provide alignment between business objectives and the team’s pursuit of technology improvements.
  • Contribute to remediation actions relating to service disruptions and outages.
  • Provide direct technical guidance to help team members achieve goals and improve their productivity.
  • Participate in the recruitment and hiring of new engineers.

What You’ll Need:

  • 5+ years as a DevOps Engineer or SRE, with experience designing and implementing resilient IT applications using DevOps and SRE principles.
  • Good understanding of Linux system administration, with 3+ years of hands-on experience.
  • Good understanding of networking technologies.
  • Experience building IaC automations using Terraform.
  • Production experience of containers and container orchestration tools (Docker/Kubernetes).
  • Good understanding of Amazon Web Services.
  • Experience of designing/implementing CI/CD pipelines including production deployments.
  • Experience building and working with logging and metrics solutions such as Prometheus.
  • Experience programming with RESTful web services.
  • Proficient Python developer.
  • Well-versed in security principles, both systems and network.
  • Excellent written and verbal communication skills, with a track record of improving documentation and processes.
  • Experience in carrying out complex problem determination and Root Cause Analysis across complex distributed systems.


Senior Site Reliability Engineer - DevOps employer: Logicmonitor

At our company, we believe that work should be enjoyable and fulfilling. Located in the vibrant heart of London, our office fosters a collaborative and innovative environment where employees are recognized for their contributions and encouraged to grow. With a strong focus on performance, employee development, and a culture that celebrates creativity and connection, we provide an exceptional workplace for Senior Site Reliability Engineers looking to make a meaningful impact.

Contact Detail:

Logicmonitor Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land Senior Site Reliability Engineer - DevOps

✨Tip Number 1

Familiarize yourself with the specific technologies mentioned in the job description, such as Terraform, Docker, and Kubernetes. Having hands-on experience with these tools will not only boost your confidence but also demonstrate your capability to handle the responsibilities of the role.

✨Tip Number 2

Showcase your problem-solving skills by preparing examples of complex issues you've resolved in previous roles. Be ready to discuss your approach to Root Cause Analysis and how you improved system performance or uptime in those situations.

✨Tip Number 3

Highlight your experience with CI/CD pipelines and how you've implemented them in past projects. Being able to articulate the impact of these implementations on deployment speed and reliability will set you apart from other candidates.

✨Tip Number 4

Prepare to discuss your communication skills and how you've collaborated with cross-functional teams. This role emphasizes teamwork, so sharing specific instances where you've provided technical guidance or improved documentation will resonate well with the hiring team.

We think you need these skills to ace Senior Site Reliability Engineer - DevOps

DevOps Principles
Site Reliability Engineering (SRE)
Linux System Administration
Networking Technologies
Infrastructure as Code (IaC) using Terraform
Containerization (Docker/Kubernetes)
Amazon Web Services (AWS)
CI/CD Pipeline Design and Implementation
Logging and Metrics Solutions (Prometheus)
RESTful Web Services Programming
Python Development
Security Principles (Systems and Network)
Technical Documentation Improvement
Complex Problem Determination
Root Cause Analysis

Some tips for your application 🫡

Understand the Role: Make sure you thoroughly understand the responsibilities and requirements of the Senior Site Reliability Engineer position. Tailor your application to highlight your relevant experience in DevOps and SRE principles.

Highlight Relevant Experience: In your CV and cover letter, emphasize your 5+ years of experience as a DevOps Engineer or SRE. Include specific examples of how you've designed and implemented resilient IT applications, and mention any relevant technologies like Terraform, Docker, and AWS.

Showcase Your Skills: Demonstrate your proficiency in Python and your understanding of networking technologies. Mention any experience with CI/CD pipelines and logging solutions like Prometheus, as these are crucial for the role.

Craft a Compelling Cover Letter: Write a cover letter that reflects your passion for operational excellence and your commitment to improving system performance. Use this opportunity to convey your communication skills and your ability to work collaboratively with engineering teams.

How to prepare for a job interview at Logicmonitor

✨Showcase Your Technical Expertise

Be prepared to discuss your experience with DevOps and SRE principles in detail. Highlight specific projects where you designed and implemented resilient IT applications, and be ready to explain the technologies you used, such as Terraform, Docker, and AWS.

✨Demonstrate Problem-Solving Skills

Expect questions that assess your ability to troubleshoot complex issues. Prepare examples of past challenges you've faced in system performance or deployment processes, and explain how you approached and resolved them.

✨Communicate Clearly and Effectively

Since excellent communication skills are crucial for this role, practice articulating your thoughts clearly. Be ready to explain technical concepts in a way that is understandable to non-technical stakeholders, showcasing your ability to bridge the gap between teams.

✨Align with Company Values

Research the company's culture and values, such as trust and customer obsession. During the interview, express how your personal values align with theirs and provide examples of how you've embodied these values in your previous roles.


    Application deadline: 2027-04-08
