At a Glance
- Tasks: Lead operational uptime and enhance AI infrastructure with DevOps principles.
- Company: Join a vibrant team in London dedicated to trust and customer obsession.
- Benefits: Enjoy a dynamic work environment with opportunities for growth and collaboration.
- Why this job: Be part of a culture that values agility and continuous improvement while making an impact.
- Qualifications: 5+ years in DevOps/SRE, strong Linux and networking skills, proficient in Python.
- Other info: Located near Waterloo and Blackfriars, perfect for commuting.
The predicted salary is between £48,000 and £72,000 per year.
About Us:
We love going to work and think you should too. Our team is dedicated to trust, customer obsession, agility, and striving to be better every day. These values serve as the foundation of our culture, guiding our actions and driving us towards excellence.
This position is located in London, England. Our office is situated in a core location near Waterloo and Blackfriars on the Southbank.
What You'll Do:
- Take a lead in the operational uptime and continued expansion of the LogicMonitor (LM) Edwin AI infrastructure, acting as a facilitator of operational excellence.
- Design and implement new production deployments of SOA-based software across cloud datacentres, and provide guidance on organising, securing and automating existing infrastructure and deployments.
- Maintain uptime of LogicMonitor's SaaS-based Edwin AI service and drive technical and process enhancements to improve uptime.
- Lead efforts to design and implement resilient IT applications using DevOps and SRE principles.
- Deploy production applications and drive improvements to the deployment process.
- Monitor system performance and troubleshoot issues to ensure high availability and reliability.
- Design and deploy new application components.
- Design and deploy new infrastructure components and integrations.
- Ensure security of the production environment.
- Develop and implement automated disaster recovery processes to minimise system downtime.
- Identify opportunities for improvement in system performance, deployment speed, and scalability.
- Write high-quality code to automate various aspects of infrastructure maintenance and deployment.
- Support engineering and work closely with engineers to drive operational and architectural/design changes.
- Own, manage, and execute multiple large and technically complex projects across teams.
- Provide direct technical guidance to help team members achieve goals and improve their productivity.
- Participate in the recruitment and hiring of new engineers.
What You'll Need:
- 5+ years of experience as a DevOps Engineer or SRE, designing and implementing resilient IT applications using DevOps and SRE principles.
- Good understanding of Linux system administration, with 3+ years of hands-on experience.
- Good understanding of networking technologies.
- Experience building IaC automations using Terraform.
- Production experience of containers and container orchestration tools (Docker/Kubernetes).
- Good understanding of Amazon Web Services.
- Experience of designing/implementing CI/CD pipelines including production deployments.
- Experience building and working with logging and metrics solutions such as Prometheus.
- Experience programming with RESTful web services.
- Proficient Python developer.
- Well-versed in security principles, both systems and network.
- Excellent written and verbal communication skills, with a track record of improving documentation and processes.
- Experience carrying out problem determination and Root Cause Analysis across complex distributed systems.
Senior Site Reliability Engineer - DevOps employer: LogicMonitor
Contact Details:
LogicMonitor Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Senior Site Reliability Engineer - DevOps role
✨Tip Number 1
Familiarise yourself with the specific technologies mentioned in the job description, such as Terraform, Docker, and AWS. Having hands-on experience or projects that showcase your skills with these tools can set you apart during discussions.
✨Tip Number 2
Network with current or former employees of LogicMonitor or similar companies. Engaging with them on platforms like LinkedIn can provide insights into the company culture and expectations, which can be invaluable during interviews.
✨Tip Number 3
Prepare to discuss your past experiences in detail, especially those related to operational uptime and system performance. Be ready to share specific examples of how you've implemented DevOps principles to solve complex problems.
✨Tip Number 4
Showcase your leadership skills by discussing any previous roles where you led projects or teams. Highlighting your ability to guide others and improve processes will resonate well with the responsibilities outlined in the job description.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights your experience as a DevOps Engineer or SRE, particularly focusing on your skills in designing resilient IT applications and your hands-on experience with Linux system administration. Use keywords from the job description to align your experience with what the company is looking for.
Craft a Compelling Cover Letter: In your cover letter, express your passion for operational excellence and how your background aligns with the company's values of trust and customer obsession. Mention specific projects where you've implemented DevOps principles and improved system performance.
Showcase Relevant Projects: Include examples of projects where you have successfully deployed production applications, built CI/CD pipelines, or automated infrastructure maintenance. Highlight your experience with tools like Terraform, Docker, and Kubernetes, as well as any contributions to logging and metrics solutions.
Proofread and Edit: Before submitting your application, carefully proofread your documents to eliminate any spelling or grammatical errors. Ensure that your writing is clear and concise, reflecting your excellent communication skills, which are essential for this role.
How to prepare for a job interview at LogicMonitor
✨Showcase Your Technical Expertise
Be prepared to discuss your experience with DevOps and SRE principles in detail. Highlight specific projects where you've designed and implemented resilient IT applications, and be ready to explain the technologies you used, such as Docker, Kubernetes, and Terraform.
✨Demonstrate Problem-Solving Skills
Expect questions that assess your ability to troubleshoot complex issues. Prepare examples of past challenges you've faced in system performance or deployment processes, and explain how you approached and resolved them.
✨Communicate Clearly
Since excellent communication skills are essential for this role, practice articulating your thoughts clearly and concisely. Be ready to discuss how you've improved documentation and processes in previous roles, as well as how you collaborate with engineering teams.
✨Align with Company Values
Research the company's culture and values, particularly their focus on trust, customer obsession, and agility. During the interview, express how your personal values align with theirs and provide examples of how you've embodied these principles in your work.