Site Reliability Engineer

City of London | Full-Time | £43,200 - £72,000 / year (est.) | No home office possible

At a Glance

  • Tasks: Join a team to enhance system reliability and performance for national security.
  • Company: Be part of a growing organisation focused on critical mission support.
  • Benefits: Enjoy competitive pay, overtime options, and a collaborative work environment.
  • Why this job: Make a real impact while learning cutting-edge technologies in a supportive culture.
  • Qualifications: Experience in software development, databases, and system monitoring is essential.
  • Other info: Full-time role in central London with a 24/7 on-call rota.

The predicted salary is between £43,200 and £72,000 per year.

In this role you’ll be at the forefront of delivering enhanced reliability, performance, and quality to a key national security customer. Joining a growing team, you’ll help create a culture of continuous improvement and play a pivotal role in revolutionising how systems are developed and supported. This role combines operational support with software engineering, allowing you to design tools and applications that monitor and improve system health. As part of a wider programme, you’ll be integral to supporting the customer’s critical mission.

Key Responsibilities:

  • Support and maintain critical services, enhancing the availability, performance, and stability of core mission applications.
  • Participate in the 24/7 on-call rota (one week in five; overtime rate TBC), supporting production systems outside business hours. On-call allowances and overtime pay apply.
  • Automate manual operations work (e.g. incident tickets, on-call tasks) to improve efficiency.
  • Collaborate with development teams, advising on best practices for system design and implementation.
  • Design and deploy monitoring tools to provide intelligent insights into system health, customising tools where necessary.
  • Understand the relationship between software and infrastructure, ensuring systems are scalable and resilient to failure.
  • Participate in the wider DevOps/SRE community, sharing knowledge and best practices across the organisation.

Key Skills & Experience:

  • Experience with, or enthusiasm for, software development in web technologies and object-oriented programming.
  • Familiarity with database technologies such as Oracle SQL, MongoDB, or Postgres.
  • Proficiency with Linux and Windows command lines (e.g. Bash, PowerShell).
  • Experience with monitoring large systems using tools like Grafana, Prometheus, ELK, and Splunk.
  • Knowledge of Agile methodologies and tools such as the Atlassian suite.
  • Strong troubleshooting skills across various levels of the application stack.
  • Familiarity with ITIL processes.
  • Experience with microservices architectures and container platforms like Docker, Kubernetes, and OpenShift.
  • A passion for learning new technologies and solving complex problems.
  • Awareness of emerging tech trends and tools in the SRE space.

Interested in this role? Please apply directly to this advert with an updated CV to be considered.

Contact Details:

Stott and May Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer role

✨Tip Number 1

Familiarise yourself with the specific tools mentioned in the job description, such as Grafana, Prometheus, and ELK. Having hands-on experience or even personal projects showcasing these tools can set you apart during discussions.

✨Tip Number 2

Engage with the DevOps and SRE community online. Join forums, attend webinars, or participate in local meetups to network with professionals in the field. This can provide insights into industry trends and may lead to valuable connections.

✨Tip Number 3

Prepare to discuss your troubleshooting skills in detail. Be ready to share specific examples of how you've resolved complex issues in previous roles, particularly those involving system performance and stability.

✨Tip Number 4

Showcase your passion for continuous learning by mentioning any recent courses or certifications related to SRE or DevOps. This demonstrates your commitment to staying updated with emerging technologies and best practices.

We think you need these skills to ace the Site Reliability Engineer role

Site Reliability Engineering
DevOps Practices
Software Development in Web Technologies
Object-Oriented Programming
Database Technologies (Oracle SQL, MongoDB, Postgres)
Linux Command Line Proficiency (Bash)
Windows Command Line Proficiency (PowerShell)
Monitoring Tools (Grafana, Prometheus, ELK, Splunk)
Agile Methodologies
Atlassian Tools
Troubleshooting Skills
ITIL Processes
Microservices Architectures
Container Platforms (Docker, Kubernetes, OpenShift)
Automation Skills
Continuous Improvement Mindset
Collaboration and Communication Skills
Passion for Learning New Technologies
Understanding of System Scalability and Resilience

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights relevant experience and skills that align with the Site Reliability Engineer role. Focus on your software development experience, familiarity with database technologies, and any previous work with monitoring tools.

Craft a Strong Cover Letter: Write a cover letter that showcases your enthusiasm for the role and the company. Mention specific projects or experiences that demonstrate your ability to enhance system reliability and performance, as well as your passion for continuous improvement.

Highlight Relevant Skills: In your application, emphasise your proficiency with Linux and Windows command lines, experience with monitoring large systems, and knowledge of Agile methodologies. These are key skills that the employer is looking for.

Showcase Problem-Solving Abilities: Provide examples in your application that illustrate your strong troubleshooting skills and your ability to solve complex problems. This will help demonstrate your fit for the role and your readiness to tackle challenges in a critical mission environment.

How to prepare for a job interview at Stott and May

✨Showcase Your Technical Skills

Be prepared to discuss your experience with web technologies, object-oriented programming, and database technologies. Highlight specific projects where you've used tools like Grafana or Prometheus to monitor systems.

✨Demonstrate Problem-Solving Abilities

Expect questions that assess your troubleshooting skills across the application stack. Prepare examples of complex problems you've solved in previous roles, particularly in high-pressure situations.

✨Emphasise Collaboration and Communication

Since this role involves working closely with development teams, be ready to discuss how you've collaborated in the past. Share experiences where you advised on best practices for system design and implementation.

✨Express Your Passion for Continuous Improvement

This position focuses on enhancing reliability and performance. Talk about your enthusiasm for automation and continuous improvement, and provide examples of how you've implemented these principles in your work.

Application deadline: 2027-06-22
