Site Reliability Engineer

Ilkley | Full-Time | £42,000 - £84,000 / year (est.) | No home office possible

At a Glance

  • Tasks: Ensure reliability and performance of cloud infrastructure while automating operations.
  • Company: Join SmartSearch, a multi-award-winning tech company with a collaborative culture.
  • Benefits: Enjoy 25 days holiday, private medical insurance, and a cycle to work scheme.
  • Why this job: Be part of an exciting growth journey in a diverse and inclusive team.
  • Qualifications: Experience with SRE principles and tools like Grafana and Kubernetes is essential.
  • Other info: Occasional office attendance required; great progression opportunities await!

The predicted salary is between £42,000 and £84,000 per year.

SmartSearch’s distinctive Anti-Money Laundering verification software protects our clients by offering the most advanced and comprehensive features available from an AML provider. SmartSearch has grown rapidly by fostering an incredibly collaborative and supportive culture. As we continue our ambitious growth plans, we will strive to remain a truly exciting, rewarding, and unique place to work.

We are looking for a Site Reliability Engineer (SRE) who will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure and applications. This role focuses on maintaining and improving system observability, automating operations, and enhancing deployment practices to support business-critical services. Reporting directly to the Lead Site Reliability Engineer, you will be expected to work independently while collaborating closely with engineering and operations teams. You will be responsible for implementing and maintaining monitoring and logging solutions while producing clear documentation to support the cloud environment. Continuous learning and improving performance based on set targets will be expected.

Please note that you will need to live within commuting distance of the Ilkley office, as occasional office attendance is required.

VARIED DAY TO DAY RESPONSIBILITIES
  • Ensuring system reliability, performance, and scalability through monitoring and automation
  • Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry
  • Proactively identifying and resolving performance bottlenecks and infrastructure issues
  • Automating infrastructure provisioning, configuration management, and deployments
  • Implementing effective logging, monitoring, and alerting strategies
  • Managing incident response and post-mortem processes to improve system resilience
  • Implementing high-availability and fault-tolerant solutions
  • Working with DevOps engineers to streamline CI/CD pipelines and automate testing
  • Providing detailed documentation for cloud infrastructure, deployment processes, and best practices
  • Actively participating in capacity planning and cloud architectural decisions
  • Continuously improving infrastructure reliability and operational efficiency

WHAT ARE WE LOOKING FOR IN A CANDIDATE?
  • Experience with SRE principles, such as incident management, error budgets, and service-level objectives (SLOs)
  • Experience designing and implementing robust observability, monitoring and logging solutions
  • Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki
  • Strong experience with distributed tracing and telemetry tools such as OpenTelemetry
  • An understanding of cloud networking architecture and load balancing techniques
  • Experience with container orchestration platforms like Kubernetes
  • Proficiency in infrastructure as code (IaC) tools such as Terraform or Ansible
  • Strong experience in cloud platforms, particularly Azure
  • Ability to comply with technical and security best practices
  • Good written and verbal communication skills, with a strong standard of English
  • Desire to continuously learn and stay updated with technology advancements

Advantages
  • Several years’ experience in an SRE, DevOps, or similar role
  • Knowledge of application performance monitoring solutions like DataDog or NewRelic
  • Hands-on experience with DevOps practices, including CI/CD pipelines and automated deployments
  • Understanding of software development, ideally with PHP
  • Understanding of microservices architecture and distributed systems
  • Working knowledge of GitOps workflows and Kubernetes Operators
  • Strong automation and scripting abilities with Python, Bash, or Go
  • Experience managing cloud-native applications in production environments
  • Proficiency in capacity planning and performance optimization
  • Experience in managing and improving CI/CD pipelines
  • Knowledge of incident response best practices and on-call operations

WHAT IS LIFE LIKE AT SMARTSEARCH?

We are a multi-award-winning tech company with an aspirational mentality. Our most recent recognitions include being named in the RegTech100 list for 2024, listed in the Top 100 Fastest Growing Tech Companies by the Northern Tech Awards 2024, and named Technology Provider of the Year by the Corporate Finance Awards 2024. We have been Great Place To Work Certified since 2022. Our growth creates excellent progression opportunities, and you will have personal development goals, regular feedback and support. We are a diverse and inclusive team committed to promoting Diversity & Inclusion and Social Responsibility. Through our DE&I group, charitable initiatives and support for local schools, we actively work to have a positive impact on our community.

COMPANY BENEFITS

Our comprehensive benefit package includes:

  • 25 days holiday rising to 30 with each year of service
  • Private Medical Insurance covering dental and optical
  • Company pension scheme
  • Life Assurance – 4x your annual salary
  • 1 day paid volunteering per year
  • Enhanced maternity / paternity offerings
  • Employee Assistance Programme
  • Cycle to work scheme
  • On site gym

Site Reliability Engineer employer: SmartSearch

SmartSearch is an exceptional employer, renowned for its collaborative and supportive culture that fosters both personal and professional growth. Located in Ilkley, the company offers a range of benefits including generous holiday allowances, private medical insurance, and a commitment to diversity and inclusion, making it a truly rewarding place to work. With a focus on continuous learning and development, employees are empowered to make a meaningful impact while enjoying a vibrant work environment.

Contact Details:

SmartSearch Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer role

✨Tip Number 1

Familiarise yourself with the specific tools mentioned in the job description, such as Grafana, Prometheus, and OpenTelemetry. Having hands-on experience or projects showcasing your skills with these tools can set you apart during discussions.

✨Tip Number 2

Demonstrate your understanding of SRE principles by preparing examples of how you've managed incidents or improved system reliability in previous roles. This will show that you not only understand the theory but have practical experience.

✨Tip Number 3

Engage with the community around Site Reliability Engineering. Join forums, attend meetups, or participate in online discussions to stay updated on best practices and trends. This can also provide you with valuable networking opportunities.

✨Tip Number 4

Prepare to discuss your experience with cloud platforms, particularly Azure, and be ready to explain how you've implemented high-availability solutions. Tailoring your conversation to align with SmartSearch's needs will demonstrate your fit for the role.

We think you need these skills to ace the Site Reliability Engineer role

Site Reliability Engineering (SRE) principles
Incident management
Error budgets
Service-level objectives (SLOs)
Observability and monitoring tools (Grafana, Prometheus, Loki)
Distributed tracing and telemetry (OpenTelemetry)
Cloud networking architecture
Load balancing techniques
Container orchestration (Kubernetes)
Infrastructure as Code (IaC) tools (Terraform, Ansible)
Cloud platforms (particularly Azure)
Technical and security best practices compliance
Written and verbal communication skills
Continuous learning mindset
Application performance monitoring solutions (DataDog, NewRelic)
DevOps practices (CI/CD pipelines, automated deployments)
Software development understanding (ideally PHP)
Microservices architecture and distributed systems knowledge
GitOps workflows and Kubernetes Operators
Automation and scripting abilities (Python, Bash, Go)
Managing cloud-native applications in production environments
Capacity planning and performance optimization
Improving CI/CD pipelines
Incident response best practices

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights relevant experience and skills that align with the Site Reliability Engineer role. Focus on your proficiency with tools like Grafana and Prometheus, and on your experience with cloud platforms, particularly Azure.

Craft a Compelling Cover Letter: In your cover letter, express your enthusiasm for SmartSearch and their mission. Mention specific projects or experiences that demonstrate your ability to ensure system reliability and performance, as well as your commitment to continuous learning.

Showcase Your Technical Skills: When detailing your technical skills, be specific about your experience with incident management, automation, and infrastructure as code tools like Terraform or Ansible. Use examples to illustrate how you've applied these skills in previous roles.

Highlight Soft Skills: Don't forget to mention your communication skills and ability to work collaboratively with engineering and operations teams. Provide examples of how you've successfully worked in a team environment to solve complex problems.

How to prepare for a job interview at SmartSearch

✨Showcase Your SRE Knowledge

Make sure to highlight your understanding of Site Reliability Engineering principles, such as incident management and service-level objectives (SLOs). Be prepared to discuss how you've applied these concepts in previous roles.

✨Demonstrate Technical Proficiency

Familiarise yourself with the tools mentioned in the job description, like Grafana, Prometheus, and OpenTelemetry. Be ready to provide examples of how you've used these tools to enhance system observability and performance.

✨Prepare for Scenario-Based Questions

Expect questions that assess your problem-solving skills in real-world scenarios. Think about past experiences where you identified and resolved performance bottlenecks or managed incident responses, and be ready to share those stories.

✨Emphasise Continuous Learning

SmartSearch values continuous improvement, so express your desire to learn and stay updated with technology advancements. Share any recent courses, certifications, or projects that demonstrate your commitment to professional growth.
