SmartSearch

Site Reliability Engineer

Ilkley
Full-Time
£42,000 - £84,000 / year (est.)
No home office possible

At a Glance

Tasks: Ensure reliability and performance of cloud infrastructure while automating operations.
Company: Join SmartSearch, a multi-award winning tech company with a collaborative culture.
Benefits: Enjoy 25 days holiday, private medical insurance, and a cycle to work scheme.
Why this job: Be part of an exciting growth journey in a diverse and inclusive team.
Qualifications: Experience with SRE principles and tools like Grafana and Kubernetes is essential.
Other info: Occasional office attendance required; great progression opportunities await!

The predicted salary is between £42,000 and £84,000 per year.

SmartSearch’s distinctive Anti-Money Laundering verification software protects our clients by offering the most advanced and comprehensive features available from an AML provider. SmartSearch has grown rapidly by fostering an incredibly collaborative and supportive culture. As we continue our ambitious growth plans, we will strive to remain a truly exciting, rewarding, and unique place to work.

We are looking for a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our cloud infrastructure and applications. This role focuses on maintaining and improving system observability, automating operations, and enhancing deployment practices to support business-critical services. Reporting directly to the Lead Site Reliability Engineer, you will be expected to work independently while collaborating closely with engineering and operations teams. You will implement and maintain monitoring and logging solutions and produce clear documentation to support the cloud environment. You will also be expected to keep learning and to improve performance against set targets.

Please note that you will need to live within commuting distance of the Ilkley office for occasional office attendance.

VARIED DAY TO DAY RESPONSIBILITIES

Ensuring system reliability, performance, and scalability through monitoring and automation
Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry
Proactively identifying and resolving performance bottlenecks and infrastructure issues
Automating infrastructure provisioning, configuration management, and deployments
Implementing effective logging, monitoring, and alerting strategies
Managing incident response and post-mortem processes to improve system resilience
Implementing high-availability and fault-tolerant solutions
Working with DevOps engineers to streamline CI/CD pipelines and automate testing
Providing detailed documentation for cloud infrastructure, deployment processes, and best practices
Actively participating in capacity planning and cloud architectural decisions
Continuously improving infrastructure reliability and operational efficiency

WHAT ARE WE LOOKING FOR IN A CANDIDATE?

Experience with SRE principles, such as incident management, error budgets, and service-level objectives (SLOs)
Experience designing and implementing robust observability, monitoring and logging solutions
Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki
Strong experience with distributed tracing and telemetry tools such as OpenTelemetry
An understanding of cloud networking architecture and load balancing techniques
Experience with container orchestration platforms like Kubernetes
Proficiency in infrastructure as code (IaC) tools such as Terraform or Ansible
Strong experience in cloud platforms, particularly Azure
Ability to comply with technical and security best practices
Good written and verbal communication skills, with a strong standard of English
Desire to continuously learn and stay updated with technology advancements

Advantages

Several years’ experience in an SRE, DevOps, or similar role
Knowledge of application performance monitoring solutions like Datadog or New Relic
Hands-on experience with DevOps practices, including CI/CD pipelines and automated deployments
Understanding of software development, ideally with PHP
Understanding of microservices architecture and distributed systems
Working knowledge of GitOps workflows and Kubernetes Operators
Strong automation and scripting abilities with Python, Bash, or Go
Experience managing cloud-native applications in production environments
Proficiency in capacity planning and performance optimization
Experience in managing and improving CI/CD pipelines
Knowledge of incident response best practices and on-call operations

WHAT IS LIFE LIKE AT SMARTSEARCH?

We are a multi-award-winning tech company with an aspirational mentality. Our most recent recognitions include being named in the RegTech100 list for 2024, listed in the Top 100 Fastest Growing Tech Companies by the Northern Tech Awards 2024, and named Technology Provider of the Year by the Corporate Finance Awards 2024. We have been Great Place To Work Certified since 2022. Our growth creates excellent progression opportunities, and you will have personal development goals, regular feedback and support. We are a diverse and inclusive team committed to promoting Diversity & Inclusion and Social Responsibility. Through our DE&I group, charitable initiatives and support for local schools, we actively foster a positive impact on our community.

COMPANY BENEFITS

Our comprehensive benefit package includes:

25 days holiday rising to 30 with each year of service
Private Medical Insurance covering dental and optical
Company pension scheme
Life Assurance – 4x your annual salary
1 day paid volunteering per year
Enhanced maternity / paternity offerings
Employee Assistance Programme
Cycle to work scheme
On site gym

Site Reliability Engineer employer: SmartSearch

SmartSearch is an exceptional employer, renowned for its collaborative and supportive culture that fosters both personal and professional growth. Located in Ilkley, the company offers a range of benefits including generous holiday allowances, private medical insurance, and a commitment to diversity and inclusion, making it a truly rewarding place to work. With a focus on continuous learning and development, employees are empowered to make a meaningful impact while enjoying a vibrant work environment.

Contact Details:

SmartSearch Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer role

✨Tip Number 1

Familiarise yourself with the specific tools mentioned in the job description, such as Grafana, Prometheus, and OpenTelemetry. Having hands-on experience or projects showcasing your skills with these tools can set you apart during discussions.

✨Tip Number 2

Demonstrate your understanding of SRE principles by preparing examples of how you've managed incidents or improved system reliability in previous roles. This will show that you not only understand the theory but have practical experience.

✨Tip Number 3

Engage with the community around Site Reliability Engineering. Join forums, attend meetups, or participate in online discussions to stay updated on best practices and trends. This can also provide you with valuable networking opportunities.

✨Tip Number 4

Prepare to discuss your experience with cloud platforms, particularly Azure, and be ready to explain how you've implemented high-availability solutions. Tailoring your conversation to align with SmartSearch's needs will demonstrate your fit for the role.

We think you need these skills to succeed as a Site Reliability Engineer

Site Reliability Engineering (SRE) principles

Incident management

Error budgets

Service-level objectives (SLOs)

Observability and monitoring tools (Grafana, Prometheus, Loki)

Distributed tracing and telemetry (OpenTelemetry)

Cloud networking architecture

Load balancing techniques

Container orchestration (Kubernetes)

Infrastructure as Code (IaC) tools (Terraform, Ansible)

Cloud platforms (particularly Azure)

Technical and security best practices compliance

Written and verbal communication skills

Continuous learning mindset

Application performance monitoring solutions (Datadog, New Relic)

DevOps practices (CI/CD pipelines, automated deployments)

Software development understanding (ideally PHP)

Microservices architecture and distributed systems knowledge

GitOps workflows and Kubernetes Operators

Automation and scripting abilities (Python, Bash, Go)

Managing cloud-native applications in production environments

Capacity planning and performance optimization

Improving CI/CD pipelines

Incident response best practices

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights relevant experience and skills that align with the Site Reliability Engineer role. Focus on your proficiency with tools like Grafana and Prometheus, and on your experience with cloud platforms, particularly Azure.

Craft a Compelling Cover Letter: In your cover letter, express your enthusiasm for SmartSearch and their mission. Mention specific projects or experiences that demonstrate your ability to ensure system reliability and performance, as well as your commitment to continuous learning.

Showcase Your Technical Skills: When detailing your technical skills, be specific about your experience with incident management, automation, and infrastructure as code tools like Terraform or Ansible. Use examples to illustrate how you've applied these skills in previous roles.

Highlight Soft Skills: Don't forget to mention your communication skills and ability to work collaboratively with engineering and operations teams. Provide examples of how you've successfully worked in a team environment to solve complex problems.

How to prepare for a job interview at SmartSearch

✨Showcase Your SRE Knowledge

Make sure to highlight your understanding of Site Reliability Engineering principles, such as incident management and service-level objectives (SLOs). Be prepared to discuss how you've applied these concepts in previous roles.
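
One way to make SLOs and error budgets concrete in an interview is to walk through the arithmetic. The sketch below is illustrative only: the function names, the 30-day window and the 99.9% availability target are assumptions chosen for the example, not details of SmartSearch's systems.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability SLO allows over the window.

    slo_target is a fraction, e.g. 0.999 for "99.9% available".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Return the fraction of the error budget still unspent.

    A negative result means the SLO has been breached for this window.
    """
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% target with 10.8 minutes of downtime so far.
    print(round(error_budget_minutes(0.999), 1))
    print(round(budget_remaining(0.999, 10.8), 2))
```

With these example numbers, a 99.9% target over 30 days allows roughly 43 minutes of downtime, and 10.8 minutes of incidents would leave about three quarters of the budget unspent. Being able to explain how that remaining budget would influence release decisions is the kind of practical understanding the role asks for.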

✨Demonstrate Technical Proficiency

Familiarise yourself with the tools mentioned in the job description, like Grafana, Prometheus, and OpenTelemetry. Be ready to provide examples of how you've used these tools to enhance system observability and performance.

✨Prepare for Scenario-Based Questions

Expect questions that assess your problem-solving skills in real-world scenarios. Think about past experiences where you identified and resolved performance bottlenecks or managed incident responses, and be ready to share those stories.

✨Emphasise Continuous Learning

SmartSearch values continuous improvement, so express your desire to learn and stay updated with technology advancements. Share any recent courses, certifications, or projects that demonstrate your commitment to professional growth.

Application deadline: 2027-05-27