SR Site Reliability Engineer

Full-Time | £48,000 to £72,000 / year (est.) | No home office possible

At a Glance

  • Tasks: Design and maintain scalable systems while collaborating with engineering teams.
  • Company: Wakapi is a forward-thinking tech company focused on enhancing developer experience.
  • Benefits: Enjoy flexible working options, competitive salary, and opportunities for professional growth.
  • Why this job: Join a dynamic team to impact platform engineering and improve system reliability.
  • Qualifications: Experience with Terraform, DevOps metrics, and monitoring tools is essential.
  • Other info: Ideal for those passionate about event-driven infrastructure and cloud technologies.

The predicted salary is between £48,000 and £72,000 per year.

We are seeking a highly skilled Senior Site Reliability Engineer to join our Platform Engineering team. The ideal candidate will have a strong understanding of DevOps and Service Level Management (SLM) metrics, with experience in event-driven infrastructure projects using tools like Terraform, New Relic, Kubernetes, AWS, and Kafka.

As a Platform Engineering representative, you will collaborate with engineering teams to ensure our platform infrastructure tooling meets their needs and positively impacts Developer Experience. You will also assist in setting appropriate thresholds for alerts and automations related to their applications.

Responsibilities

  • Design, implement, and maintain scalable and highly available systems using load balancing, auto-scaling, canary releases, and blue-green deployments.
  • Develop and maintain monitoring and logging dashboards with tools like New Relic, Prometheus, Grafana, and Datadog, ensuring observability through metrics, tracing, log aggregation, and alerting.
  • Help teams determine settings and thresholds for alerts and automations based on application performance requirements.
  • Monitor and optimize system reliability and performance using tools like New Relic, and apply DORA metrics to measure delivery performance.
  • Track SLIs such as uptime, response times, and resolution times to ensure that SLOs and SLAs are met.
  • Implement and promote system resiliency practices, including Chaos Engineering.
  • Collaborate with cross-functional teams to enhance platform engineering practices and gather metrics data.

Requirements

  • Proven experience with Infrastructure-as-Code tools like Terraform.
  • Strong understanding of scalability, high availability patterns, and DevOps metrics such as DORA.
  • Knowledge of SLM metrics (SLAs, SLOs, SLIs) and their application.
  • Experience with monitoring and observability tools like New Relic, Prometheus, Grafana, and Datadog.
  • Experience working with Kafka and improving performance in event-driven, real-time data architectures.
  • Familiarity with cloud providers like AWS, Azure, or GCP.
  • Experience with CI/CD tools such as GitHub Actions, Jenkins, or GitLab CI.
  • Strong analytical and communication skills.

Nice-to-haves

  • Familiarity with Observability-as-Code tooling and practices.
  • Knowledge of Chaos Engineering practices.

Seniority level: Mid-Senior; Employment type: Full-time; Industry: Software Development

SR Site Reliability Engineer employer: Wakapi

Wakapi is an exceptional employer that fosters a collaborative and innovative work culture, making it an ideal place for Senior Site Reliability Engineers to thrive. With a strong emphasis on employee growth, we offer opportunities for professional development through hands-on experience with cutting-edge technologies and practices in a supportive environment. Located in a vibrant tech hub, our team enjoys a dynamic atmosphere that encourages creativity and teamwork, ensuring that every contribution positively impacts the Developer Experience.

Contact Details:

Wakapi Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the SR Site Reliability Engineer role

✨Tip Number 1

Familiarise yourself with the specific tools mentioned in the job description, such as Terraform, New Relic, and Kubernetes. Having hands-on experience or projects showcasing your skills with these tools can set you apart during discussions.

✨Tip Number 2

Understand the principles of DevOps and Service Level Management (SLM) metrics thoroughly. Be prepared to discuss how you've applied these concepts in previous roles, especially in relation to DORA metrics and system reliability.

✨Tip Number 3

Showcase your collaborative skills by preparing examples of how you've worked with cross-functional teams in the past. Highlight any experiences where you improved developer experience or enhanced platform engineering practices.

✨Tip Number 4

Stay updated on the latest trends in observability and Chaos Engineering. Being able to discuss recent developments or best practices in these areas can demonstrate your commitment to continuous learning and improvement.

We think you need these skills to ace the SR Site Reliability Engineer role

DevOps
Service Level Management (SLM)
Infrastructure-as-Code (IaC)
Terraform
Kubernetes
AWS
Kafka
New Relic
Prometheus
Grafana
Datadog
Monitoring and Observability
Load Balancing
Auto-scaling
Canary Releases
Blue-Green Deployments
DORA Metrics
SLAs, SLOs, SLIs
CI/CD Tools (GitHub Actions, Jenkins, GitLab CI)
Analytical Skills
Communication Skills
System Resiliency Practices
Chaos Engineering

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights relevant experience in DevOps, Infrastructure-as-Code tools like Terraform, and monitoring tools such as New Relic and Grafana. Use specific examples to demonstrate your skills in event-driven infrastructure projects.

Craft a Compelling Cover Letter: In your cover letter, express your enthusiasm for the role and the company. Mention how your background aligns with the responsibilities listed, particularly in areas like system reliability, performance optimisation, and collaboration with engineering teams.

Showcase Relevant Projects: If you have worked on projects involving scalability, high availability, or Chaos Engineering, be sure to include these in your application. Describe your role and the impact of your contributions on the project's success.

Highlight Soft Skills: Don't forget to mention your strong analytical and communication skills. These are crucial for collaborating with cross-functional teams and ensuring that platform infrastructure meets developer needs.

How to prepare for a job interview at Wakapi

✨Showcase Your Technical Skills

Be prepared to discuss your experience with Infrastructure-as-Code tools like Terraform and your understanding of DevOps metrics. Highlight specific projects where you've implemented scalable systems or improved performance using tools like New Relic or Grafana.

✨Demonstrate Problem-Solving Abilities

Expect scenario-based questions that assess your ability to troubleshoot and optimise system reliability. Share examples of how you've handled incidents in the past, focusing on your analytical approach and the outcomes of your actions.

✨Understand the Company’s Needs

Research Wakapi and their platform engineering practices. Be ready to discuss how your skills can enhance their Developer Experience and contribute to their goals, especially in relation to SLM metrics and event-driven architectures.

✨Prepare Questions for the Interviewers

Have insightful questions ready about the team dynamics, current challenges they face, and their approach to Chaos Engineering. This shows your genuine interest in the role and helps you gauge if the company is the right fit for you.

Application deadline: 2027-07-14