At a Glance
- Tasks: Design and maintain scalable systems while collaborating with engineering teams.
- Company: Wakapi is a forward-thinking tech company focused on enhancing developer experience.
- Benefits: Enjoy flexible working options, competitive salary, and opportunities for professional growth.
- Why this job: Join a dynamic team to make a real impact in platform engineering and DevOps.
- Qualifications: Experience with Terraform, AWS, and monitoring tools like New Relic is essential.
- Other info: Ideal for those passionate about system reliability and innovative engineering practices.
The predicted salary is between £43,200 and £72,000 per year.
We are seeking a highly skilled Senior Site Reliability Engineer to join our Platform Engineering team. The ideal candidate will have a strong understanding of DevOps and Service Level Management (SLM) metrics, with experience in event-driven infrastructure projects using tools like Terraform, New Relic, Kubernetes, AWS, and Kafka. As a Platform Engineering representative, you will collaborate with engineering teams to ensure our platform infrastructure tooling meets their needs and positively impacts Developer Experience. You will also assist in setting appropriate thresholds for alerts and automations related to their applications.
Responsibilities:
- Design, implement, and maintain scalable and highly available systems using load balancing, auto-scaling, canary releases, and blue-green deployments.
- Develop and maintain monitoring and logging dashboards with tools like New Relic, Prometheus, Grafana, and Datadog, ensuring observability through metrics, tracing, log aggregation, and alerting.
- Help teams determine settings and thresholds for alerts and automations based on application performance requirements.
- Monitor and optimize system reliability and performance using tools like New Relic, and apply DORA metrics to track delivery performance.
- Track uptime, response times, and resolution times as SLIs to verify that SLOs and SLAs are being met.
- Implement and promote system resiliency practices, including Chaos Engineering.
- Collaborate with cross-functional teams to enhance platform engineering practices and gather metrics data.
Requirements:
- Proven experience with Infrastructure-as-Code tools like Terraform.
- Strong understanding of scalability, high availability patterns, and DevOps metrics such as DORA.
- Knowledge of SLM metrics (SLAs, SLOs, SLIs) and their application.
- Experience with monitoring and observability tools like New Relic, Prometheus, Grafana, and Datadog.
- Experience working with Kafka and improving performance in event-driven, real-time data architectures.
- Familiarity with cloud providers like AWS, Azure, or GCP.
- Experience with CI/CD tools such as GitHub Actions, Jenkins, or GitLab CI.
- Strong analytical and communication skills.
Nice-to-haves:
- Familiarity with Observability-as-Code tooling and practices.
- Knowledge of Chaos Engineering practices.
Senior Site Reliability Engineer employer: Wakapi
Contact Details:
Wakapi Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Senior Site Reliability Engineer role
✨Tip Number 1
Familiarise yourself with the specific tools mentioned in the job description, such as Terraform, New Relic, and Kubernetes. Having hands-on experience or projects showcasing your skills with these tools can set you apart from other candidates.
✨Tip Number 2
Understand the principles of DevOps and Service Level Management (SLM) metrics thoroughly. Be prepared to discuss how you've applied these concepts in previous roles, especially in relation to SLAs, SLOs, and SLIs.
✨Tip Number 3
Showcase your experience with event-driven architectures and tools like Kafka. If you have examples of optimising performance in real-time data systems, be ready to share those during discussions.
✨Tip Number 4
Highlight your collaborative skills, as this role involves working closely with cross-functional teams. Prepare examples of how you've successfully collaborated on projects to enhance platform engineering practices.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights relevant experience in DevOps, Infrastructure-as-Code tools like Terraform, and monitoring tools such as New Relic and Grafana. Use specific examples to demonstrate your skills in event-driven infrastructure projects.
Craft a Compelling Cover Letter: In your cover letter, express your enthusiasm for the role at Wakapi and explain how your background aligns with their needs. Mention your experience with SLM metrics and your ability to enhance Developer Experience through collaboration.
Showcase Relevant Projects: If you have worked on projects involving load balancing, auto-scaling, or Chaos Engineering, be sure to include these in your application. Highlight your contributions and the impact they had on system reliability and performance.
Proofread Your Application: Before submitting, carefully proofread your CV and cover letter for any errors or typos. A polished application reflects your attention to detail and professionalism, which are crucial for a Senior Site Reliability Engineer.
How to prepare for a job interview at Wakapi
✨Showcase Your Technical Skills
Be prepared to discuss your experience with Infrastructure-as-Code tools like Terraform and your understanding of DevOps metrics. Highlight specific projects where you've implemented scalable systems or improved performance using tools like New Relic or Grafana.
✨Demonstrate Collaboration Experience
Since the role involves working with cross-functional teams, share examples of how you've successfully collaborated with engineering teams in the past. Discuss how you gathered metrics data and enhanced platform engineering practices together.
✨Understand Service Level Management Metrics
Familiarise yourself with SLAs, SLOs, and SLIs, and be ready to explain how you've applied these metrics in previous roles. This will show your potential employer that you can effectively monitor and optimise system reliability.
✨Prepare for Scenario-Based Questions
Expect questions that assess your problem-solving skills, especially related to system resiliency and Chaos Engineering. Think of scenarios where you've had to implement solutions under pressure and be ready to discuss your thought process.