Senior Site Reliability Engineer

Sheffield, Full-Time, £48,000 - £72,000 / year (est.), Home office (partial)

At a Glance

  • Tasks: Join our SRE team to ensure reliable and efficient cloud infrastructure for innovative products.
  • Company: Pendo is a cutting-edge tech company focused on enhancing product experiences through reliable infrastructure.
  • Benefits: Enjoy flexible work options, competitive salary, and opportunities for professional growth.
  • Why this job: Be part of a dynamic team that values collaboration, innovation, and making a real impact.
  • Qualifications: Bachelor's degree in Computer Science and 5+ years of relevant experience required.
  • Other info: Participate in a 24x7 on-call rotation and work with advanced technologies like GKE.

The predicted salary is between £48,000 and £72,000 per year.

The Site Reliability Engineering (SRE) team at Pendo is responsible for provisioning and maintaining cloud infrastructure from development through production for all product initiatives, and working with developers and product managers to ensure that our products are not only reliable and performant, but also cost-efficient. Our platform is built on Google Kubernetes Engine (GKE) and utilizes several other Google technologies such as Memorystore, Cloud Datastore, PubSub, Cloud Functions, BigQuery, and Vertex AI, as well as services from other vendors such as Amazon SES.

In the development process, SREs provide developers with stable and performant CI and release pipelines and development environments to facilitate frequent delivery of new product features. In production, SREs perform Tier 1 on-call and incident management functions, supporting a high-throughput platform which processes more than 15 billion events per day. To ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design systems which balance cost with reliability objectives. Additionally, SREs collaborate with the Information Security team to ensure that cloud infrastructure is properly secured, and that sufficient controls are in place to meet our compliance goals with respect to industry standards such as SOC 2.

Role Responsibilities

  • Write high-quality infrastructure-as-code that automates the provisioning, deployment, scaling, and monitoring of Pendo’s infrastructure to ensure that it is reliable and performant.
  • Write maintainable code for product functionality with a primary emphasis on operations, scale, resiliency, and monitoring.
  • Work with other engineers to ensure that new services are well-designed, properly monitored and have well-defined SLIs and achievable SLOs.
  • Debug production issues, learn to mitigate them quickly, and find ways to prevent them.
  • Maintain runbooks for manual tasks and replace those runbooks with automation whenever possible.
  • Proactively track our capacity, quotas, and other performance limits to plan for growth.
  • Participate in a 24x7 on-call rotation to handle product availability issues as well as urgent customer support escalations.

Minimum Qualifications

  • Bachelor's degree in Computer Science or related technical field.
  • Minimum of five (5) years of professional technical experience.
  • Experience working with cloud infrastructure using tools such as Ansible or Terraform.
  • Strong programming skills in a language such as Go or Python, and a willingness to learn new languages as needed.
  • Ability to think and talk about systems in terms of possible failure modes, bottlenecks, etc.
  • Good number sense for discussing performance analysis, cost analysis, and operational metrics.

Preferred Qualifications

  • Minimum of five (5) years of experience as a Site Reliability Engineer or DevOps Engineer.
  • Experience designing, analyzing, and troubleshooting distributed systems.
  • Experience maintaining Kubernetes clusters in a production environment.

Contact Details:

Pendo.io Recruiting Team

StudySmarter Expert Advice

We think this is how you could land the Senior Site Reliability Engineer role

✨Tip Number 1

Familiarise yourself with Google Cloud technologies, especially Google Kubernetes Engine (GKE), as this is a key component of the role. Consider taking online courses or certifications to deepen your understanding and demonstrate your commitment to mastering these tools.

✨Tip Number 2

Engage with the SRE community through forums, meetups, or social media platforms. Networking with professionals in the field can provide insights into best practices and may even lead to referrals for job openings.

✨Tip Number 3

Showcase your experience with infrastructure-as-code tools like Ansible or Terraform by contributing to open-source projects or creating your own projects. This hands-on experience will not only enhance your skills but also serve as tangible evidence of your capabilities.

✨Tip Number 4

Prepare for technical interviews by practising system design questions and incident management scenarios. Being able to articulate your thought process around failure modes and performance analysis will set you apart from other candidates.

We think you need these skills to ace the Senior Site Reliability Engineer role

Cloud Infrastructure Management
Kubernetes Administration
Infrastructure as Code (IaC)
Ansible
Terraform
Programming in Go
Programming in Python
CI/CD Pipeline Development
Incident Management
Performance Monitoring
Capacity Planning
Debugging Skills
Collaboration with Development Teams
Understanding of Service Level Objectives (SLOs)
Knowledge of Security Compliance Standards (e.g., SOC 2)

Some tips for your application

Tailor Your CV: Make sure your CV highlights relevant experience in Site Reliability Engineering, cloud infrastructure, and programming languages like Go or Python. Use specific examples that demonstrate your skills in automation, monitoring, and incident management.

Craft a Compelling Cover Letter: In your cover letter, express your passion for reliability engineering and how your background aligns with the responsibilities outlined in the job description. Mention your experience with tools like Ansible or Terraform and your approach to ensuring system reliability.

Showcase Relevant Projects: If you have worked on projects involving Kubernetes, cloud infrastructure, or automation, be sure to include these in your application. Describe your role, the challenges faced, and the outcomes achieved to demonstrate your hands-on experience.

Highlight Problem-Solving Skills: Emphasise your ability to debug production issues and your experience with performance analysis. Provide examples of how you've mitigated incidents or improved system reliability in previous roles to showcase your problem-solving capabilities.

How to prepare for a job interview at Pendo.io

✨Showcase Your Technical Skills

Be prepared to discuss your experience with cloud infrastructure tools like Ansible or Terraform. Highlight specific projects where you've implemented these technologies, and be ready to explain the challenges you faced and how you overcame them.

✨Demonstrate Problem-Solving Abilities

Expect questions that assess your ability to think through failure scenarios and bottlenecks. Prepare examples from your past work where you successfully debugged production issues and implemented solutions to prevent future occurrences.

✨Understand the Importance of SLIs and SLOs

Familiarise yourself with service level indicators (SLIs) and service level objectives (SLOs). Be ready to discuss how you have designed systems with these metrics in mind, ensuring reliability and performance while balancing costs.
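As a concrete way to practise talking about these metrics, the following is a minimal sketch of an availability SLI checked against an SLO target, with the resulting error budget. The function name `error_budget`, the request counts, and the 99.9% target are illustrative assumptions, not figures from Pendo or from the job description.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarise an availability SLI against an SLO target (illustrative only)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # SLI: fraction of requests that succeeded in the measurement window.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: number of failures the SLO tolerates in the same window.
    allowed_failures = (1 - slo_target) * total_requests
    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_remaining": allowed_failures - failed_requests,
        "slo_met": sli >= slo_target,
    }


# Hypothetical window: one million requests, 400 failures, 99.9% target.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(report)
```

In this hypothetical window the SLO tolerates roughly 1,000 failed requests, so 400 failures leave a little over half the budget unspent. Being able to walk through that kind of arithmetic is one way to show how reliability targets trade off against cost and release pace.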

✨Prepare for On-Call Scenarios

Since the role involves a 24x7 on-call rotation, be ready to talk about your experience with incident management. Share how you handle high-pressure situations and ensure product availability, as well as any strategies you use to manage stress during critical incidents.
