At a Glance
- Tasks: Join our SRE team to ensure reliable and efficient cloud infrastructure for innovative products.
- Company: Pendo is a cutting-edge tech company focused on enhancing product experiences through reliable infrastructure.
- Benefits: Enjoy flexible work options, competitive salary, and opportunities for professional growth.
- Why this job: Be part of a dynamic team that values collaboration, innovation, and making a real impact.
- Qualifications: Bachelor's degree in Computer Science and 5+ years of relevant experience required.
- Other info: Participate in a 24x7 on-call rotation and work with advanced technologies like GKE.
The predicted salary is between £48,000 and £72,000 per year.
The Site Reliability Engineering (SRE) team at Pendo is responsible for provisioning and maintaining cloud infrastructure from development through production for all product initiatives, and working with developers and product managers to ensure that our products are not only reliable and performant, but also cost-efficient. Our platform is built on Google Kubernetes Engine (GKE) and utilizes several other Google technologies such as Memorystore, Cloud Datastore, PubSub, Cloud Functions, BigQuery, and Vertex AI, as well as services from other vendors such as Amazon SES.
In the development process, SREs provide developers with stable and performant CI and release pipelines and development environments to facilitate frequent delivery of new product features. In production, SREs perform Tier 1 on-call and incident management functions, supporting a high-throughput platform which processes more than 15 billion events per day. To ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design systems which balance cost with reliability objectives. Additionally, SREs collaborate with the Information Security team to ensure that cloud infrastructure is properly secured, and that sufficient controls are in place to meet our compliance goals with respect to industry standards such as SOC 2.
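As a back-of-the-envelope illustration of that scale, the sketch below converts the stated lower bound of 15 billion events per day into an average per-second rate. It assumes traffic is spread evenly across the day, which real traffic rarely is, so peak rates would be higher; the function name is hypothetical and the result is not a figure published by Pendo.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_events_per_second(events_per_day: float) -> float:
    """Average rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


# 15 billion is the lower bound quoted in the posting ("more than 15 billion").
rate = average_events_per_second(15_000_000_000)
print(f"{rate:,.0f} events/s on average")  # prints "173,611 events/s on average"
```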
Role Responsibilities
- Write high-quality infrastructure-as-code that automates the provisioning, deployment, scaling, and monitoring of Pendo's infrastructure to ensure that it is reliable and performant.
- Write maintainable code for product functionality with a primary emphasis on operations, scale, resiliency, and monitoring.
- Work with other engineers to ensure that new services are well-designed, properly monitored and have well-defined SLIs and achievable SLOs.
- Debug production issues, learn to mitigate them quickly, and find ways to prevent them.
- Maintain runbooks for manual tasks and replace those runbooks with automation whenever possible.
- Proactively track our capacity, quotas, and other performance limits to plan for growth.
- Participate in a 24x7 on-call rotation to handle product availability issues as well as urgent customer support escalations.
Minimum Qualifications
- Bachelor's degree in Computer Science or related technical field.
- Minimum of five (5) years of professional technical experience.
- Experience working with cloud infrastructure using tools such as Ansible or Terraform.
- Strong programming skills in a language such as Go or Python, and a willingness to learn new languages as needed.
- Ability to think and talk about systems in terms of possible failure modes, bottlenecks, etc.
- Good number sense for discussing performance analysis, cost analysis, and operational metrics.
Preferred Qualifications
- Minimum of five (5) years of experience as a Site Reliability Engineer or DevOps Engineer.
- Experience designing, analyzing, and troubleshooting distributed systems.
- Experience maintaining Kubernetes clusters in a production environment.
Contact Details:
Pendo.io Recruiting Team
StudySmarter Expert Advice
We think this is how you could land the Senior Site Reliability Engineer role
Tip Number 1
Familiarise yourself with Google Cloud technologies, especially Google Kubernetes Engine (GKE), as this is a key component of the role. Consider taking online courses or certifications to deepen your understanding and demonstrate your commitment to mastering these tools.
Tip Number 2
Engage with the SRE community through forums, meetups, or social media platforms. Networking with professionals in the field can provide insights into best practices and may even lead to referrals for job openings.
Tip Number 3
Showcase your experience with infrastructure-as-code tools like Ansible or Terraform by contributing to open-source projects or creating your own projects. This hands-on experience will not only enhance your skills but also serve as tangible evidence of your capabilities.
Tip Number 4
Prepare for technical interviews by practising system design questions and incident management scenarios. Being able to articulate your thought process around failure modes and performance analysis will set you apart from other candidates.
Some tips for your application
Tailor Your CV: Make sure your CV highlights relevant experience in Site Reliability Engineering, cloud infrastructure, and programming languages like Go or Python. Use specific examples that demonstrate your skills in automation, monitoring, and incident management.
Craft a Compelling Cover Letter: In your cover letter, express your passion for reliability engineering and how your background aligns with the responsibilities outlined in the job description. Mention your experience with tools like Ansible or Terraform and your approach to ensuring system reliability.
Showcase Relevant Projects: If you have worked on projects involving Kubernetes, cloud infrastructure, or automation, be sure to include these in your application. Describe your role, the challenges faced, and the outcomes achieved to demonstrate your hands-on experience.
Highlight Problem-Solving Skills: Emphasise your ability to debug production issues and your experience with performance analysis. Provide examples of how you've mitigated incidents or improved system reliability in previous roles to showcase your problem-solving capabilities.
How to prepare for a job interview at Pendo.io
Showcase Your Technical Skills
Be prepared to discuss your experience with cloud infrastructure tools like Ansible or Terraform. Highlight specific projects where you've implemented these technologies, and be ready to explain the challenges you faced and how you overcame them.
Demonstrate Problem-Solving Abilities
Expect questions that assess your ability to think through failure scenarios and bottlenecks. Prepare examples from your past work where you successfully debugged production issues and implemented solutions to prevent future occurrences.
Understand the Importance of SLIs and SLOs
Familiarise yourself with service level indicators (SLIs) and service level objectives (SLOs). Be ready to discuss how you have designed systems with these metrics in mind, ensuring reliability and performance while balancing costs.
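If it helps to practise the arithmetic, here is a minimal sketch of how an availability SLI and the remaining error budget for an SLO might be computed. The function names, event counts, and the 99.9% target are assumptions chosen for illustration; they are not taken from Pendo's systems or from the job description.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """SLI: the fraction of events that were served successfully."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int,
                           slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_bad = (1 - slo_target) * total_events  # failures the SLO tolerates
    actual_bad = total_events - good_events        # failures actually observed
    return 1 - actual_bad / allowed_bad


# Hypothetical measurement window: 1,000,000 events, 500 of them failed.
total = 1_000_000
good = 999_500
sli = availability_sli(good, total)
remaining = error_budget_remaining(good, total, slo_target=0.999)
print(f"SLI: {sli:.4%}, error budget remaining: {remaining:.0%}")
```

With these made-up numbers the service meets its target and has spent about half of its error budget. Talking through a calculation like this is one way to show how you balance reliability against cost and release velocity.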
Prepare for On-Call Scenarios
Since the role involves a 24x7 on-call rotation, be ready to talk about your experience with incident management. Share how you handle high-pressure situations and ensure product availability, as well as any strategies you use to manage stress during critical incidents.