At a Glance
- Tasks: Design and maintain scalable, reliable systems in a global AWS environment.
- Company: Join Airalo, the world's first eSIM store revolutionising telecom for travellers.
- Benefits: Enjoy health insurance, remote work perks, and an all-expenses-paid company retreat.
- Why this job: Make a real impact in a diverse team while enhancing global connectivity.
- Qualifications: 5+ years in Site Reliability Engineering with strong AWS and Kubernetes skills.
- Other info: Flexible hours, a blameless culture, and opportunities for continuous learning.
The predicted salary is between £48,000 and £72,000 per year.
About Airalo
Alo! Airalo is the world’s first eSIM store, helping people connect in 200+ countries and regions across the globe. We are building the next digital service that revolutionises the telecom industry. We are a travel-tech company and an equal-opportunity environment that values and practises diversity, inclusion, and equity. Our team is spread across 50+ countries and six continents. What glues us together is our commitment to changing the way you connect.
About you
We hope that you care deeply about the quality of your work, the intrinsic worth of tasks, and the success of your team. You are self-disciplined, with a skillset and work ethic that do not require micromanagement. You do your best to flourish as an individual every day while working hard to foster a collaborative team environment. You believe in the importance of being and staying authentic, honest, positive, and kind. You are a good interlocutor who communicates clearly and concisely. You are able to manage multiple projects, have an analytical mind, pay keen attention to detail, and love to get your hands dirty. You are cognizant, tolerant, and welcoming of vulnerabilities and cultural differences.
About the Role
- Position: Full-time / Employee
- Location: Remote-first
- Benefits: Health Insurance, work-from-anywhere stipend, annual wellness & learning credits, annual all-expenses-paid company retreat in a gorgeous destination & other benefits
On-Call
Participating in our on-call rotation is a core expectation of this role. It’s essential for maintaining 24/7 service reliability across our global operations, ensuring our systems remain resilient and our customers experience uninterrupted service, regardless of time zone or geography.
- Paid Rotation: We offer standby fees + overtime pay.
- Delayed Start: No on-call duties for your first 6 months.
- Rest & Recovery: Guaranteed rest periods and flexible hours following night incidents.
- Shared Load: Rotations are split (Weekdays vs. Weekends) to minimise fatigue.
We are looking for a Senior Site Reliability Engineer to join our growing engineering team. We are a company that values SRE principles and practices. We believe in empowering our SREs to make data-driven decisions, automate operational tasks, and continuously improve the reliability of our systems. We foster a blameless culture where everyone is encouraged to learn from mistakes and share knowledge. If you are passionate about building and maintaining highly reliable systems, we would love to hear from you!
What you’ll do:
- Lead the design of scalable, fault-tolerant and self-healing systems in a multi-region AWS environment.
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to drive architectural decisions and error budget policies.
- Conduct blameless post-incident reviews to uncover systemic root causes and implement long-term preventive measures.
- Identify patterns of manual work and lead the development of internal tools/automation to permanently eliminate them.
- Develop and maintain automated runbooks and playbooks for common operational tasks and complex incident response.
- Shift from simple monitoring to deep observability, ensuring high-cardinality data leads to proactive, actionable insights.
- Proactively identify and mitigate operational risks through chaos engineering and architecture reviews.
- Work with software engineers to design systems for reliability, scalability, and maintainability from the early stages of the SDLC.
- Continuously evaluate and optimise system performance, capacity, and cost efficiency.
- Go beyond participating in on-call: refine the experience to reduce alert fatigue, improve MTTR, and ensure sustainable rotation health.
Must Haves:
- Bachelor’s degree in Computer Engineering or a similar discipline.
- 5+ years of experience as a Site Reliability Engineer or in a similar role.
- 3+ years of experience with AWS services, including strong knowledge of container orchestration.
- 2+ years of Kubernetes experience.
- Deep understanding of observability principles and tools like Prometheus, Datadog, OpenTelemetry.
- Experience leading incident management and complex postmortem analysis.
- Experience and interest in managing infrastructure as code (Terraform).
- Experience with chaos engineering and other techniques for testing system resilience.
- Experience with CI/CD tools such as GitHub Actions for automated delivery.
- Proficiency in at least one programming language (Python, Go, Java, etc.) for building automation and internal tooling.
- Event-driven architecture experience (SNS, SQS, etc.).
- Ability to work independently and collaboratively in a fast-paced environment.
- Team player and open to new ideas.
- Good communication skills and fluency in English.
Good to have:
- Prior experience with Scrum and other agile methods.
- Certification in relevant areas such as AWS Certified DevOps Engineer, Certified Kubernetes Administrator (CKA), or similar.
- Prior experience with Telco Core Networks (e.g., 5G/LTE Packet Core, IMS, Signalling) and low-latency networking.
- Experience with AI-driven SRE tools for anomaly detection and improvements.
- Contributions to open-source SRE projects or communities.
- Prior work experience in telecommunications.
- Deep understanding of eSIM and GSMA-related technologies and services.
If you are interested in this position, please apply via the link. By applying, you acknowledge and agree that, in case of successful application, Airalo may request to run background checks as a condition for entering into an agreement with you. Rest assured that these checks will only occur upon your prior consent and at the end of the selection process, and will be strictly limited to what is allowed under the laws that are applicable to you. All data that you share or that we collect in connection with such checks will be processed in accordance with our Privacy Policy.
We sincerely thank all applicants in advance for submitting their interest in this opportunity. Airalo is an equal-opportunity employer and values diversity, equity & inclusion. We do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We are committed to providing reasonable accommodations upon request for individuals with disabilities throughout our job interview process.
Senior Site Reliability Engineer employer: Antler
Contact Detail:
Antler Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Senior Site Reliability Engineer role
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, especially those already at Airalo. A friendly chat can give you insider info and maybe even a referral. Don’t be shy; we love connecting with passionate people!
✨Tip Number 2
Show off your skills in real-time! Consider setting up a demo or a mini-project that showcases your expertise in SRE principles. This hands-on approach can really impress us and demonstrate your problem-solving abilities.
✨Tip Number 3
Prepare for the interview by diving deep into our company culture and values. We’re all about collaboration and authenticity, so think of examples from your past that highlight these traits. It’s all about being yourself!
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team at Airalo!
Some tips for your application 🫡
Show Your Passion: When writing your application, let your enthusiasm for the role shine through! We want to see that you genuinely care about building reliable systems and are excited about the opportunity to join our team.
Tailor Your CV: Make sure to customise your CV to highlight relevant experience that aligns with the job description. We love seeing how your skills in AWS, Kubernetes, and incident management can contribute to our mission at Airalo.
Be Clear and Concise: Keep your application straightforward and to the point. We appreciate clear communication, so make sure to articulate your thoughts well without unnecessary fluff. This will help us understand your qualifications better!
Apply Through Our Website: Don’t forget to submit your application via our website! It’s the best way for us to receive your details and ensures you’re considered for the role. We can’t wait to hear from you!
How to prepare for a job interview at Antler
✨Know Your Stuff
Make sure you brush up on your technical skills, especially around AWS, Kubernetes, and observability tools like Prometheus. Be ready to discuss your past experiences in detail, particularly any projects where you’ve designed scalable systems or led incident management.
✨Show Your Team Spirit
Airalo values collaboration, so be prepared to share examples of how you've worked effectively in teams. Highlight instances where you’ve fostered a positive team environment or helped others learn from mistakes, as this aligns with their blameless culture.
✨Ask Smart Questions
Prepare thoughtful questions about Airalo’s approach to SRE principles and practices. This shows your genuine interest in the role and helps you understand how you can contribute to their mission of revolutionising the telecom industry.
✨Be Authentic
Airalo appreciates authenticity, so don’t hesitate to let your personality shine through. Share your passion for building reliable systems and how you stay positive and kind in challenging situations. This will resonate well with their company culture.