Senior Site Reliability Engineer

Full-Time, £80,000 - £100,000 per year (est.), no working from home possible

At a Glance

Tasks: Design and develop automation and infrastructure services for seamless travel experiences.
Company: Join Navan, a fast-growing company revolutionising travel and expense services.
Benefits: Competitive salary, dynamic work environment, and opportunities for professional growth.
Other info: Collaborative culture with a focus on innovation and mentorship.
Why this job: Make a real impact on travel technology while working with cutting-edge tools and AI.
Qualifications: 5+ years in SRE or DevOps, strong problem-solving skills, and experience with cloud environments.

The predicted salary is between £80,000 and £100,000 per year.

At Navan, we’re passionate about providing a seamless one-stop experience for business travelers, no matter how they travel, where they stay, or where they’re going. We are constantly striving to make the most reliable and scalable systems possible to ensure that our services are available to our travelers when they need them most. With our exponential growth, we have many exciting challenges ahead and we’re looking for a passionate Senior Site Reliability Engineer to join our team in London.

As a Senior SRE you will design and develop tooling, automation and infrastructure services that power the Navan services, used by thousands of travelers on a daily basis. You will work closely with development teams, release and productivity teams and security teams to identify customer needs and build innovative solutions to solve them. You will work across a vast array of systems and technologies, aiming to build an autonomous, monitored, fault-tolerant infrastructure that is optimized for both simplicity and uptime. You will collaborate with the backend and frontend engineering teams to ensure that product solutions are scalable, efficient, and reliable. You will design infrastructure to support our massive growth and work with the team to maintain the highest level of service.

What You'll Do:

Building a fast moving, high growth service. You are comfortable in a startup environment, enjoy seeing the product take shape, and have strong ownership of the success of your services.
Designing, implementing and operating cloud infrastructure. You think in terms of infrastructure as code, deployment pipelines, and building the guardrails that make going fast also safe.
Identifying reliability anti-patterns and solving them systemically. You dive deep into the data to evaluate the health of your systems, and you use it to improve visibility and reliability across the fleet of services.
Finding and automating the toil out of our processes. You’d prefer to automate it entirely, or build a tool to empower your users rather than be the gatekeeper to the tool.
Leveraging AI tools and platforms in your daily work to achieve autonomous operations, reduce toil, and improve system observability.
Defining and driving the adoption of system reliability standards, including formalizing SLO/SLI frameworks, observability standards, and blameless post-mortem practices across multiple engineering teams.
Driving the adoption of AI-assisted developer tools and platforms to increase engineering productivity, enforce code quality standards, and enable real-time architectural validation.

What We’re Looking For:

5+ years of progressive experience as a Senior SRE or DevOps Lead (or equivalent role).
2+ years of experience working in a 24x7 production environment.
Passionate about solving problems and learning new tools and technologies.
Excellent communication skills working with stakeholders and domain experts across the company to design solutions to user problems.
Thrive in a fast-paced environment.
Demonstrated experience mentoring and leading junior and mid-level engineers, and acting as a technical owner for cross-functional infrastructure projects.
Operate with a strong sense of ownership demonstrated through shipping production-quality code and infrastructure equipped with testing, monitoring and documentation.
Hands-on operational experience with Java-based applications and services, including JVM profiling and performance tuning (Python, Node.js and Go are a plus).
Hands-on experience building and operating distributed systems in a public cloud environment (preferably AWS), using CI/CD to deploy, manage and operate production systems, focusing on tooling and automation using tools such as Maven and Jenkins.
Hands-on experience with microservice architecture and related reliability and resiliency patterns such as throttling, queueing, and retries.
Hands-on experience writing Infrastructure as Code in Terraform, CloudFormation, or similar tools.
A passion for automating away everything, using scripting languages such as Python, Bash, or Groovy.
Experience building, using, and automating monitoring systems such as New Relic, Datadog, SignalFx, and Kibana.
Hands-on experience deploying, operating, and monitoring production-grade AI/ML microservices (e.g., RAG pipelines, agentic systems) on cloud platforms like AWS Fargate/ECS.
Experience leveraging AI/LLM platforms (e.g., Gemini, Braintrust) and managing their secrets and infrastructure using Infrastructure as Code (Terraform) and AWS SSM.
Demonstrated ability to integrate AI-specific telemetry and advanced observability practices to enable predictive insights and systemic root-cause analysis.

Senior Site Reliability Engineer employer: Navan

At Navan, we pride ourselves on being an exceptional employer that fosters a collaborative and innovative work culture. Our Competitive Intelligence & Strategy team plays a crucial role in driving market growth, and we offer ample opportunities for professional development and continuous learning. Located in a vibrant area, our employees enjoy a supportive environment that encourages creativity and teamwork, making it a rewarding place to advance your career.

Contact Details:

Navan Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land Senior Site Reliability Engineer

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can refer you directly.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects and contributions. This is a great way to demonstrate your expertise in building reliable systems and automating processes.

✨Tip Number 3

Prepare for interviews by brushing up on your technical knowledge and problem-solving skills. Practice common SRE scenarios and be ready to discuss how you've tackled reliability challenges in the past.

✨Tip Number 4

Apply through our website! We love seeing passionate candidates who align with our mission. Tailor your application to highlight your experience in cloud infrastructure and automation, and let us know why you're excited about joining Navan.

We think you need these skills to ace Senior Site Reliability Engineer

Site Reliability Engineering

Cloud Infrastructure Design

Infrastructure as Code

Automation

Monitoring Systems

Java

Python

Node.js

Distributed Systems

CI/CD

Microservice Architecture

Terraform

AWS

AI/ML Microservices

Some tips for your application 🫡

Tailor Your CV: Make sure your CV reflects the skills and experiences that match the Senior Site Reliability Engineer role. Highlight your experience with cloud infrastructure, automation, and any relevant tools you've used. We want to see how you can contribute to our mission!

Craft a Compelling Cover Letter: Your cover letter is your chance to show us your passion for the role and the company. Share why you're excited about working at Navan and how your background aligns with our goals. Let your personality shine through!

Showcase Your Problem-Solving Skills: In your application, give examples of how you've tackled challenges in previous roles. We love candidates who can dive deep into data and come up with innovative solutions. Don't hold back on sharing your successes!

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows us you're keen on joining our team!

How to prepare for a job interview at Navan

✨Know Your Tech Stack

Make sure you’re well-versed in the technologies mentioned in the job description, especially around cloud infrastructure, microservices, and automation tools. Brush up on your experience with Java, Python, and Terraform, as these will likely come up during technical discussions.

✨Showcase Problem-Solving Skills

Prepare to discuss specific examples where you've identified reliability issues and implemented solutions. Use the STAR method (Situation, Task, Action, Result) to structure your answers, highlighting your analytical skills and how you’ve improved system observability.

✨Demonstrate Collaboration

Since the role involves working closely with various teams, be ready to share experiences where you’ve successfully collaborated with developers, security teams, or other stakeholders. Emphasise your communication skills and how you’ve driven projects forward through teamwork.

✨Embrace the Startup Mindset

Navan is looking for someone who thrives in a fast-paced environment. Be prepared to discuss how you adapt to change, manage multiple priorities, and take ownership of your work. Share examples that reflect your passion for innovation and your ability to work autonomously.
