Site Reliability Engineer

Full-Time · £72,000 - £108,000 / year (est.) · Home office (partial)
Winston Fox

At a Glance

  • Tasks: Design and develop automation solutions using Python for scalable distributed systems.
  • Company: Award-winning London Hedge Fund with a fast-paced, low-bureaucracy culture.
  • Benefits: Up to £150,000 salary, generous bonuses, healthcare, gym, and 30 days holiday.
  • Why this job: Shape the future of intelligent automation and make a real impact in tech.
  • Qualifications: Strong Python skills, experience with distributed systems, and automation frameworks.
  • Other info: Collaborative environment with world-class technology and excellent career growth opportunities.

The predicted salary is between £72,000 and £108,000 per year.

We are looking for a highly skilled Engineer with expertise in Python programming, automation, and modern observability practices to help build and operate scalable distributed systems for an award-winning London Hedge Fund. This role sits at the intersection of platform engineering, AI tooling, and system reliability.

You will design automation frameworks, develop AI-assisted engineering tools, and implement observability solutions that provide deep insights into complex distributed architectures.

Responsibilities
  • Design, develop, and maintain robust automation solutions using Python.
  • Build and maintain observability pipelines including metrics, logs, and traces across distributed systems.
  • Develop internal AI-powered tools that enhance engineering productivity and operational intelligence.
  • Implement monitoring, alerting, and diagnostics to improve system reliability, performance, and scalability.
  • Integrate observability platforms with automation workflows and incident response systems.
  • Collaborate with platform, infrastructure, data and development teams to improve system visibility and operational maturity.
  • Design tooling that enables proactive detection, analysis, and remediation of system issues across distributed environments.
  • Contribute to architecture decisions around telemetry, AI-assisted debugging, and automation frameworks.
  • Directly support business users and stakeholders with system analysis, problem management, and technical resolution.
Skills & Experience
  • Strong professional experience with Python development in production environments.
  • Proven experience building automation frameworks, scripts, and developer tooling.
  • Strong experience working with distributed systems and large-scale service architectures.
  • Hands-on experience working with Kubernetes in production environments.
  • Deep understanding of observability practices, including metrics, logs, tracing, and telemetry pipelines.
  • Experience integrating AI or machine learning tooling into engineering workflows.
  • Strong understanding of APIs, microservices, and containerised environments.
  • Experience with CI/CD pipelines and infrastructure automation.
  • Ability to design scalable, maintainable engineering tools.
  • Experience supporting business users directly, coordinating projects or problems with dev and infra teams, and owning projects.
Interesting Technologies
  • Observability: OpenTelemetry, Prometheus, Grafana, Elastic Stack (ELK), Jaeger
  • Automation & CI/CD: GitHub Actions, Jenkins, GitLab CI, Argo Workflows
  • Distributed Systems & Messaging: Kafka, Redis, gRPC
Offer
  • Award-winning, world-class technology environment with best-in-class engineering teams.
  • Fast-paced, low-bureaucracy culture with a get-stuff-done mindset.
  • Up to £150,000 base salary.
  • 50%-100% annual cash bonus.
  • Pension, healthcare, gym, food, 30 days holiday, etc.
  • 4 days onsite, 1 day working from home.
  • The chance to shape the future of intelligent automation and operational insight in distributed platforms.

Site Reliability Engineer employer: Winston Fox

Join an award-winning London Hedge Fund as a Site Reliability Engineer, where you will thrive in a fast-paced, low-bureaucracy culture that prioritises innovation and efficiency. With a competitive salary of up to £150,000 and generous benefits including a substantial annual cash bonus, healthcare, and 30 days of holiday, this role offers exceptional opportunities for professional growth and the chance to work with cutting-edge technologies in a collaborative environment. Embrace the opportunity to shape the future of intelligent automation and operational insight while working 4 days onsite and 1 day from home.
Contact Details:

Winston Fox Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer role

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can refer you directly.

✨Tip Number 2

Show off your skills! Create a portfolio showcasing your Python projects, automation frameworks, and any AI tools you've developed. This gives potential employers a taste of what you can do and sets you apart from the crowd.

✨Tip Number 3

Prepare for interviews by brushing up on your knowledge of observability practices and distributed systems. Be ready to discuss how you've tackled challenges in past roles and how you can contribute to their team.

✨Tip Number 4

Don't forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are proactive about their job search!

We think you need these skills to ace the Site Reliability Engineer role

Python Programming
Automation Frameworks
Observability Practices
Distributed Systems
Kubernetes
AI Tooling
Telemetry Pipelines
APIs
Microservices
Containerised Environments
CI/CD Pipelines
Infrastructure Automation
Problem Management
Technical Resolution

Some tips for your application 🫡

Show Off Your Python Skills: Make sure to highlight your experience with Python programming in your application. We want to see how you've used it in production environments, especially for building automation frameworks and tools.

Talk About Your Automation Experience: We love a good automation story! Share specific examples of how you've designed and implemented automation solutions. This will show us that you can help streamline our processes and improve system reliability.

Demonstrate Your Observability Knowledge: Since observability is key for this role, mention any experience you have with metrics, logs, and tracing. Let us know how you've integrated observability practices into your previous projects.

Apply Through Our Website: Don't forget to submit your application through our website! It’s the best way for us to keep track of your application and ensure it gets the attention it deserves.

How to prepare for a job interview at Winston Fox

✨Know Your Python Inside Out

Make sure you brush up on your Python skills before the interview. Be ready to discuss your past projects and how you've used Python to build automation frameworks or tools. They’ll likely want to see your problem-solving approach, so think of examples where you’ve tackled challenges using Python.

✨Familiarise Yourself with Observability Tools

Since the role involves observability practices, get comfortable with tools like Prometheus, Grafana, and OpenTelemetry. Be prepared to explain how you’ve implemented these in previous roles and how they can enhance system reliability. Showing that you understand the importance of metrics, logs, and traces will definitely impress them.

✨Showcase Your Experience with Distributed Systems

This position is all about working with distributed systems, so be ready to discuss your experience in this area. Talk about specific architectures you’ve worked with, any challenges you faced, and how you overcame them. Highlighting your hands-on experience with Kubernetes will also be a big plus!

✨Prepare for Collaboration Questions

Collaboration is key in this role, so expect questions about how you work with different teams. Think of examples where you’ve successfully collaborated with platform, infrastructure, or data teams. Emphasise your communication skills and how you’ve contributed to improving operational maturity in past projects.
