Data & Reporting SRE

London | Full-Time | £43,200 - £72,000 per year (est.) | No home office possible

At a Glance

  • Tasks: Own the reliability and performance of data pipelines using AWS, Flink, Kafka, and Python.
  • Company: Join ZILO, a tech innovator transforming the global Transfer Agency sector with flexible solutions.
  • Benefits: Enjoy 38 days leave, private healthcare, flexible working, and a company pension.
  • Why this job: Shape the future of technology while collaborating with a passionate team in a dynamic environment.
  • Qualifications: Experience in data processing and incident management, plus proficiency in AWS, Flink, Kafka, and Python, is required.
  • Other info: Opportunity for global mobility and access to training and development.

The predicted salary is between £43,200 and £72,000 per year.

About:

Step forward into the future of technology with ZILO.

We’re here to redefine what’s possible in technology. While we’re trusted by the global Transfer Agency sector, our technology is truly flexible and designed to transform any business at scale. We’ve created a unified platform that adapts to diverse needs, offering the scalability and reliability legacy systems simply can’t match.

At ZILO, our DNA is built on Character, Creativity, and Craftsmanship. We face every challenge with integrity, explore new ideas with a curious mind, and set a high standard in every detail.

We are a team of dedicated professionals where everyone, regardless of their role, drives our progress and creates real impact. If you’re ready to shape the future, let’s talk.

We are seeking an experienced Site Reliability Engineer (SRE) with deep subject-matter expertise in data processing and reporting. In this role, you will own the reliability, performance, and operational excellence of our real-time and batch data pipelines built on AWS, Apache Flink, Kafka, and Python. You’ll act as the first line of defense for data-related incidents, rapidly diagnose root causes, and implement resilient solutions that keep critical reporting systems up and running.

Incident Management & Triage

  • Serve as on-call escalation for data pipeline incidents, including real-time stream failures and batch job errors.
  • Rapidly analyze logs, metrics, and trace data to pinpoint failure points across AWS, Flink, Kafka, and Python layers.
  • Lead post-incident reviews: identify root causes, document findings, and drive corrective actions to closure.

Reliability & Monitoring

  • Design, implement, and maintain robust observability for data pipelines: dashboards, alerts, distributed tracing.
  • Define SLOs/SLIs for data freshness, throughput, and error rates; continuously monitor and optimize.
  • Automate capacity planning, scaling policies, and disaster-recovery drills for stream and batch environments.

Architecture & Automation

  • Collaborate with data engineering and product teams to architect scalable, fault-tolerant pipelines using AWS services (e.g., Step Functions, EMR, Lambda, Redshift) integrated with Apache Flink and Kafka.
  • Troubleshoot and maintain Python-based applications.
  • Harden CI/CD for data jobs: implement automated testing of data schemas, versioned Flink jobs, and migration scripts.

Performance Optimization

  • Profile and tune streaming jobs: optimize checkpoint intervals, state backends, and parallelism settings in Flink.
  • Analyze Kafka cluster health: tune broker configurations, partition strategies, and retention policies to meet SLAs.
  • Leverage Python profiling and vectorized libraries to streamline batch analytics and report generation.

Collaboration & Knowledge Sharing

  • Act as SME for data & reporting stack: mentor peers, lead brown-bag sessions on best practices.
  • Contribute to runbooks, design docs, and on-call playbooks detailing common failure modes and recovery steps.
  • Work cross-functionally with DevOps, Security, and Product teams to align reliability goals and incident response workflows.

Benefits

  • Enhanced leave – 38 days inclusive of 8 UK Public Holidays
  • Private Health Care including family cover
  • Life Assurance – 5x salary
  • Flexible working: work from home and/or in our London office
  • Employee Assistance Program
  • Company Pension (Salary Sacrifice options available)
  • Access to training and development
  • Buy and Sell holiday scheme
  • The opportunity for “work from anywhere/global mobility”

Data & Reporting SRE employer: ZILO Technology, Ltd.

At ZILO, we pride ourselves on fostering a dynamic and inclusive work culture that champions innovation and collaboration. As a Data & Reporting SRE, you will enjoy enhanced leave benefits, private healthcare, and flexible working arrangements, all while being part of a team that values your contributions and supports your professional growth through continuous training and development opportunities. Join us in our London office or work from anywhere, and be a part of a company that is not just redefining technology but also prioritising the well-being and advancement of its employees.

Contact Details:

ZILO Technology, Ltd. Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Data & Reporting SRE role

Tip Number 1

Familiarise yourself with the specific technologies mentioned in the job description, such as AWS, Apache Flink, Kafka, and Python. Having hands-on experience or projects that showcase your skills in these areas will make you stand out during discussions.

Tip Number 2

Prepare to discuss real-world scenarios where you've successfully managed data pipeline incidents. Be ready to explain your thought process in diagnosing issues and implementing solutions, as this will demonstrate your problem-solving abilities.

Tip Number 3

Showcase your ability to collaborate across teams by preparing examples of how you've worked with data engineering, DevOps, or product teams in the past. Highlighting your teamwork skills will align well with ZILO's emphasis on collaboration.

Tip Number 4

Research ZILO's company culture and values, particularly their focus on character, creativity, and craftsmanship. Be prepared to discuss how your personal values align with theirs, as cultural fit is often just as important as technical skills.

We think you need these skills to ace the Data & Reporting SRE role

AWS Services (e.g., Step Functions, EMR, Lambda, Redshift)
Apache Flink
Kafka
Python Programming
Incident Management
Log Analysis
Performance Tuning
Data Pipeline Architecture
Observability and Monitoring
CI/CD Automation
Capacity Planning
Disaster Recovery
Root Cause Analysis
Collaboration Skills
Mentoring and Knowledge Sharing

Some tips for your application 🫡

Understand the Role: Before applying, make sure you fully understand the responsibilities and requirements of the Data & Reporting SRE position. Familiarise yourself with the technologies mentioned, such as AWS, Apache Flink, Kafka, and Python.

Tailor Your CV: Customise your CV to highlight relevant experience in data processing, incident management, and performance optimisation. Use specific examples that demonstrate your expertise in the technologies and practices mentioned in the job description.

Craft a Compelling Cover Letter: Write a cover letter that showcases your passion for technology and your alignment with ZILO's values of Character, Creativity, and Craftsmanship. Mention how your skills can contribute to their mission of redefining technology.

Proofread Your Application: Before submitting, carefully proofread your application materials for any spelling or grammatical errors. A polished application reflects your attention to detail, which is crucial for the role.

How to prepare for a job interview at ZILO Technology, Ltd.

Understand the Tech Stack

Make sure you have a solid grasp of the technologies mentioned in the job description, such as AWS, Apache Flink, Kafka, and Python. Be prepared to discuss your experience with these tools and how you've used them in past projects.

Showcase Incident Management Skills

Be ready to share specific examples of how you've handled data pipeline incidents in the past. Discuss your approach to diagnosing issues, leading post-incident reviews, and implementing solutions to prevent future occurrences.

Demonstrate Collaboration Experience

Highlight your ability to work cross-functionally with different teams, such as DevOps and Product. Share instances where you've collaborated on projects or shared knowledge, as this role emphasises teamwork and communication.

Prepare for Technical Questions

Expect technical questions related to performance optimisation and reliability monitoring. Brush up on concepts like SLOs/SLIs, capacity planning, and CI/CD processes, and be ready to explain how you've applied these in your previous roles.
