Site Reliability Engineer (SRE)

Full-Time | 60,000 - 80,000 € / year (est.) | No home office possible

At a Glance

Tasks: Ensure reliability and observability of our secure platform on Google Cloud.
Company: Join Monstro, a leader in financial intelligence and AI governance.
Benefits: Competitive salary, flexible work environment, and opportunities for professional growth.
Other info: Be part of a diverse team shaping the future of financial technology.
Why this job: Make a real impact on financial outcomes while solving meaningful problems.
Qualifications: Experience with GCP, incident management, and strong coding skills required.

The predicted salary is between 60000 - 80000 € per year.

Monstro is the operating system for governed financial intelligence. We build governance and intelligence infrastructure that enables artificial intelligence to operate safely, explainably, and at institutional scale. We exist because the level of financial guidance historically available to a small group should be accessible to many more people. By combining AI with deep institutional infrastructure, we help financial institutions deliver more personalized, responsible, and life-changing financial support to millions of individuals. We’re building mission-critical systems in a highly regulated domain, and we care deeply about doing it right. If you’re motivated by meaningful problems, high standards, and shaping infrastructure that improves financial outcomes, you’ll feel at home here.

About the Role

Monstro is building a secure, multi-tenant platform on Google Cloud, and we’re hiring a Site Reliability Engineer to own the reliability and observability of that platform end-to-end. This is a hands-on role for someone who wants to do real SRE work, not a rebrand of L1 support. You’ll write the dashboards, define the SLOs, build the automation that kills toil, and take your turn on the on-call rotation that proves it all works. When something breaks at 2 AM, you’re the person who keeps the platform running; when nothing’s breaking, you’re the person making sure the next break is smaller, shorter, or doesn’t happen at all.

What You’ll Do

Observability and reliability engineering

Define and maintain SLOs and SLIs for our tier-1 services: API gateway, application services, identity, and edge availability
Build canonical dashboards and alerts in Google Cloud Monitoring, backed by structured logs and BigQuery log analytics
Tune alert routing so every page is actionable — kill the rest
Instrument services for distributed tracing and structured logging; push back on services that ship without it
Own error budgets and use them to prioritize reliability work over feature work when a budget is burned
Reduce toil: automate the top recurring page from the previous quarter
Maintain runbooks so that every page maps to a runbook within one cycle of its first occurrence
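
For readers less familiar with error budgets, the arithmetic behind "prioritize reliability work when burned" can be sketched roughly as follows. This is a minimal illustration, not Monstro's actual tooling: the `ErrorBudget` class, the 99.9% target, and the request counts are all hypothetical, and a real setup would read these figures from the monitoring system rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Error budget for a request-based availability SLO (illustrative only)."""

    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int    # requests served during the SLO window
    failed_requests: int   # requests that violated the SLI during the window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def consumed_fraction(self) -> float:
        # Share of the budget already spent; 1.0 or more means it is exhausted.
        if self.allowed_failures == 0:
            return float("inf") if self.failed_requests else 0.0
        return self.failed_requests / self.allowed_failures

    @property
    def is_burned(self) -> bool:
        return self.consumed_fraction >= 1.0


# Hypothetical numbers: a 99.9% target over 1,000,000 requests
# allows roughly 1,000 failed requests in the window.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_200)
print(f"allowed failures: {budget.allowed_failures:.0f}")
print(f"budget consumed: {budget.consumed_fraction:.0%}")
print("prioritize reliability work" if budget.is_burned else "feature work can continue")
```

With these assumed numbers the service has spent more than its whole budget, which is the situation where reliability work would take priority over feature work.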

On-call rotation and incident response

Act as first responder for production alerts across monitoring, API gateway, edge defense, and CI
Triage severity, run the incident bridge, drive mitigation (revision rollback, traffic shift, scaling, edge block, credential rotation)
Own internal and external incident comms during your shift
Drive postmortems to closure with action items tracked as audit evidence
Write clean handoffs at the end of each shift

Our stack

Google Cloud Platform across multiple environments
Apigee X for API management
Identity Platform for customer identity
Cloud Armor, Cloud IDS, Security Command Center for edge and posture
BigQuery-backed log analytics from an org-level log sink
OpenTofu / Terraform for everything; GitHub Actions for CI/CD
Linear for work tracking

What You Bring

Required:

Solid production experience on GCP (or comparable AWS/Azure depth with willingness to ramp on GCP fast)
Comfortable on-call: you’ve run incidents, written postmortems, and shipped the action items
Strong observability fundamentals: SLOs, log-based metrics, alert hygiene, dashboard discipline
Working knowledge of Kubernetes, API gateways, identity systems, and at least one IaC tool
Scripting / coding fluency (Python, Go, Bash) for automation and tooling
Good written communication — handoffs, postmortems, and runbooks are part of the job
Bias toward fixing the system, not the symptom

Nice to Have:

Apigee or another enterprise API gateway in production
BigQuery for log analytics or audit
Experience standing up observability from scratch, not just maintaining inherited dashboards
SOC 2 or similar compliance environments

Why Join Us

You’ll be at the centre of how we bring Monstro to life for our institutional clients. Your work directly shapes the success of every implementation—getting requirements right means we deliver faster, smoother, and with fewer surprises. You’ll be joining at a foundational moment, helping to build the delivery practice from the ground up alongside a Delivery Manager who will rely on you as a critical partner from day one. If you enjoy the puzzle of understanding complex environments, the satisfaction of a well-organised document, and the energy of working directly with clients, this is your role.

We are an equal opportunity employer and value diversity. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Site Reliability Engineer (SRE) employer: Monstro

At Monstro, we pride ourselves on being an exceptional employer that values innovation and collaboration in the financial technology sector. Our work culture fosters a commitment to high standards and meaningful problem-solving, providing employees with opportunities for professional growth while working on mission-critical systems. Located in a dynamic environment, we offer competitive benefits and a chance to make a real impact in delivering responsible financial intelligence to a broader audience.

Contact Details:

Monstro Recruiting Team

StudySmarter Expert Advice🤫

We think this is how you could land the Site Reliability Engineer (SRE) role

✨Tip Number 1

Network like a pro! Reach out to current or former employees at Monstro on LinkedIn. A friendly chat can give you insider info and maybe even a referral, which can really boost your chances.

✨Tip Number 2

Show off your skills in real-time! If you get the chance, participate in coding challenges or hackathons related to SRE. It’s a great way to demonstrate your problem-solving abilities and passion for the field.

✨Tip Number 3

Prepare for the interview by diving deep into Monstro’s tech stack. Familiarise yourself with Google Cloud, Kubernetes, and observability tools. The more you know, the more confident you'll feel when discussing how you can contribute.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining the team at Monstro.

We think you need these skills to ace the Site Reliability Engineer (SRE) role

Google Cloud Platform (GCP)

Incident Management

SLOs and SLIs Definition

Observability Engineering

API Management (Apigee X)

Kubernetes

Infrastructure as Code (IaC) Tools

Scripting (Python, Go, Bash)

Log Analytics (BigQuery)

Automation

Written Communication

Postmortem Analysis

Alert Management

Distributed Tracing

Security Best Practices

Some tips for your application 🫡

Tailor Your Application: Make sure to customise your CV and cover letter to highlight your experience with GCP, SLOs, and incident management. We want to see how your skills align with the role of Site Reliability Engineer at Monstro!

Show Off Your Communication Skills: Since good written communication is key for this role, don’t shy away from showcasing your ability to write clear handoffs, postmortems, and runbooks. We love seeing candidates who can articulate their thoughts well!

Be Specific About Your Experience: When detailing your past roles, be specific about your hands-on experience with observability tools and automation. We’re looking for real-world examples that demonstrate your problem-solving skills in high-pressure situations.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re keen on joining our team!

How to prepare for a job interview at Monstro

✨Know Your Stack

Familiarise yourself with the technologies mentioned in the job description, especially Google Cloud Platform, Kubernetes, and API gateways. Be ready to discuss your hands-on experience with these tools and how you've used them to solve real-world problems.

✨Demonstrate Your Incident Response Skills

Prepare to share specific examples of incidents you've managed. Highlight your role in triaging alerts, running incident bridges, and driving postmortems. This will show that you not only understand the theory but have practical experience in handling on-call situations.

✨Showcase Your Automation Mindset

Discuss your experience with scripting and automation, particularly in reducing toil. Bring examples of how you've automated recurring issues or improved observability through dashboards and alerts. This aligns perfectly with the role's focus on building reliable systems.

✨Communicate Clearly

Since good written communication is crucial for this role, practice articulating your thoughts clearly. Prepare to explain complex technical concepts simply, as well as how you document handoffs and postmortems. This will demonstrate your ability to collaborate effectively within a team.
