Site Reliability Engineer - AI Agents in London

London | Full-Time | £70,000 - £90,000 / year (est.) | Home office (partial)

At a Glance

Tasks: Design and operate AI infrastructure, ensuring reliability and scalability for innovative projects.
Company: Join Kraken, a leading crypto platform with a global impact.
Benefits: Competitive salary, flexible work options, and opportunities for professional growth.
Other info: Dynamic team culture that values diversity and innovation.
Why this job: Be at the forefront of AI technology and shape the future of finance.
Qualifications: 5+ years in site reliability or platform engineering, with strong coding skills.

The predicted salary is between £70,000 and £90,000 per year.

Building the Future of Open Finance

Payward, the parent company behind Kraken, NinjaTrader, Breakout, xStocks, Payward Services and CF Benchmarks, has spent the last 15 years building one of the most modern and globally accessible financial infrastructure platforms in the industry, designed to advance an open, global financial system.

Founded in 2011, Kraken is one of the world's longest-standing crypto platforms, trusted by over 10 million individuals and institutions across the globe. It offers spot trading, margin, futures, staking, and OTC services, with products built for both individual investors and institutional clients.

The AI Infrastructure team sits within the Data organization and is responsible for building, operating, and scaling the systems that power AI agents in production — both internal tools and external-facing products. Working closely with the AI and Agent Systems teams, this group ensures that the orchestration, execution, and model-serving layers underpinning agentic workflows are reliable, observable, and built to scale.

This team operates at the intersection of data infrastructure and applied AI — a space that moves fast and demands engineers who can bring production discipline to emerging technology. You'll partner across Data Engineering, ML, and product-facing teams to harden agent infrastructure and keep it running at the standards our users expect.

Importantly, this is a platform engineering team. Beyond operating infrastructure, the team is responsible for building the APIs, SDKs, and platform capabilities that enable AI, Data, and Engineering teams to safely and efficiently consume agent infrastructure as a service. Success in this role requires thinking beyond infrastructure operations and toward developer experience, platform adoption, and long-term scalability.

The opportunity

Design, build, and operate the infrastructure layer supporting AI agent workflows in production.
Ensure reliability, scalability, and observability of agentic systems across internal and external products.
Design and develop platform services, APIs, SDKs, and self-service capabilities that allow engineering teams to easily consume AI infrastructure and agent platform services.
Manage and maintain the compute, orchestration, and serving infrastructure powering model inference and agent execution.
Implement robust monitoring, alerting, and incident response procedures tailored to AI/ML workloads.
Utilize Infrastructure as Code (IaC) tools such as Terraform to provision and manage cloud (AWS) infrastructure components.
Build and maintain CI/CD pipelines that support rapid, reliable deployment of AI services and agent workflows.
Define and implement guardrails, failure handling, and recovery patterns specific to agentic and LLM-powered systems.
Collaborate with AI and Data Engineering teams to translate experimental agent prototypes into hardened production systems.
Manage containerized workloads using Kubernetes, ensuring efficient deployment, scaling, and orchestration of AI services.
Implement access controls and security best practices across AI infrastructure environments.
Document architecture, runbooks, and best practices to support knowledge sharing across the team.

What You Bring

5+ years of experience as a Site Reliability Engineer, Infrastructure Engineer, Platform Engineer, or similar role in a production environment.
Hands-on experience supporting ML infrastructure, model serving, or MLOps workflows in production.
Experience building developer platforms, internal tooling, APIs, or SDKs consumed by engineering teams at scale.
Strong understanding of platform engineering principles, including developer experience, self-service infrastructure, and API-driven platform design.
Proficiency with Infrastructure as Code tools, particularly Terraform.
Experience with containerization and orchestration, particularly Kubernetes and Docker.
Solid understanding of cloud infrastructure, preferably AWS.
Strong scripting skills (bash/shell) and proficiency in at least one programming language (Python preferred).
Experience designing and operating observability, monitoring, and alerting systems.
Experience implementing incident response procedures and participating in on-call rotations.
Strong collaboration skills working across data, AI, and engineering teams.
High ownership mindset in a fast-moving, high-stakes production environment.

Nice to haves

Experience building or operating infrastructure for agent-based or LLM-powered systems.
Familiarity with agent orchestration frameworks (e.g., LangGraph, CrewAI, or similar).
Background in data infrastructure, including familiarity with Airflow, Kafka, Spark, or data lake tooling.
Experience with CI/CD pipelines and deployment automation for AI/ML workloads.
Exposure to evaluation frameworks and model performance monitoring at scale.
Experience working in fast-moving 0→1 environments or platform-building teams.
Experience building SDKs, developer tooling, or internal platform products with a strong focus on usability and adoption.
Experience with Cloudflare's cloud platform and product ecosystem, including networking, security, performance, and Zero Trust solutions.

Unless a specific application deadline is stated in the job posting, applications are accepted on an ongoing basis.

Please note, applicants are permitted to redact or remove information on their resume that identifies age, date of birth, or dates of attendance at or graduation from an educational institution.

We consider qualified applicants with criminal histories for employment on our team, assessing candidates in a manner consistent with the requirements of the San Francisco Fair Chance Ordinance.

Payward is powered by people from around the world and we celebrate the diverse talents, backgrounds, contributions, and unique perspectives that everyone brings to the table. We hire based on merit, seeking out people with the right abilities, knowledge, and skills for the job. We encourage you to apply for roles where you don't fully meet the listed requirements, especially if you're passionate or knowledgeable about crypto.

We may ask candidates to complete job-related skills or work-style assessments as part of our hiring process. These assessments evaluate competencies relevant to the role and are applied consistently across candidates for similar positions. Results are considered alongside experience and interviews, and are not the sole basis for any employment decision.

As an equal opportunity employer, we don't tolerate discrimination or harassment of any kind, whether based on race, ethnicity, age, gender identity, citizenship, religion, sexual orientation, disability, pregnancy, veteran status, or any other protected characteristic as outlined by federal, state, or local laws.

Site Reliability Engineer - AI Agents in London employer: Kraken

At Payward, we pride ourselves on fostering a dynamic and inclusive work culture that empowers our employees to thrive. As a Site Reliability Engineer within our innovative AI Infrastructure team, you'll have the opportunity to work at the forefront of technology in a fast-paced environment, with ample opportunities for professional growth and collaboration across diverse teams. Our commitment to employee development, coupled with our global reach and cutting-edge projects, makes us an exceptional employer for those seeking meaningful and rewarding careers in the financial technology sector.

Contact Details:

Kraken Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land Site Reliability Engineer - AI Agents in London

✨Get Involved in Data Science Meetups

Tap into local data science meetups or workshops to connect with fellow enthusiasts and professionals. These events are goldmines for networking, and sometimes even lead directly to job openings at companies like Kraken!

✨Show Off Your Projects

Start building a public portfolio showcasing your data science projects on platforms like GitHub or personal websites. Highlight unique analyses or models you've developed. This not only demonstrates your skills but also gets your name out there for roles like Site Reliability Engineer - AI Agents at Kraken.

✨Leverage Professional Networks

Join professional bodies related to data science, like the Data Science Society or similar organisations. Getting involved can lead to mentorship opportunities and insider knowledge about full-time positions at companies like Kraken.

✨Apply Directly through Our Website

When you find a suitable opening like Site Reliability Engineer - AI Agents at Kraken, make sure to apply directly through our website. It gives you an edge and shows you're keen to join our team. Plus, who doesn’t love a direct application? It’s easier than navigating through job boards!

We think you need these skills to ace Site Reliability Engineer - AI Agents in London

Site Reliability Engineering

Infrastructure Engineering

Platform Engineering

MLOps

API Development

SDK Development

Infrastructure as Code (IaC)

Terraform

Kubernetes

Docker

AWS Cloud Infrastructure

Scripting (bash/shell)

Python Programming

Observability and Monitoring Systems

Incident Response Procedures

Some tips for your application 🫡

Show Off Your Projects: In the world of data science, your projects can speak volumes about your skills. Make sure to showcase a few key projects in your CV or portfolio, especially those that highlight your ability to work with data sets, build models, or use relevant tools like Python, R, or SQL. Don’t forget to include links to any GitHub repositories if applicable!

Quantify Your Achievements: Employers love numbers! When drafting your CV, highlight your achievements with quantifiable results. For instance, mention how your data analysis led to a certain percentage increase in efficiency or revenue at a previous job or project. These details can really make your application pop!

Craft a Tailored Cover Letter: For a full-time role at Kraken, your cover letter should reflect your passion for data science and your excitement about the specific projects or values of the company. Dive into why you’re a good fit, how your skills align with their needs, and any unique perspectives you can bring to the team.

Stand Out with Relevant Courses and Certifications: Although experience talks, relevant courses or certifications can be your ticket to impressing hiring managers at Kraken. Mention any standout courses you've completed that equipped you with essential skills, such as machine learning certifications or data visualisation courses. This shows your commitment to continuously developing your skills in the field!

How to prepare for a job interview at Kraken

✨Brush Up on Your Statistics

For a data science role, sharpen your statistics skills. Be ready to tackle technical questions on probability distributions, hypothesis testing, and regression analysis. These are often the bread and butter of data science interviews, so don't just skim over them!

✨Showcase Your Projects

Prepare a killer portfolio showcasing your data science projects. Include details about the datasets used, the tools and techniques applied, and the impact of your findings. If you can walk the interviewers through a particularly challenging project or a cool visualisation that had real-world implications, it’ll really make you stand out!

✨Get Comfortable with Python and R

Most data science positions require proficiency in programming languages like Python and R. Practise with common libraries like pandas, NumPy, and scikit-learn, and be ready for live coding exercises or algorithm questions. Showing off your coding chops can really impress the interviewers at Kraken!

✨Prepare for Case Studies

Expect to encounter real-world case studies during the interview. You might be asked how you’d approach a data problem or analyse a dataset to extract insights. It's essential to think out loud and demonstrate your problem-solving process so that the interviewer can see your logical thinking in action.
