Platform / Site Reliability Engineer (UK) in London

London, Full-Time, £60,000 - £80,000 / year (est.), Home office (partial)

At a Glance

  • Tasks: Build and maintain scalable data platforms and ML infrastructure for real-time applications.
  • Company: Join TWG Global, a leader in AI-driven innovation across various industries.
  • Benefits: Competitive salary, performance bonuses, and comprehensive medical benefits.
  • Other info: Remote role available for UK-based candidates with excellent career growth opportunities.
  • Why this job: Work on impactful AI projects with top-tier professionals in a flat, trusting environment.
  • Qualifications: 3-6 years in DevOps or SRE, proficient in Docker, Kubernetes, and Python.

The predicted salary is between £60,000 and £80,000 per year.

At TWG Group Holdings, LLC (“TWG Global”), we drive innovation and business transformation across a range of industries—including financial services, insurance, technology, media, and sports—by leveraging data and AI as core assets. Our AI-first, cloud-native approach delivers real-time intelligence and interactive business applications, empowering informed decision-making for both customers and employees.

We prioritize responsible data and AI practices to ensure ethical standards and regulatory compliance. Our decentralized structure enables each business unit to operate autonomously, supported by a central AI Solutions Group, while strategic partnerships with leading data and AI vendors fuel game-changing efforts in marketing, operations, and product development. You will collaborate with management to advance our data and analytics transformation, enhance productivity, and enable agile, data-driven decisions. By leveraging relationships with top tech startups and universities, you will help create competitive advantages and drive enterprise innovation.

At TWG Global, your contributions will support our goal of sustained growth and superior returns, as we deliver rare value and impact across our businesses. We’re a fast-growing AI/ML team delivering high-impact use case solutions to financial institutions, insurers, and other regulated enterprises. Backed by proven leaders in finance and national security, our team is scaling rapidly to serve clients across North America with robust, secure, and production-grade AI solutions.

Role Overview

We are seeking a Platform / Site Reliability Engineer (SRE) to ensure the scalability, stability, and performance of our data platforms and ML infrastructure. You’ll work closely with data scientists, ML engineers, and platform vendors to deploy and monitor production systems, automate workflows, and reduce operational overhead.

What you'll do:

  • Build and maintain infrastructure to support real-time and batch ML workloads
  • Implement observability tools (logging, monitoring, alerting) for model performance and system uptime
  • Design and manage CI/CD pipelines for applications
  • Ensure high availability, disaster recovery, and rollback capabilities for production environments
  • Manage access controls, secrets, and security policies in collaboration with compliance and IT
  • Troubleshoot incidents, lead postmortems, and drive root-cause resolution
  • Work with U.S. and international teams to provide 24/7 coverage across time zones

Requirements

  • 3–6 years of experience in DevOps, SRE, or backend engineering roles
  • Proficient with tools like Docker, Kubernetes, Terraform, GitLab/GitHub Actions, Airflow
  • Strong scripting in Python or Bash and familiarity with Linux environments
  • Knowledge of observability stacks (e.g., Prometheus, Grafana, ELK, Datadog)
  • Familiarity with cloud platforms (e.g., AWS, GCP, or Azure)
  • Strong documentation, problem-solving, and incident response skills

Preferred Qualifications:

  • Experience supporting ML/AI workflows using Palantir Foundry is a plus (but not required)
  • Exposure to compliance frameworks like SOC 2, ISO 27001, or financial regulations
  • Knowledge of MLOps frameworks (e.g., MLflow, Kubeflow, SageMaker Pipelines)
  • Ability to automate deployments, testing, and monitoring at scale

Benefits

  • Work on real-world AI applications with high-impact clients
  • Collaborate with world-class data scientists, engineers, and product leaders
  • Flat org structure, high trust, high autonomy
  • Competitive salary + performance-based incentives

Position Location

This is a remote position, but candidates must be currently based in the UK.

Compensation

The target salary for this position is £94,500. A bonus will be included in the compensation package, in addition to the full range of medical, financial, and other benefits.

Employer: TWG Global AI

At TWG Global, we pride ourselves on fostering a dynamic and innovative work environment where your contributions directly impact the future of AI and data-driven solutions. Our flat organisational structure promotes high trust and autonomy, allowing you to collaborate with top-tier professionals while enjoying competitive salaries and performance-based incentives. As a remote role based in the UK, you will have the flexibility to work from anywhere while being part of a fast-growing team dedicated to delivering high-impact solutions for prestigious clients across various industries.

Contact Details:

TWG Global AI Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land Platform / Site Reliability Engineer (UK) in London

Join the IT Consultancy Buzz

Get involved in local or virtual IT consultancy meetups and forums. This is where we can rub shoulders with industry professionals, get insights into what TWG Global AI values, and even spot unadvertised opportunities. Don't miss out on these chances to make a name for ourselves in the IT world!

Show Off Your Skills

Create a personal project or case study relevant to the challenges TWG Global AI might face. Use platforms like GitHub or Medium to share your findings. This not only demonstrates our consulting skills but shows a proactive attitude, making us stand out from the crowd when applying for that full-time gig.

Leverage LinkedIn for Connections

Follow and engage with the relevant thought leaders and influencers in IT consultancy on LinkedIn. Share insightful content and join discussions to gain visibility. A well-placed comment or shared article could catch the attention of someone at TWG Global AI!

Direct Apply to TWG Global AI

Let's not forget to apply directly through the TWG Global AI website! Tailor your application to showcase our understanding of their consulting style and how we can contribute to their projects. A personalised approach can make a huge difference in landing that full-time position!

We think you need these skills to ace Platform / Site Reliability Engineer (UK) in London

DevOps
Site Reliability Engineering (SRE)
Infrastructure Management
Real-time and Batch ML Workloads
Observability Tools (logging, monitoring, alerting)
CI/CD Pipelines
High Availability and Disaster Recovery

Some tips for your application 🫡

Showcase Your Problem-Solving Skills: In IT consulting, it's all about problem-solving, so make sure your CV highlights your analytical skills and any relevant projects you've tackled. Mention specific technologies or methodologies you've used to resolve issues or improve processes; this shows you can think critically and deliver results, which is vital for us at TWG Global AI.

Highlight Relevant Certifications: Certifications like ITIL, PMP, or even specific tech stack qualifications can really make you stand out. Make sure to include these in your CV, as they not only demonstrate your expertise but also your commitment to staying current in the field. We love seeing candidates who are proactive about their professional development!

Tailor Your Cover Letter: Your cover letter is your chance to connect personally with us at TWG Global AI. Share stories about your experiences in IT consulting, and how they shaped your desire to join our team. Mention why you’re excited about this particular role, and how you see yourself contributing to our projects.

Keep It Clear and Concise: We're all busy, so make sure your application is easy to read. Use bullet points for key achievements, and don’t overload us with jargon. A clean, professional layout goes a long way. Remember, the clearer your application, the more likely we are to invite you in for an interview!

How to prepare for a job interview at TWG Global AI

Brush Up on Your Technical Skills

For an IT consulting role, be ready to demonstrate your technical prowess. You might face questions on systems integration, cloud technologies, or even troubleshooting specific software. If you have experience with tools like AWS, Azure, or even specific programming languages, make sure you can talk about them fluently.

Showcase Your Problem-Solving Approach

IT consulting is all about solving problems for clients. Think about how you can illustrate your approach to a past challenge using the STAR method (Situation, Task, Action, Result). It's a great way to show how you tackle complex issues and come up with effective solutions.

Know the Business Impact of IT Solutions

When discussing your experiences, focus not just on the tech solutions you implemented, but also on their business impact. Employers want to see that you can connect IT with organisational goals. Prep examples that highlight how your tech contributions improved efficiency or reduced costs for past clients or projects.

Prepare for Behavioural Questions

Since IT consulting often involves teamwork and client interactions, expect behavioural questions that assess your interpersonal skills. Be prepared with examples that demonstrate your adaptability, communication skills, and how you handle client feedback. Before the interview, think of situations where you worked closely with clients to create effective IT strategies or changes.