Machine Learning Systems Engineer in London

London, Full-Time, £70,000 - £90,000 / year (est.), Home office (partial)

At a Glance

  • Tasks: Architect and build scalable backend systems for a cutting-edge media intelligence platform.
  • Company: Join a forward-thinking tech company focused on AI and media innovation.
  • Benefits: Enjoy competitive pay, flexible work options, and opportunities for professional growth.
  • Other info: Collaborative environment with mentorship opportunities and career advancement.
  • Why this job: Be at the forefront of AI technology and make a real impact in media processing.
  • Qualifications: 5+ years in backend engineering with strong AI/ML integration experience.

The predicted salary is between £70,000 and £90,000 per year.

We are developing a highly scalable media intelligence platform that processes, analyzes, and structures large volumes of multimedia content across text, image, video, and audio. As a Senior Applied ML Engineer, you will architect and build the core backend systems that power media ingestion, processing workflows, metadata generation, AI-based analysis, semantic search, and retrieval across large media libraries.

We are looking for a Senior Applied ML Engineer who can design, implement, optimize, and evaluate a production-grade moderation pipeline using open-source models. This role requires deep backend engineering expertise, strong system design capability, and practical experience integrating AI/ML systems into production workflows. You will work on complex media-processing pipelines, video/audio analysis, OCR, speech-to-text, embedding generation, vector search, multimodal model integrations, and high-throughput asynchronous workloads. You will collaborate closely with engineering leadership to define backend architecture, improve reliability and scalability, and guide other engineers in delivering secure, observable, and high-performance systems.

Responsibilities

  • Backend Architecture & System Ownership: Architect, build, and operate scalable backend services for a media intelligence platform, focusing on clean, maintainable, and production-ready systems. Own critical backend components end to end, from system design and API contracts through implementation, deployment, monitoring, and iteration. Drive architectural decisions across APIs, processing pipelines, distributed compute, storage, search, observability, cloud infrastructure, and model-serving workflows. Design data models and storage patterns for media assets, generated metadata, embeddings, processing jobs, model outputs, search indexes, and audit trails. Design high-throughput media ingestion and processing pipelines for large volumes of video, audio, image, and text content. Build distributed, event-driven workflows for media processing using queues and pub/sub systems such as SQS, Kafka, Pub/Sub, or equivalent technologies. Implement reliable asynchronous processing patterns, including retries, idempotency, dead-letter queues, backpressure handling, and fault-tolerant job execution.
  • AI/ML Integration & Model Workflows: Lead the development and optimization of metadata extraction, content analysis, scene detection, transcription, embedding generation, and multimodal AI inference workflows. Integrate and optimize AI/ML services within backend workflows, including model APIs, embedding pipelines, OCR, speech-to-text, scene analysis, multimodal inference, batching, caching, and fallback strategies. Collaborate with ML engineers, data scientists, or external model providers to benchmark models, compare quality/latency trade-offs, and safely roll out model upgrades.
  • Model Serving & Performance Optimization: Optimize AI/ML inference workflows for latency, throughput, reliability, and cost across both real-time and batch-processing paths. Work with model-serving systems such as vLLM, Triton, TGI, SageMaker, Vertex AI, or custom inference services to improve batching, concurrency, warmup behavior, timeout handling, autoscaling, and GPU utilization. Evaluate and apply practical model optimization techniques such as quantization, model distillation, batching, caching, prompt optimization, and routing to smaller or cheaper models where appropriate. Design and maintain vector search and indexing systems using technologies such as Pinecone, Weaviate, Qdrant, Elastic Vectors, FAISS, pgvector, or similar tools. Build retrieval workflows that support semantic search, similarity matching, duplicate detection, media discovery, and structured metadata search. Monitor model and system performance in production, including API latency, queue depth, processing time, model error rates, GPU utilization, confidence distributions, drift signals, and cost per processed item.
  • Infrastructure, Reliability & Observability: Deploy and operate systems on AWS, GCP, Azure, or equivalent cloud platforms, including compute, storage, networking, queues, model-serving infrastructure, and monitoring systems. Ensure system reliability through logging, metrics, tracing, alerting, dashboards, operational runbooks, and incident-response best practices.
  • Collaboration & Engineering Leadership: Collaborate with product, design, data, and ML teams to deliver media-rich, AI-powered product features. Mentor junior and mid-level engineers, support technical planning, review designs, and raise engineering quality across the team. Participate in code reviews, documentation, technical planning, and continuous improvement of engineering practices. Ensure code quality through testing, peer review, clear documentation, and maintainable implementation patterns.

Education & Experience

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • 5–7+ years of backend engineering experience, ideally building scalable distributed systems, media platforms, data pipelines, or high-throughput backend services.
  • Prior experience owning major backend modules end to end, including architecture, implementation, deployment, monitoring, and production operations.
  • 3+ years of experience integrating AI/ML inference systems into backend workflows, including model APIs, embedding pipelines, OCR, speech-to-text, scene detection, or multimodal model outputs.
  • Hands-on experience creating AI-powered processing pipelines for image, video, audio, or text analysis.
  • Practical experience with production model optimization, especially for image, video, embedding, or multimodal models, including batching, caching, quantization, prompt optimization, routing strategies, latency reduction, and cost optimization.
  • Prior experience with vector search, semantic search, media retrieval, or similarity-matching systems is strongly preferred.
  • Experience mentoring engineers, leading technical discussions, and influencing architectural decisions across backend, infrastructure, and AI/ML workflows.

Technical Skills

  • Strong expertise in Python and/or Node.js with deep understanding of building scalable RESTful APIs and backend architectures.
  • Experience with the Hugging Face Transformers ecosystem and deep learning frameworks such as PyTorch and TensorFlow.
  • Strong experience with SQL/NoSQL databases, schema design, and data modeling.
  • Exposure to distributed systems, microservices, asynchronous processing, and event-driven patterns with SQS, Pub/Sub, Kafka, or other queueing/pub-sub systems is preferred.
  • Experience deploying production systems on AWS, GCP, or similar cloud platforms.
  • Knowledge of infrastructure patterns (compute, storage, networking, observability).

AI/ML Integration

  • Experience orchestrating embedding generation, scene detection, OCR, speech-to-text, image classification, video analysis, and multimodal model integrations.
  • Experience optimizing inference workflows for latency, throughput, reliability, and cost.
  • Experience working with scalable and optimized inference settings, including tuning sampling parameters, managing output-length formats, and configuring reasoning-related behaviours.
  • Familiarity with practical model optimization techniques such as batching, caching, quantization, model distillation, prompt optimization, fallback routing, and use of smaller models where appropriate.
  • Experience working with model-serving systems such as vLLM, Triton, TGI, SageMaker, Vertex AI, or custom inference services is preferred.
  • Experience with LLM and multimodal evaluation and benchmarking frameworks, as well as domain-specific benchmarks, with the ability to interpret results and optimize model performance accordingly.

System Design & Architecture

  • Preferred understanding of distributed systems, scaling patterns, and performance engineering.
  • Ability to design modular, maintainable, and efficient architectures.
  • Experience with API versioning, modularization, and designing long-running workflows.
  • Understanding of performance bottlenecks and low-latency backend patterns.

Machine Learning Systems Engineer in London employer: Tether

Join a forward-thinking company that champions innovation and collaboration in the heart of the tech industry. As a Senior Applied ML Engineer, you will thrive in a dynamic work culture that prioritises employee growth through mentorship and continuous learning opportunities. With a focus on cutting-edge technology and a commitment to building scalable systems, this role offers the chance to make a significant impact while enjoying a supportive environment that values your contributions.

Contact Details:

Tether Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land the Machine Learning Systems Engineer role in London

Join Local Tech Meetups

Get out there and mingle with fellow developers by joining local tech meetups. It’s a fantastic way to meet people who might be working at Tether or know someone who does. Plus, you can pick up new tech skills and keep up with industry trends while you're at it!

Contribute to Open Source Projects

Show off your coding chops by jumping into open-source projects. Not only does this give you practical experience, but it also gets you noticed in the dev community. You'll create a killer portfolio that speaks volumes about your skills to Tether.

Tap into Online Developer Communities

Don’t underestimate the power of online developer communities like GitHub, Stack Overflow, and even Reddit. Participate in discussions, share your projects, and build your visibility. You can often find opportunities through these channels that lead to a full-time role at companies like Tether.

Explore Job Boards Specifically for Tech Roles

Keep your eyes peeled on job boards that focus on tech roles. Sites like TechCareers or Stack Overflow Jobs often have listings for companies like Tether that might not show up on broader job sites. Make it a habit to check these regularly, and don’t hesitate to apply directly through our website!

We think you need these skills to ace the Machine Learning Systems Engineer role in London

Backend Engineering
System Design
AI/ML Integration
Media Processing Pipelines
Python
Node.js
RESTful APIs

Some tips for your application 🫡

Show off your coding skills: When applying for a software engineering role, it's super important to showcase your coding skills. Make sure your CV includes your tech stack, any relevant programming languages you’re comfortable with, and examples of projects you've worked on. If you have a GitHub profile, link it up! We love to see code in action.

Tailor your portfolio: For a full-time role, we’d expect to see some solid examples of your work in your portfolio. Make sure to include at least two or three projects that highlight your problem-solving skills and your ability to work with different technologies. Focus on the projects that are most relevant to the position at Tether.

Craft a killer cover letter: Your cover letter is your chance to stand out, so make it personal! Explain why you want to work at Tether and how your skills align with the role. Show us your passion for software development. We dig enthusiastic candidates who understand the value of collaboration and continuous learning!

Be clear and concise: When it comes to writing your CV and cover letter, clarity is key. Avoid jargon that could confuse us and stick to simple, direct language. Highlight your achievements with quantifiable results where possible, and keep everything easy to read. A well-organised application goes a long way!

How to prepare for a job interview at Tether

Brush Up on Your Coding Skills

For a full-time software engineering role, it's crucial to keep your coding skills sharp. Expect technical questions that might involve solving problems on the spot or discussing algorithms. Practise on platforms like LeetCode or HackerRank to get comfortable with the types of questions that often come up.

Know Your Tools and Frameworks

Make sure you’re well-acquainted with the tools and technologies listed in the job description, and familiarise yourself with any specific frameworks or programming languages mentioned. If Tether uses React or Node.js, for instance, be ready to discuss how you’ve used them in previous projects or coursework.

Showcase Your Projects

Bring along a portfolio that highlights your best work. This could be code samples, GitHub repositories, or any side projects you’ve built. Make sure you can talk through your thought process for each project, especially the challenges you faced and how you solved them. This shows your problem-solving skills in action.

Prepare for Behavioural Questions

While technical skills are key, full-time positions also require cultural fit. Be ready to discuss your previous experiences and how you handle teamwork, conflict, and deadlines. Brush up on the STAR method (Situation, Task, Action, Result) to clearly articulate how you've contributed to a team.