Senior Data & MLOps Engineer in London

London | Full-Time | £80,000 - £100,000 / year (est.) | No working from home possible
Dormont Manufacturing Co

At a Glance

  • Tasks: Design and scale infrastructure for AI, building data pipelines and optimising system health.
  • Company: CoreWeave, a pioneering cloud platform for AI with a vibrant culture.
  • Benefits: Competitive salary, comprehensive health insurance, and tuition reimbursement.
  • Other info: Dynamic work environment with opportunities for growth and collaboration.
  • Why this job: Join a fast-growing team and make a real impact in AI technology.
  • Qualifications: 7+ years in data engineering or MLOps, strong Python skills, and experience with distributed systems.

The predicted salary is between £80,000 and £100,000 per year.

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability.

What You’ll Do: The Data Science team is focused on developing an advanced reliability platform. This system covers various aspects of data processing and analysis, including data intake, deriving meaningful metrics, identifying unusual patterns, predicting potential issues, finding slow processes in distributed systems, and using automated analysis to determine causes. We collaborate closely with internal teams like Fleet, Infrastructure, and AI Platform to enhance system stability, optimize resource use, shorten resolution times, and maintain service availability and financial performance.

About the role: As a Senior Data & MLOps Engineer, you will design and scale the infrastructure supporting the GPU Intelligence Platform. This involves building pipelines for handling data, features, model training, and delivering insights and predictions for system health and optimization. You will transition the system from initial prototypes to a production environment operating across the fleet, focusing on scalability, separating real‑time service from periodic processing, and dynamic resource management based on system load and data frequency. You will architect and deploy these scalable distributed services using orchestration technologies.

Key responsibilities:

  • Design and implement scalable data ingestion pipelines.
  • Build feature processing and baseline computation systems.
  • Productionize models for prediction and detection.
  • Develop and operate low‑latency services and robust offline workflows.
  • Architect horizontally scalable services with clear separation between components, leveraging orchestration for distribution.
  • Implement monitoring and feedback loops for continuous model and signal improvement.
  • Collaborate with Platform teams to integrate operational signals into monitoring and diagnostics.
  • Implement a scalable solution for mitigation and structured analysis.

Who You Are:

  • 7+ years of experience in data engineering, distributed systems, MLOps, or infrastructure ML roles in production environments.
  • Proven experience building high-throughput streaming or telemetry pipelines (e.g., Kafka, Pulsar, Kinesis, or equivalent).
  • Strong experience designing time‑series feature pipelines and operating large‑scale observability systems.
  • Experience building and maintaining feature stores and ensuring offline/online feature parity.
  • Hands‑on experience deploying ML models to production, including versioning, monitoring, rollback, and drift detection.
  • Experience designing scalable microservices deployed in Kubernetes‑based environments.
  • Strong proficiency in Python and at least one systems language (Go, Rust, or C++).
  • Experience working with distributed compute or training systems (e.g., NCCL, PyTorch Distributed, Spark, Ray, Slurm).
  • Familiarity with GPU telemetry systems such as NVML or DCGM and hardware‑level monitoring concepts.
  • Demonstrated experience scaling systems from Proof‑of‑Concept to production‑grade, fleet‑level deployments.

Preferred:

  • Experience working on GPU fleet management, hyperscale infrastructure, or AI training clusters.
  • Experience building anomaly detection or failure prediction systems for hardware or distributed systems.
  • Experience implementing distributed straggler detection or collective‑level performance analysis systems.
  • Experience developing agentic or LLM‑powered reasoning systems for diagnostics or operational intelligence.
  • Background in reliability engineering or SRE practices.

Wondering if you’re a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren’t a 100% skill or experience match. Here are a few qualities we’ve found compatible with our team. If some of this describes you, we’d love to talk.

  • You love building systems that turn raw infrastructure telemetry into actionable intelligence.
  • You’re curious about distributed systems failure modes, GPU performance pathologies, and reliability engineering at scale.
  • You’re excited by the idea of moving from anomaly detection to prediction to autonomous root cause reasoning.
  • You enjoy designing platforms that protect uptime, revenue, and customer trust through proactive systems thinking.

Why CoreWeave?

At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper‑growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:

  • Be Curious at Your Core
  • Act Like an Owner
  • Empower Employees
  • Deliver Best‑in‑Class Client Experiences
  • Achieve More Together

We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization’s growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!

What We Offer

In addition to a competitive salary, we offer a variety of benefits to support your needs, including:

  • Family-level Medical Insurance
  • Family-level Dental Insurance
  • Generous Pension Contribution
  • Life Assurance at 4x Salary
  • Critical Illness Cover
  • Employee Assistance Programme
  • Tuition Reimbursement
  • Work culture focused on innovative disruption

Benefits may vary by location.

Our Workplace

While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.

Export Control Compliance

This position requires access to export controlled information. To conform to U.S. Government export regulations applicable to that information, applicant must either be (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, (B) eligible to access the export controlled information without a required export authorization, or (C) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.

Equal Opportunity Employer

CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.

Senior Data & MLOps Engineer in London employer: Dormont Manufacturing Co

CoreWeave is an exceptional employer that champions innovation and collaboration in the rapidly evolving AI landscape. With a strong commitment to employee growth, competitive benefits, and a vibrant work culture that embraces curiosity and ownership, CoreWeave provides a unique opportunity for professionals to thrive in a supportive environment. Located in a dynamic hub of technology, employees are encouraged to push boundaries and contribute to groundbreaking advancements while enjoying a flexible hybrid work model.

Dormont Manufacturing Co

Contact Details:

Dormont Manufacturing Co Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land the Senior Data & MLOps Engineer role in London

Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can put in a good word for you.

Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to data engineering and MLOps. This gives potential employers a taste of what you can do and sets you apart from the crowd.

Tip Number 3

Prepare for interviews by brushing up on common technical questions and scenarios relevant to the role. Practice explaining your thought process and problem-solving approach, as this is often just as important as the right answer.

Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our awesome team at CoreWeave.

We think you need these skills to ace the Senior Data & MLOps Engineer role in London

Data Engineering
MLOps
Distributed Systems
Infrastructure ML
High-Throughput Streaming Pipelines
Time-Series Feature Pipelines
Feature Store Management

Some tips for your application 🫡

Tailor Your Application: Make sure to customise your CV and cover letter to highlight the skills and experiences that align with the Senior Data & MLOps Engineer role. We want to see how your background fits into our mission at CoreWeave!

Showcase Your Projects: Include specific examples of projects you've worked on, especially those involving data pipelines, MLOps, or distributed systems. We love seeing real-world applications of your skills, so don’t hold back!

Be Clear and Concise: When writing your application, keep it straightforward and to the point. We appreciate clarity, so make sure your achievements and experiences shine through without unnecessary fluff.

Apply Through Our Website: We encourage you to submit your application directly through our website. It’s the best way for us to receive your details and ensures you’re considered for the role. Plus, it’s super easy!

How to prepare for a job interview at Dormont Manufacturing Co

Know Your Tech Inside Out

Make sure you’re well-versed in the technologies mentioned in the job description, like Kafka, Kubernetes, and Python. Brush up on your experience with distributed systems and MLOps practices, as these will likely come up during the interview.

Showcase Your Problem-Solving Skills

Prepare to discuss specific examples where you've tackled complex issues in data engineering or system reliability. Think about how you’ve transitioned systems from prototypes to production and be ready to explain your thought process.

Understand CoreWeave's Mission

Familiarise yourself with CoreWeave’s focus on AI and its commitment to innovation. Be prepared to discuss how your skills can contribute to their goal of building a reliable platform for AI, and show enthusiasm for their mission.

Ask Insightful Questions

Prepare thoughtful questions that demonstrate your interest in the role and the company. Inquire about their current projects, team dynamics, or how they approach challenges in GPU fleet management. This shows you’re engaged and serious about the opportunity.