ML Data Engineer

Full-time · £48,000 - £84,000 / year (est.) · Home office (partial)
Moonvalley AI

At a Glance

  • Tasks: Build data pipelines for next-gen generative video models and tackle data quality challenges.
  • Company: Join Moonvalley, a pioneering AI studio creating award-winning media experiences with top industry talent.
  • Benefits: Enjoy hybrid work options, flexible hours, and the chance to collaborate with elite professionals.
  • Other info: Expect a fast-paced environment with occasional late nights; commitment is key to our mission.
  • Why this job: Work on groundbreaking technology that shapes the future of entertainment while solving complex problems.
  • Qualifications: Strong ML engineering experience, proficiency in Python, and familiarity with cloud infrastructure required.

The predicted salary is between £48,000 and £84,000 per year.

Moonvalley is developing cutting-edge generative AI models designed to power Super Bowl-worthy commercials and award-winning cinematic experiences. Our inaugural HD model, Marey, is built on exclusively licensed and owned data for professional use in Hollywood and enterprise applications.

Our team is an unprecedented convergence of talent across industries. Our elite AI scientists from DeepMind, Microsoft, Snap and Meta have decades of collective experience in machine learning and computational creativity. We have also established the first AI-enabled movie studio in Hollywood, filled with accomplished filmmakers and visionary creative talent. We work with the top producers, actors, and filmmakers in Hollywood as well as creative-driven global brands. So far we've raised over $70M from world-class investors including General Catalyst, Bessemer, Khosla Ventures and Y Combinator, and we're just getting started.

Role Summary:

We're looking for an ML Data Engineer to build the data pipelines driving our next-generation generative video models. This role is central to our mission of training models exclusively on clean, high-quality data.

You'll develop data ingestion pipelines, captioning systems, and high-throughput, distributed architectures for large-scale data processing and curation. You'll be responsible for solving some of the toughest challenges in data quality and model performance, from training and shipping quality-scoring models to analyzing large-scale datasets and uncovering new challenges.

What you'll do:

  • Design and implement systems for data ingestion, deduplication, validation, filtering, labelling, and quality scoring.
  • Fine-tune and build ML models from scratch and take them from training to production.
  • Identify and address dataset and model biases, including by creating additional scoring systems to mitigate them.
  • Implement observability and telemetry across the ML data lifecycle.
  • Collaborate with infrastructure teams to develop efficient data pipelines that support large-scale video model training, running across thousands of GPUs.
  • Work in a fast-moving environment with many known and unknown challenges to tackle.

What we're looking for:

  • Strong hands-on experience in ML engineering, including training and optimizing models (e.g., classifiers, segmentation, quality scoring), with a focus on image, video, or audio modalities.
  • Deep experience in building and scaling data infrastructure for large-scale ML systems, ideally for video or multi-modal models.
  • Experience managing large-scale datasets and pipelines in production.
  • Fluency with Python, Spark, Airflow, or similar frameworks.
  • Understanding of modern cloud infrastructure: Kubernetes, Terraform, S3/GCS, distributed compute.
  • Comfortable operating in environments with ambiguity and evolving priorities.

Nice-to-haves:

  • Experience working on foundational model training pipelines (image, video, or language).
  • Experience with video-specific data challenges like frame sampling, codec variability, temporal alignment, and perceptual quality scoring.

In our team, we approach our work with a dedication similar to that of Olympic athletes. Anticipate occasional late nights and weekends dedicated to our mission. We understand this level of commitment may not suit everyone, and we communicate this expectation openly.

If you're motivated by deeply technical problems, a seemingly never-ending uphill battle and the opportunity to build (and own) a generational technology company, we can give you what you're looking for.

All business roles at Moonvalley are hybrid positions by default, with some fully remote depending on the job scope. As a company, we meet a few times every year, usually in London, UK, or in North America (LA, Toronto).

If you're excited about the opportunity to work on cutting-edge AI technology and help shape the future of media and entertainment, we encourage you to apply. We look forward to hearing from you!

The statements contained in this job description reflect general details as necessary to describe the principal functions of this job, the level of knowledge and skill typically required and the scope of responsibility. They should not be considered an all-inclusive listing of work requirements. Individuals may perform other duties as assigned, including work in other functional areas to cover absences, to equalize peak work periods, or to otherwise balance organizational work.

Moonvalley AI is proud to be an equal opportunity employer. We are committed to providing accommodations. If you require accommodation, we will work with you to meet your needs.

Please be assured we'll treat any information you share with us with the utmost care, only use your information for recruitment purposes and will never sell it to other companies for marketing purposes. Please review our privacy policy and job applicant privacy policy located here for further information.

ML Data Engineer employer: Moonvalley AI

At Moonvalley, we pride ourselves on being at the forefront of AI innovation, offering a dynamic work environment where creativity meets cutting-edge technology. Our hybrid work culture fosters collaboration among elite professionals from top tech companies, providing ample opportunities for personal and professional growth while tackling some of the most challenging problems in the industry. With our unique position in Hollywood and a commitment to excellence, we empower our employees to make a significant impact in the world of media and entertainment.

Contact details:

Moonvalley AI Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the ML Data Engineer role

✨Tip Number 1

Familiarise yourself with the latest advancements in generative AI and video processing. Understanding the current trends and technologies will not only help you during interviews but also demonstrate your genuine interest in the field.

✨Tip Number 2

Network with professionals in the AI and film industries. Attend relevant meetups, webinars, or conferences to connect with people who work at Moonvalley or similar companies. This can provide valuable insights and potentially lead to referrals.

✨Tip Number 3

Showcase your hands-on experience with ML engineering by working on personal projects or contributing to open-source initiatives. Highlighting your practical skills in building data pipelines and optimising models can set you apart from other candidates.

✨Tip Number 4

Prepare for technical interviews by brushing up on your knowledge of Python, Spark, and cloud infrastructure. Practising coding challenges and system design problems related to data ingestion and processing will help you feel more confident during the interview process.

We think you need these skills to ace the ML Data Engineer role

Machine Learning Engineering
Data Pipeline Development
Data Ingestion and Deduplication
Model Training and Optimisation
Large-Scale Data Management
Python Programming
Apache Spark
Apache Airflow
Cloud Infrastructure (Kubernetes, Terraform, S3/GCS)
Distributed Computing
Data Quality Assessment
Bias Identification and Mitigation
Observability and Telemetry Implementation
Collaboration with Infrastructure Teams
Problem-Solving in Ambiguous Environments

Some tips for your application 🫡

Understand the Role: Before applying, make sure you fully understand the responsibilities and requirements of the ML Data Engineer position at Moonvalley. Tailor your application to highlight relevant experience in building data pipelines and working with large-scale datasets.

Highlight Relevant Skills: In your CV and cover letter, emphasise your hands-on experience with machine learning engineering, particularly in image, video, or audio modalities. Mention specific tools and frameworks like Python, Spark, and Airflow that you are proficient in.

Showcase Problem-Solving Abilities: Moonvalley is looking for candidates who can tackle complex challenges. Use examples from your past work to demonstrate how you've solved data quality issues or improved model performance, especially in ambiguous environments.

Craft a Compelling Cover Letter: Write a cover letter that not only outlines your qualifications but also expresses your passion for AI technology and the media industry. Make it clear why you want to be part of Moonvalley and how you can contribute to their mission.

How to prepare for a job interview at Moonvalley AI

✨Showcase Your Technical Skills

Be prepared to discuss your hands-on experience with ML engineering. Highlight specific projects where you've trained and optimised models, especially in image, video, or audio modalities. Demonstrating your fluency with tools like Python, Spark, and Airflow will impress the interviewers.

✨Understand Data Infrastructure

Since the role involves building and scaling data infrastructure, make sure you can talk about your experience managing large-scale datasets and pipelines. Familiarise yourself with modern cloud infrastructure concepts like Kubernetes and Terraform, as these are crucial for the position.

✨Prepare for Problem-Solving Questions

Expect to face questions that assess your ability to tackle ambiguous challenges. Think of examples from your past work where you successfully navigated uncertainty or evolving priorities, and be ready to explain your thought process in detail.

✨Demonstrate Passion for AI and Media

Moonvalley is at the forefront of AI technology in media and entertainment. Show your enthusiasm for the industry by discussing recent trends or innovations in generative AI. This will help convey your genuine interest in contributing to their mission.
