At a Glance
- Tasks: Own and enhance ML infrastructure, ensuring reliable deployment and monitoring of predictive models.
- Company: Circadia Health, a mission-driven company transforming patient monitoring with AI.
- Benefits: Competitive salary, dynamic work environment, and the chance to impact patient care.
- Other info: Join a fast-paced startup culture where your contributions directly influence patient care.
- Why this job: Make a real difference in healthcare by improving patient outcomes through innovative technology.
- Qualifications: 4+ years in MLOps or related fields, strong Python skills, and experience with AWS.
The predicted salary is between £48,000 and £84,000 per year.
As an ML Ops Engineer at Circadia Health, you will own the infrastructure and operational lifecycle of the machine learning systems that power our clinical monitoring platform. You will build and maintain the production ML pipelines, deployment infrastructure, and monitoring systems that enable Circadia's predictive models to identify early signs of clinical deterioration. Reporting to the Principal ML Engineer, you will work across ML, backend, data, and clinical teams to ensure models are reliably trained, versioned, deployed, and monitored in both cloud and edge environments. You will be a key driver in elevating Circadia's ML practice – from reproducibility and experiment tracking to CI/CD for models and operational observability. This is a high-ownership role at a lean company where production reliability, rapid iteration, and pragmatic engineering are essential. Your work will directly impact patient outcomes by ensuring our predictive models are always running, always accurate, and always improving.
Key Responsibilities
- Own and extend Circadia's ML pipeline orchestration using Apache Airflow, including training, evaluation, and deployment workflows.
- Build and maintain automated pipelines for model retraining, validation, and promotion across development, staging, and production environments.
- Implement pipeline monitoring, alerting, and failure recovery to eliminate silent failures and ensure operational reliability.
- Design pipeline architectures that support rapid experimentation while enforcing production-grade reproducibility.
- Deploy and manage ML models on AWS infrastructure (e.g. AWS Batch for batch inference workloads).
- Support deployment of models to edge devices, including Circadia's clinical monitoring hardware, working with firmware and embedded engineering teams as needed.
- Manage model versioning, promotion, and rollback workflows through the MLflow model registry.
- Evaluate and implement strategies for safe model rollouts (e.g. shadow deployments, canary releases) as the platform matures.
- Maintain and improve the MLflow-based experiment tracking and model registry infrastructure.
- Establish conventions for experiment logging, artifact storage, model metadata, and lineage tracking.
- Enable ML engineers to move seamlessly from experimentation to production deployment with minimal friction.
- Implement and maintain training data versioning and dataset management practices to ensure reproducibility of model training runs.
- Track dataset lineage, labeling provenance, and feature dependencies alongside model versions.
- Collaborate with ML engineers and data engineers to formalise dataset release and validation workflows.
- Build monitoring systems for model performance in production, including data drift detection, prediction quality tracking, and alerting on degradation.
- Implement operational dashboards for pipeline health, compute utilisation, and deployment status.
- Collaborate with data engineering to ensure upstream data quality and pipeline reliability for ML feature inputs.
- Develop incident response procedures and runbooks for ML system failures.
- Manage and optimise AWS compute resources (Batch, EC2, or similar) used for model training and inference.
- Design infrastructure-as-code solutions for reproducible ML environments.
- Drive cost optimisation across ML compute, storage, and data transfer.
- Support Snowflake integrations for feature generation and training data pipelines.
- Introduce and champion ML engineering best practices including CI/CD for models, automated testing for ML pipelines, and reproducible training workflows.
- Build internal tooling and templates that accelerate the ML development-to-production cycle.
- Document operational processes, architecture decisions, and onboarding materials for the ML platform.
- Participate in architecture discussions and technical planning to ensure ML systems scale with Circadia's growth.
- Ensure all ML pipelines and infrastructure meet healthcare security and privacy requirements, including HIPAA and SOC 2.
- Apply best practices for handling Protected Health Information (PHI) in training data, model artifacts, and inference outputs.
- Maintain audit trails for model decisions, data access, and deployment history.
Required Qualifications
- 4+ years of experience in MLOps, ML Engineering, DevOps, or a closely related infrastructure role.
- Strong proficiency in Python for ML pipeline development, tooling, and automation.
- Hands-on experience with ML pipeline orchestration tools, particularly Apache Airflow.
- Experience with model registries and experiment tracking platforms (MLflow preferred).
- Experience deploying and operating ML workloads on AWS (Batch, EC2, S3, IAM, CloudWatch).
- Solid understanding of the ML lifecycle: training, evaluation, deployment, monitoring, and retraining.
- Experience with containerisation (Docker) and infrastructure-as-code.
- Proficiency with Git and version control workflows.
- Familiarity with SQL and data warehousing platforms (Snowflake preferred).
- Experience implementing monitoring, logging, and alerting for production systems.
- Strong debugging and incident response skills for complex distributed systems.
Preferred Qualifications
- Experience deploying models to edge or embedded devices.
- Background in healthcare, medical devices, or clinical data systems.
- Familiarity with model serving frameworks (e.g., TorchServe, TF Serving, Triton, or custom solutions).
- Experience with CI/CD systems for ML (e.g., GitHub Actions, Jenkins, or similar).
- Experience with data versioning tools (e.g., DVC, LakeFS, or similar).
- Experience supporting data science or ML research teams in a production context.
- Exposure to HIPAA compliance and healthcare security best practices.
- Experience with distributed compute frameworks (e.g. Apache Spark, Dask) for large-scale data processing.
- Experience with streaming or real-time inference architectures.
What You Bring
- You take ownership of ML infrastructure end-to-end — from training pipelines to production monitoring.
- You care deeply about reliability, reproducibility, and operational excellence in ML systems.
- You have strong opinions (loosely held) on how to build a great ML platform, and you're eager to put them into practice.
- You are comfortable working in a startup environment where you'll wear multiple hats and move fast.
- You communicate clearly across engineering, data science, and clinical teams.
- You're motivated by building technology that directly improves patient care.
Why Circadia Health
Circadia Health is redefining patient monitoring through contactless sensing and AI-driven clinical insights. As we scale from tens of thousands to hundreds of thousands of monitored patients, our data infrastructure is central to everything we do.
You’ll have the opportunity to:
- Work on real-world healthcare problems with measurable patient impact
- Build data systems that power clinical-grade AI and ML
- Take ownership in a fast-growing, mission-driven company
- Collaborate with a highly skilled, multidisciplinary team
ML Ops Engineer in London employer: Circadia Health
Contact Details:
Circadia Health Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the ML Ops Engineer role in London
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can refer you directly.
✨Tip Number 2
Show off your skills! Create a portfolio showcasing your ML projects, pipelines, and any cool stuff you've built. This is your chance to demonstrate what you can do beyond just a CV.
✨Tip Number 3
Prepare for interviews by brushing up on your technical skills and understanding the company’s products. Be ready to discuss how you can contribute to their ML infrastructure and improve patient outcomes.
✨Tip Number 4
Apply through our website! It’s the best way to ensure your application gets seen. Plus, we love seeing candidates who are genuinely interested in joining our mission-driven team.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the ML Ops Engineer role. Highlight your experience with ML pipelines, AWS, and any relevant tools like Apache Airflow. We want to see how your skills match what we're looking for!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Share your passion for ML and how you can contribute to our mission at Circadia Health. Be sure to mention specific projects or experiences that relate to the job description.
Showcase Your Projects: If you've worked on any relevant projects, whether in a professional setting or as personal endeavours, make sure to include them. We love seeing practical examples of your work, especially those that demonstrate your problem-solving skills in ML.
Apply Through Our Website: Don't forget to apply through our website! It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you're keen on joining our team at Circadia Health!
How to prepare for a job interview at Circadia Health
✨Know Your ML Ops Inside Out
Make sure you brush up on your knowledge of ML Ops principles, especially around pipeline orchestration with tools like Apache Airflow. Be ready to discuss your hands-on experience with deploying models on AWS and how you've tackled challenges in production environments.
✨Showcase Your Problem-Solving Skills
Prepare to share specific examples of how you've implemented monitoring and alerting systems in previous roles. Highlight any incidents you've managed and how you ensured operational reliability, as this will demonstrate your ability to handle real-world challenges.
✨Familiarise Yourself with Healthcare Standards
Since Circadia Health operates in the healthcare sector, it's crucial to understand HIPAA compliance and best practices for handling Protected Health Information (PHI). Be ready to discuss how you've ensured security and privacy in your past projects.
✨Communicate Clearly and Collaboratively
As you'll be working across various teams, practice articulating your thoughts clearly. Prepare to discuss how you've collaborated with data engineers and clinical teams in the past, and be ready to showcase your ability to bridge technical and non-technical conversations.