Senior ML Infrastructure Engineer in Oxford

Oxford, Full-Time, £43,200 - £72,000 / year (est.), No home office possible
Ellison Institute of Technology Oxford

At a Glance

  • Tasks: Build and optimise high-performance ML infrastructure for groundbreaking scientific research.
  • Company: Join the innovative Ellison Institute of Technology, focused on impactful solutions for global challenges.
  • Benefits: Enjoy enhanced holiday pay, private medical insurance, and a supportive work environment.
  • Why this job: Be part of a collaborative team driving real change in science and technology.
  • Qualifications: Experience with ML compute clusters and a proactive approach to systems design.
  • Other info: Work in a dynamic, inclusive environment that values creativity and excellence.

The predicted salary is between £43,200 and £72,000 per year.

The Ellison Institute of Technology (EIT) Oxford aims to have a global impact by fundamentally reimagining the way science and technology translate into end‑to‑end solutions and delivering these solutions through programmes and platforms that respond to humanity's most challenging problems. EIT Oxford will ensure scientific discoveries and pioneering science are turned into products that benefit society worldwide, with a long‑term vision of commercialising those solutions for sustainability.

Led by a world‑class faculty of scientists, technologists, policy makers, economists and entrepreneurs, the EIT seeks to develop and deploy commercially sustainable solutions across its four Humane Endeavours: Health, Medical Science & Generative Biology; Food Security & Sustainable Agriculture; Climate Change & Managing Atmospheric CO2; and Artificial Intelligence & Robotics. The campus, slated for completion in 2027, will span more than 300,000 sq ft of research laboratories, educational and gathering spaces, and will later expand into a 2 million sq ft campus across Oxford Science Park. Designed by Foster + Partners, it will host up to 7,000 people, featuring autonomous labs and purpose‑built facilities to spark interdisciplinary collaboration.

Requirements

Our MLOps team is building the cloud and compute foundation that enables scientific breakthroughs. We deliver reliable, secure platforms and self‑service guardrails that accelerate experimentation and turn ideas into results—faster, at scale, and with confidence. Day-to-day, you might:

  • Build, operate, and continuously optimise our high‑performance GPU training and inference clusters, focusing on robust, high‑availability scheduling, isolation, and automated lifecycle management.
  • Drive systems design and implementation for high‑throughput data paths, optimising I/O, caching, and data locality across compute and storage (including our current Lustre implementation).
  • Proactively benchmark, profile, and resolve performance bottlenecks across the compute, network, and orchestration layers to maximise efficiency for distributed training and inference.
  • Establish comprehensive observability, resilience, and automated security controls to ensure compliance and robust operation of sensitive research environments.
  • Partner with Research, Data, and Applied teams to forecast capacity and cost for GPU and storage needs, setting quotas and streamlining ML experimentation pipelines.

What makes you a great fit:

  • Proven experience leading the design, build, and operation of high‑performance ML compute clusters at scale.
  • A proactive, autonomous approach to systems design and the proven ability and desire to ideate, co‑create and implement optimal solutions.
  • Exposure to migrating or transforming ML infrastructure from traditional schedulers to modern, containerised systems.
  • Expertise with high‑throughput storage systems for ML/HPC workloads.
  • Expert‑level understanding of GPU architecture, high‑speed networking for distributed training, and performance profiling to resolve bottlenecks.
  • A solid grasp of IaC and CI/CD practices (e.g., Terraform, Argo CD).

Benefits

  • Enhanced holiday pay
  • Pension
  • Life Assurance
  • Income Protection
  • Private Medical Insurance
  • Hospital Cash Plan
  • Therapy Services
  • Perk Box
  • Electric Car Scheme

Why work for EIT

At the Ellison Institute, we believe a collaborative, inclusive team is key to our success. We are building a supportive environment where creative risks are encouraged and everyone feels heard. Valuing emotional intelligence, empathy, respect, and resilience, we encourage people to be curious and to share a commitment to excellence. Join us and make an impact!

Employer for the Senior ML Infrastructure Engineer role in Oxford: Ellison Institute of Technology Oxford

The Ellison Institute of Technology (EIT) Oxford is an exceptional employer, dedicated to fostering a collaborative and inclusive work culture that values creativity and innovation. With a focus on employee growth and well-being, EIT offers a comprehensive benefits package, including enhanced holiday pay, private medical insurance, and a supportive environment that encourages curiosity and excellence. As part of a world-class team working on groundbreaking solutions in science and technology, you will have the opportunity to make a meaningful impact on global challenges while being part of a state-of-the-art campus designed for interdisciplinary collaboration.

Contact Details:

Ellison Institute of Technology Oxford Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Senior ML Infrastructure Engineer role in Oxford

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can put in a good word for you.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects and contributions. This is a great way to demonstrate your expertise in ML infrastructure and make a lasting impression.

✨Tip Number 3

Prepare for interviews by brushing up on common technical questions and scenarios related to ML infrastructure. Practice explaining your thought process and solutions clearly, as communication is key in collaborative environments.

✨Tip Number 4

Don't forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you're genuinely interested in being part of our innovative team at EIT.

We think you need these skills to ace the Senior ML Infrastructure Engineer role in Oxford

High-Performance ML Compute Clusters
Systems Design
Containerised Systems
High-Throughput Storage Systems
GPU Architecture
High-Speed Networking
Performance Profiling
Infrastructure as Code (IaC)
Continuous Integration/Continuous Deployment (CI/CD)
Benchmarking
Automated Lifecycle Management
Observability and Resilience
Collaboration with Research and Data Teams
Capacity Forecasting

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the Senior ML Infrastructure Engineer role. Highlight your experience with high-performance ML compute clusters and any relevant projects that showcase your skills in systems design and implementation.

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about the work we do at EIT Oxford and how your background aligns with our mission. Be sure to mention specific experiences that demonstrate your proactive approach.

Showcase Your Technical Skills: Don’t forget to highlight your expertise in GPU architecture, high-throughput storage systems, and IaC practices. We want to see how you’ve applied these skills in real-world scenarios, so include examples that illustrate your problem-solving abilities.

Apply Through Our Website: We encourage you to apply through our website for a smoother application process. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re keen on joining our team!

How to prepare for a job interview at Ellison Institute of Technology Oxford

✨Know Your Tech Inside Out

Make sure you’re well-versed in the technologies mentioned in the job description, especially around high-performance ML compute clusters and GPU architecture. Brush up on your knowledge of IaC and CI/CD practices like Terraform and Argo CD, as these will likely come up during technical discussions.

✨Showcase Your Problem-Solving Skills

Prepare to discuss specific examples where you've identified and resolved performance bottlenecks in ML infrastructure. Use the STAR method (Situation, Task, Action, Result) to structure your answers, highlighting your proactive approach and ability to implement optimal solutions.

✨Understand Their Vision

Familiarise yourself with the Ellison Institute's mission and their four Humane Endeavours. Be ready to discuss how your skills and experiences align with their goals, particularly in areas like climate change or AI & robotics, showing that you’re not just a fit for the role but also for their vision.

✨Ask Insightful Questions

Prepare thoughtful questions that demonstrate your interest in the role and the organisation. Inquire about their current challenges in ML infrastructure or how they envision the future of their research environment. This shows you’re engaged and thinking critically about how you can contribute.
