At a Glance
- Tasks: Lead innovative research in AI infrastructure and optimise GPU systems for real-world impact.
- Company: Join CoreWeave, a pioneering cloud platform for AI, trusted by top innovators.
- Benefits: Enjoy family-level medical and dental insurance, generous pension contributions, and tuition reimbursement.
- Other info: Be part of an inclusive team focused on innovative disruption and career growth.
- Why this job: Make a difference in AI reliability and performance while working with cutting-edge technology.
- Qualifications: 8+ years in machine learning or applied AI; strong Python skills required.
The predicted salary is between £70,000 and £90,000 per year.
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com. We’re proud to be an accredited Living Wage Employer.
Role Overview
We are looking for a Senior Researcher to join Monolith’s Research team, now part of CoreWeave. This is a high-impact, high-ownership role for a researcher who combines deep technical expertise in machine learning, statistical modelling, optimisation, and large-scale systems data with the ability to take complex, ambiguous problems from first principles through to production. The Monolith Data Science team is building a layered reliability and intelligence platform that shifts CoreWeave from reactive troubleshooting to proactive reliability engineering. The platform spans telemetry ingestion, feature engineering, anomaly detection, failure prediction, distributed straggler detection, performance modelling, workload optimisation, and agentic root‑cause analysis. You will work closely with Fleet, Infrastructure, AI Platform, engineering, product, and client‑facing teams to improve cluster reliability, increase effective utilisation, reduce MTTR, protect uptime, and turn large‑scale GPU infrastructure telemetry into measurable operational and commercial impact.
What You’ll Do
- Research Leadership & Strategy: Contribute meaningfully to Monolith and CoreWeave’s research direction by identifying high‑leverage problems in GPU infrastructure analytics, cluster reliability, workload performance, scheduling, and utilisation. Originate novel research directions for turning raw infrastructure telemetry into actionable intelligence. Evaluate emerging methods across statistical modelling, machine learning, observability, optimisation, simulation, reinforcement learning, anomaly detection, and autonomous diagnostics. Champion rigour, reproducibility, and scientific integrity across research outputs, experiments, prototypes, and production validation. Help establish a research foundation for understanding how large‑scale GPU systems behave, why workloads underperform, where bottlenecks emerge, and how reliability can be improved proactively.
- Technical Depth & Execution: Lead the design and development of sophisticated statistical, machine learning, and optimisation systems for large‑scale GPU infrastructure telemetry. Develop advanced models and methodologies to optimise GPU utilisation, workload scheduling, infrastructure efficiency, and system reliability. Build models and methods for anomaly detection, failure prediction, distributed straggler detection, degraded workload identification, bottleneck diagnosis, and agentic root‑cause analysis. Design experiments, analyse large‑scale system telemetry, and prototype predictive and optimisation algorithms that directly inform production systems. Drive technical decisions on difficult modelling problems involving noisy time‑series data, high‑dimensional telemetry, causal inference, uncertainty, robustness, generalisation, and out‑of‑distribution behaviour. Explore simulation, digital‑twin, reinforcement learning, and adaptive scheduling approaches where they can improve understanding or optimisation of GPU clusters and distributed training environments. Take end‑to‑end ownership of research work from problem framing and exploratory analysis through prototype development, validation, and collaboration with engineering teams on production deployment. Maintain deep personal technical expertise; remain a hands‑on contributor in Python and modern scientific computing / machine learning tooling.
- Organisational Influence & Collaboration: Serve as a strong technical voice within the research organisation, helping shape how Monolith approaches complex infrastructure intelligence problems. Work closely with Fleet, Infrastructure, AI Platform, engineering, product, and customer‑facing teams to ensure research work lands with real operational and commercial impact. Translate research findings into production‑ready prototypes, deployable solutions, and technical recommendations that improve performance, reliability, utilisation, and cost efficiency. Contribute to research practices and norms that improve how the team handles ambiguous, high‑dimensional, real‑world systems problems. Communicate complex technical work and its implications clearly to a range of audiences, from close technical collaborators to senior leadership and external stakeholders. Help build a shared understanding of how large‑scale AI infrastructure behaves, where it fails, and how it can be made more reliable, efficient, and intelligent.
Technical Focus
- Applied machine learning for GPU infrastructure and distributed systems
- Large‑scale telemetry ingestion, feature engineering, and infrastructure analytics
- GPU cluster reliability, utilisation, observability, and performance analysis
- Anomaly detection, degradation detection, and failure prediction
- Distributed straggler detection and workload performance diagnosis
- Agentic root‑cause analysis and autonomous diagnostic systems
- Time‑series, high‑dimensional, structured, and operational systems data
- Performance modelling for distributed workloads and AI training jobs
- Workload scheduling, capacity planning, forecasting, and resource allocation modelling
- Optimisation techniques including stochastic optimisation, convex optimisation, reinforcement learning, and adaptive scheduling
- Simulation and digital‑twin approaches for complex infrastructure systems
- Causal inference, controlled experiments, hypothesis testing, and statistical validation
- End‑to‑end research systems: data pipelines, prototypes, validation, deployment, and monitoring
What We’re Looking For
- 8+ years of experience, or equivalent research experience, applying statistical modelling, machine learning, optimisation, or applied AI to large‑scale datasets.
- MS or PhD in Computer Science, Statistics, Applied Mathematics, Machine Learning, Physics, Engineering, or a related quantitative field.
- Strong proficiency in Python and scientific computing libraries such as NumPy, pandas, SciPy, scikit‑learn, PyTorch, or TensorFlow.
- Experience working with large‑scale structured datasets, time‑series data, infrastructure telemetry, performance data, sensor data, or other complex operational data.
- Experience designing and analysing controlled experiments, including A/B testing, hypothesis testing, causal inference, or rigorous model validation.
- Experience building and validating predictive models in production or research environments.
- Experience with distributed data systems such as Spark, Ray, Dask, or similar.
- Proficiency in SQL and working with large‑scale structured data.
- Strong understanding of optimisation techniques such as linear programming, convex optimisation, stochastic optimisation, reinforcement learning, or adaptive scheduling.
- Demonstrated ability to solve ambiguous technical problems where the right approach is not already known.
- Ability to translate research findings into production‑ready prototypes, deployable workflows, or operational tooling.
- Strong scientific judgement, including experimental design, reproducibility, validation, and awareness of uncertainty.
- The ability to communicate clearly and influence across research, engineering, product, infrastructure, and leadership audiences.
Preferred Experience
- PhD with published research in systems optimisation, distributed computing, ML systems, performance modelling, reliability engineering, scientific computing, or a related area.
- Experience with GPU workloads, distributed training, AI infrastructure, HPC, or large‑scale compute environments.
- Familiarity with Kubernetes, containerised workloads, cloud‑native systems, or distributed infrastructure.
- Experience developing reinforcement learning, adaptive scheduling, autonomous diagnostics, or agentic systems.
- Background in capacity planning, forecasting, resource allocation modelling, or infrastructure efficiency.
- Experience with observability, hardware telemetry, performance monitoring, root cause analysis, or failure prediction.
- Contributions to open‑source machine learning, systems, infrastructure, or scientific computing projects.
What We Offer
- Family‑level Medical Insurance
- Family‑level Dental Insurance
- Generous Pension Contribution
- Life Assurance at 4x Salary
- Critical Illness Cover
- Employee Assistance Programme
- Tuition Reimbursement
- Work culture focused on innovative disruption
Equal Opportunity
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.
Senior Researcher employer: CoreWeave
CoreWeave is an exceptional employer, offering a dynamic work culture that prioritises innovation and collaboration in the rapidly evolving field of AI. With a commitment to employee growth through comprehensive benefits like family-level medical and dental insurance, generous pension contributions, and tuition reimbursement, CoreWeave fosters an environment where researchers can thrive and make impactful contributions to cutting-edge technology. Located in a vibrant tech hub, employees enjoy the unique advantage of being part of a pioneering team dedicated to transforming GPU infrastructure reliability and performance.
StudySmarter Expert Advice🤫
We think this is how you could land Senior Researcher
✨Get Involved in Data Science Meetups
Tap into local data science meetups or workshops to connect with fellow enthusiasts and professionals. These events are goldmines for networking, and sometimes even lead directly to job openings at companies like CoreWeave!
✨Show Off Your Projects
Start building a public portfolio showcasing your data science projects on platforms like GitHub or personal websites. Highlight unique analyses or models you've developed. This not only demonstrates your skills but also gets your name out there for roles like Senior Researcher at CoreWeave.
✨Leverage Professional Networks
Join professional bodies related to data science, like the Data Science Society or similar organisations. Getting involved can lead to mentorship opportunities and insider knowledge about full-time positions at companies like CoreWeave.
✨Apply Directly through Our Website
When you find a suitable opening like Senior Researcher at CoreWeave, make sure to apply directly through our website. It gives you an edge and shows you're keen to join CoreWeave's team. Plus, who doesn’t love a direct application? It’s easier than navigating through job boards!
Some tips for your application 🫡
Show Off Your Projects: In the world of data science, your projects can speak volumes about your skills. Make sure to showcase a few key projects in your CV or portfolio, especially those that highlight your ability to work with datasets, build models, or use relevant tools like Python, R, or SQL. Don’t forget to include links to any GitHub repositories if applicable!
Quantify Your Achievements: Employers love numbers! When drafting your CV, back up your achievements with quantifiable results. For instance, mention how your data analysis led to a specific percentage increase in efficiency or revenue in a previous job or project. These details can really make your application pop!
Craft a Tailored Cover Letter: For a full-time role at CoreWeave, your cover letter should reflect your passion for data science and your excitement about the specific projects or values of the company. Dive into why you’re a good fit, how your skills align with their needs, and any unique perspectives you can bring to the team.
Stand Out with Relevant Courses and Certifications: Experience counts most, but relevant courses or certifications can still help you impress hiring managers at CoreWeave. Mention any standout courses you've completed that equipped you with essential skills, such as machine learning certifications or data visualisation courses. This shows your commitment to continuously developing your skills in the field!
How to prepare for a job interview at CoreWeave
✨Brush Up on Your Statistics
For a data science role, you need to seriously sharpen your statistics skills. Get ready to tackle technical questions on probability distributions, hypothesis testing, and regression analysis. These are often the bread and butter of data science interviews, so don't just skim over them!
✨Showcase Your Projects
Prepare a killer portfolio showcasing your data science projects. Include details about the datasets used, the tools and techniques applied, and the impact of your findings. If you can walk the interviewers through a particularly challenging project, or a visualisation that had real-world implications, it will really help you stand out!
✨Get Comfortable with Python and R
Most data science positions require proficiency in programming languages like Python and R. Practise with common libraries like pandas, NumPy, and scikit-learn, and be ready for live coding exercises or algorithm questions. Showing off your coding chops can really impress the interviewers at CoreWeave!
✨Prepare for Case Studies
Expect to encounter real-world case studies during the interview. You might be asked how you’d approach a data problem or analyse a dataset to extract insights. It's essential to think out loud and demonstrate your problem-solving process so that the interviewer can see your logical thinking in action.