Senior Software Engineer (Machine Learning Quality Assurance) in Cambridge

Cambridge, Full-Time, No working from home possible

Requirements

Applicants for this role should have strong experience designing, developing, and maintaining high-quality software systems.
The ideal candidate is an experienced software engineer who values code quality, testability, and long-term maintainability, and enjoys building systems that other engineers rely on.
This person will be comfortable working across large codebases, contributing to CI/CD infrastructure, and shaping technical direction through thoughtful design and mentoring in a technically demanding environment spanning ML frameworks, infrastructure, and AI accelerator hardware.
Experience in production-quality software engineering roles
Strong software design and architecture skills, with experience working on large or complex systems
Strong proficiency in Python, including experience building and maintaining production codebases
Solid experience with CI/CD systems and automated testing (preferably GitHub-based workflows)
Experience working in Linux environments
Familiarity with C or C++, with the ability to read, debug, and reason about low-level code when needed
Proven ability to mentor junior engineers and influence engineering practices within a team
Strong problem-solving skills and a proactive, self-directed approach to work
Bachelor's/Master's/PhD or equivalent experience in Computer Science, Maths, Machine Learning, Data Science, or a related field
(Desirable) Exposure to machine learning frameworks such as PyTorch, JAX, Triton, or TensorFlow
(Desirable) Experience with distributed workload management systems such as Kubernetes, vLLM, Keras, or MLOps pipelines
(Desirable) Experience working with hardware simulators or emulators (e.g. QEMU)
(Desirable) Experience developing for or working with FPGA-based systems
(Desirable) Experience with people management or mentoring

What the job involves

The role focuses on testing and validating a complex machine learning software stack, with particular emphasis on software architecture, automation, and engineering best practices.
The ML QA team is composed of highly skilled software engineers with a strong focus on automation, software quality, and data-driven validation. The team works closely with industry-standard machine learning frameworks and models, contributing to upstream open-source projects and collaborating across the wider software organization.
Operating in a fast-paced environment, the team plays a critical role in ensuring reliability, performance, and maintainability across the ML software stack, helping to deliver robust and high-quality products to customers.
Design, implement, and maintain robust test infrastructure and automation for a complex ML software stack
Architect and evolve test frameworks and tooling with a focus on scalability, maintainability, and developer experience
Build and maintain CI/CD pipelines targeting simulators, emulators (e.g. QEMU), and physical hardware
Create representative ML workloads and gain insights from their execution (numerical accuracy, performance analysis, and benchmarking)
Work closely with all software development teams, supporting a culture of quality, security, and maintainability
Review code and designs, setting a high bar for software engineering best practices
Mentor and support junior engineers, helping raise the overall technical capability of the team
Evaluate existing test strategies and infrastructure, identifying gaps and driving improvements aligned with team and organizational goals

Contact Details:

Graphcore Recruitment Team
