Requirements
- Applicants for this role should have strong experience designing, developing, and maintaining high-quality software systems
- The ideal candidate is an experienced software engineer who values code quality, testability, and long-term maintainability, and enjoys building systems that other engineers rely on
- This person will be comfortable working across large codebases, contributing to CI/CD infrastructure, and shaping technical direction through thoughtful design and mentoring in a technically demanding environment spanning ML frameworks, infrastructure, and AI accelerator hardware
- Experience in production-quality software engineering roles
- Strong software design and architecture skills, with experience working on large or complex systems
- Strong proficiency in Python, including experience building and maintaining production codebases
- Solid experience with CI/CD systems and automated testing (preferably GitHub-based workflows)
- Experience working in Linux environments
- Familiarity with C or C++, with the ability to read, debug, and reason about low-level code when needed
- Proven ability to mentor junior engineers and influence engineering practices within a team
- Strong problem-solving skills and a proactive, self-directed approach to work
- Bachelor's, Master's, or PhD degree, or equivalent experience, in Computer Science, Mathematics, Machine Learning, Data Science, or a related field
- (Desirable) Exposure to machine learning frameworks such as PyTorch, JAX, Triton, or TensorFlow
- (Desirable) Experience with distributed workload management systems such as Kubernetes, vLLM, Keras, or MLOps pipelines
- (Desirable) Experience working with hardware simulators or emulators (e.g. QEMU)
- (Desirable) Experience developing for or working with FPGA-based systems
- (Desirable) Experience with people management or mentoring
What the job involves
- The role focuses on testing and validating a complex machine learning software stack, with particular emphasis on software architecture, automation, and engineering best practices
- The ML QA team is composed of highly skilled software engineers with a strong focus on automation, software quality, and data-driven validation. The team works closely with industry-standard machine learning frameworks and models, contributing to upstream open-source projects and collaborating across the wider software organization
- Operating in a fast-paced environment, the team plays a critical role in ensuring reliability, performance, and maintainability across the ML software stack, helping to deliver robust and high-quality products to customers
- Design, implement, and maintain robust test infrastructure and automation for a complex ML software stack
- Architect and evolve test frameworks and tooling with a focus on scalability, maintainability, and developer experience
- Build and maintain CI/CD pipelines targeting simulators, emulators (e.g. QEMU), and physical hardware
- Create representative ML workloads and gain insights from their execution (numerical accuracy, performance analysis, and benchmarking)
- Work closely with all software development teams, supporting a culture of quality, security, and maintainability
- Review code and designs, setting a high bar for software engineering best practices
- Mentor and support junior engineers, helping raise the overall technical capability of the team
- Evaluate existing test strategies and infrastructure, identifying gaps and driving improvements aligned with team and organizational goals