Requirements
- Experience in software engineering roles delivering production-quality software
- Strong software design and architecture skills, with experience working on large or complex systems
- Strong proficiency in Python, including experience building and maintaining production codebases
- Solid experience with CI/CD systems and automated testing (preferably GitHub-based workflows)
- Experience working in Linux environments
- Familiarity with C or C++, with the ability to read, debug, and reason about low‑level code when needed
- Proven ability to mentor junior engineers and influence engineering practices within a team
- Strong problem‑solving skills and a proactive, self‑directed approach to work
- Bachelor's, Master's or PhD degree in Computer Science, Maths, Machine Learning, Data Science, or a related field, or equivalent experience
- (Desirable) Exposure to machine learning frameworks such as PyTorch, JAX, Triton or TensorFlow
- (Desirable) Experience with distributed workload management systems such as Kubernetes, vLLM, Keras or MLOps pipelines
- (Desirable) Experience working with hardware simulators or emulators (e.g. QEMU)
- (Desirable) Experience developing for or working with FPGA-based systems
- (Desirable) Experience with people management or mentoring
What the job involves
- Applicants for this role should have strong experience designing, developing, and maintaining high‑quality software systems
- The role focuses on testing and validating a complex machine learning software stack, with particular emphasis on software architecture, automation, and engineering best practices
- The ideal candidate is an experienced software engineer who values code quality, testability, and long‑term maintainability, and enjoys building systems that other engineers rely on
- This person will be comfortable working across large codebases, contributing to CI/CD infrastructure, and shaping technical direction through thoughtful design and mentoring, in a technically demanding environment spanning ML frameworks, infrastructure, and AI accelerator hardware
- Design, implement, and maintain robust test infrastructure and automation for a complex ML software stack
- Architect and evolve test frameworks and tooling with a focus on scalability, maintainability, and developer experience
- Build and maintain CI/CD pipelines targeting simulators, emulators (e.g. QEMU), and physical hardware
- Create representative ML workloads and analyse their execution (numerical accuracy, performance analysis and benchmarking)
- Work closely with all software development teams, supporting a culture of quality, security and maintainability
- Review code and designs, setting a high bar for software engineering best practices
- Mentor and support junior engineers, helping raise the overall technical capability of the team
- Evaluate existing test strategies and infrastructure, identifying gaps and driving improvements aligned with team and organizational goals