At a Glance
- Tasks: Drive research in reinforcement learning and develop innovative agent systems for real-world applications.
- Company: Join Canva, a vibrant tech company redefining design experiences globally.
- Benefits: Equity packages, flexible leave, inclusive parental policies, and a wellbeing allowance.
- Why this job: Make a real impact with cutting-edge AI technology while collaborating with passionate teams.
- Qualifications: Experience in reinforcement learning, post-training, and strong Python skills required.
- Other info: Dynamic work environment with opportunities for personal and professional growth.
The predicted salary is between £48,000 and £84,000 per year.
Company Description
Join the team redefining how the world experiences design. We know job hunting can be a little time-consuming and you’re probably keen to find out what’s on offer, so we’ll get straight to the point.
Where and how you can work
The buzzing Canva London campus features several buildings around beautiful, leafy Hoxton Square in Shoreditch. While our global headquarters is in Sydney, Australia, London is our HQ for Europe, with all kinds of teams based here, plus event spaces to gather our team and communities. You’ll experience a warm welcome from our Vibe team at front of house, amazing home-cooked food from our Head Chef and a variety of workspaces to hang out with your teammates or get solo work done. That said, we trust our Canvanauts to choose the balance that empowers them and their team to achieve their goals, so you have a choice in where and how you work.
Job Description
At Canva, our mission is to empower the world to design. We’re building AI that feels magical and lands real impact for millions of people - helping anyone create with confidence. We’re looking for a senior research scientist who lives and breathes reinforcement learning, agentic systems and mixture-of-experts (MoE) models to push the frontier of reasoning, tool use, latency and reliability - and ship it to users.
About the team
We are a cutting-edge post-training team developing new multimodal agentic systems. We work across multimodal modelling, post-training and design agents: we explore multimodal agentic architectures, build scalable training and evaluation loops, and partner closely with product and platform teams to turn breakthroughs into delightful product features. We are looking for someone with experience in post-training, reinforcement learning (RL) and mixture-of-experts (MoE) models to join our team.
About the role
You’ll drive research directions and play a leading role in hands-on work across the agent stack: reward design and policy optimization; planning, memory, and tool orchestration; dataset construction; and post-training, including the development of novel post-training approaches. You’ll design tight experiments, iterate quickly, and land trustworthy conclusions. Most importantly, you’ll help convert research into reliable, safe, and high-quality product experiences.
What you’ll do
- Develop agent systems (planning, multimodal tool use, retrieval, novel training approaches, modeling ablations) for real tasks in design, vision, and language.
- Scale post-training and RL across distributed systems (PyTorch) with efficient data loaders, tracing/telemetry, stable training of mixture-of-experts (MoE) architectures, and reproducible pipelines; profile, debug, and optimize.
- Contribute to the research agenda for RL/agentic systems aligned with Canva’s product goals; identify high‑leverage bets and retire dead ends quickly.
- Build reward models and learning loops: RLHF/RLAIF, preference modeling, DPO/IPO‑style objectives, offline/online RL, curriculum learning, and credit assignment for multi‑step reasoning.
- Develop simulation and sandbox tasks that surface failure modes (planning errors, tool‑use brittleness, hallucination, unsafe actions) and turn them into measurable targets.
- Help align on rigorous evaluation for agents (task success, reliability, latency, safety, regressions).
- Stand up offline suites and online A/B tests; favor simple, controlled experiments that generalize.
- Collaborate and ship: work shoulder‑to‑shoulder with product, design, safety, and platform to land research as reliable features—then iterate.
- Share and elevate: mentor teammates, present findings internally, and contribute back to the community when it helps the field and our users.
You’re likely a match if you have:
- Depth in implementing and post-training MoEs/LLMs/VLMs/Diffusion models, with a track record of shipped research or publications in MoEs, RL or agents.
- Experience modifying and adapting open-source models.
- Strong experience with experimental design: tight baselines, clean ablations, reproducibility, and clear, data‑backed conclusions.
- Fluency in Python and PyTorch; you’re comfortable in large ML codebases and can profile, debug, and optimize training and inference.
- Practical experience building agent loops (planning, tool invocation, retrieval, memory) and evaluating multi‑step reasoning quality.
- Hands‑on experience with policy optimization, reward modeling, and preference learning (e.g., RLHF/RLAIF, DPO/IPO, actor‑critic/PPO, offline RL).
- Experience with large‑scale training (distributed training, experiment tracking, evaluation harnesses) and cloud multimodal tooling.
- Experience with RL for MoE architectures.
Nice to have:
- Experience with video and audio modelling.
- Experience with multi‑agent settings.
- Strength in alignment and safety evaluations, including red‑teaming and risk mitigation for tool‑using agents.
- Contributions to open‑source, benchmarks, or shared evaluation suites for agents.
Additional Information
What’s in it for you? Achieving our crazy big goals motivates us to work hard - and we do - but you’ll experience lots of moments of magic, connectivity and fun woven throughout life at Canva, too. We also offer a range of benefits to set you up for every success in and outside of work. Here’s a taste of what’s on offer:
- Equity packages - we want our success to be yours too.
- Inclusive parental leave policy that supports all parents & carers.
- An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more.
- Flexible leave options that empower you to be a force for good and take time to recharge, and that support you personally.
Check out lifeatcanva.com for more info.
Other stuff to know
We make hiring decisions based on your experience, skills and passion, as well as how you can enhance Canva and our culture. When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process. We celebrate all types of skills and backgrounds at Canva so even if you don’t feel like your skills quite match what’s listed above - we still want to hear from you! Please note that interviews are conducted virtually.
Senior Research Scientist - Reinforcement Learning, MoEs employer: black.ai
Contact Detail:
black.ai Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Senior Research Scientist - Reinforcement Learning, MoEs role
✨Tip Number 1
Network like a pro! Reach out to people in the industry, especially those at Canva. A friendly chat can open doors and give you insights that a job description just can't.
✨Tip Number 2
Prepare for your interview by diving deep into reinforcement learning and agentic systems. Brush up on your knowledge and be ready to discuss your past projects and how they relate to what Canva is doing.
✨Tip Number 3
Showcase your passion! When you get the chance to speak with the team, let your enthusiasm for design and AI shine through. They want to see that you’re not just qualified, but genuinely excited about the work.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen. Plus, it shows you’re serious about joining the Canva family.
Some tips for your application 🫡
Be Yourself: When you're writing your application, let your personality shine through! We want to get to know the real you, so don’t be afraid to show your passion for reinforcement learning and how it aligns with our mission at Canva.
Tailor Your Application: Make sure to customise your application to highlight your experience with MoEs and RL. Use specific examples from your past work that demonstrate your skills and how they can contribute to our team’s goals.
Showcase Your Achievements: Don’t hold back on sharing your successes! Whether it's a project you led or a paper you published, we want to see what you've accomplished in the field of AI and how it can benefit our innovative environment.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy!
How to prepare for a job interview at black.ai
✨Know Your Reinforcement Learning Inside Out
Brush up on your knowledge of reinforcement learning, especially in the context of mixture-of-experts (MoE) models. Be ready to discuss your past experiences and how they relate to the role. Prepare to explain complex concepts in simple terms, as this shows your depth of understanding.
✨Showcase Your Experimental Design Skills
Be prepared to talk about your experience with experimental design, including how you've set up tight baselines and clean ablations. Bring examples of your work that demonstrate your ability to draw clear, data-backed conclusions. This will highlight your analytical skills and attention to detail.
✨Familiarise Yourself with Their Tech Stack
Since the role involves working with Python and PyTorch, make sure you're comfortable discussing your experience with these technologies. If you’ve worked on large ML codebases or have experience with distributed training, be ready to share specific examples of your contributions.
✨Prepare for Collaboration Questions
Canva values collaboration, so think about times when you've worked closely with product, design, or safety teams. Be ready to discuss how you’ve turned research into reliable features and how you handle feedback. This will show that you can thrive in a team-oriented environment.