black.ai

Senior Research Scientist - Reinforcement Learning, MoEs

Full-Time · £48,000 - £84,000 / year (est.) · No home office possible

At a Glance

Tasks: Drive research in reinforcement learning and develop innovative agent systems for real-world applications.
Company: Join Canva, a vibrant tech company redefining design experiences globally.
Benefits: Equity packages, flexible leave, inclusive parental policies, and a wellbeing allowance.
Why this job: Make a real impact with cutting-edge AI technology while collaborating with passionate teams.
Qualifications: Experience in reinforcement learning, post-training, and strong Python skills required.
Other info: Dynamic work environment with opportunities for personal and professional growth.

The predicted salary is between £48,000 and £84,000 per year.

Company Description

Join the team redefining how the world experiences design. We know job hunting can be a little time-consuming and you’re probably keen to find out what’s on offer, so we’ll get straight to the point.

Where and how you can work

The buzzing Canva London campus features several buildings around beautiful, leafy Hoxton Square in Shoreditch. While our global headquarters is in Sydney, Australia, London is our HQ for Europe, with all kinds of teams based here, plus event spaces to gather our team and communities. You’ll experience a warm welcome from our Vibe team at front of house, amazing home-cooked food from our Head Chef, and a variety of workspaces to hang out with your teammates or get solo work done. That said, we trust our Canvanauts to choose the balance that empowers them and their team to achieve their goals, so you have a choice in where and how you work.

Job Description

At Canva, our mission is to empower the world to design. We’re building AI that feels magical and lands real impact for millions of people - helping anyone create with confidence. We’re looking for a senior research scientist who lives and breathes reinforcement learning, agentic systems, and mixture-of-experts models to push the frontier of reasoning, tool use, latency, and reliability - and ship it to users.

About the team

We are a cutting-edge post-training team developing new multimodal agentic systems. We explore multimodal agentic architectures and work across multimodal modelling, post-training, and design agents. We build scalable training and evaluation loops, and partner closely with product and platform teams to turn breakthroughs into delightful product features. We are looking for someone with experience in post-training, reinforcement learning (RL), and mixture-of-experts models to join our team.

About the role

You’ll drive research directions and play a leading role in hands‑on work across the agent stack: reward design and policy optimization; planning, memory, and tool orchestration; dataset construction; and post-training, including the development of novel post-training approaches. You’ll design tight experiments, iterate quickly, and land trustworthy conclusions. Most importantly, you’ll help convert research into reliable, safe, and high‑quality product experiences.

What you’ll do

Develop agent systems (planning, multimodal tool use, retrieval, novel training approaches, modeling ablations) for real tasks in design, vision, and language.
Scale post-training and RL across distributed systems (PyTorch) with efficient data loaders, tracing/telemetry, stable training of mixture-of-experts (MoE) architectures, and reproducible pipelines; profile, debug, and optimize.
Contribute to the research agenda for RL/agentic systems aligned with Canva’s product goals; identify high‑leverage bets and retire dead ends quickly.
Build reward models and learning loops: RLHF/RLAIF, preference modeling, DPO/IPO‑style objectives, offline/online RL, curriculum learning, and credit assignment for multi‑step reasoning.
Develop simulation and sandbox tasks that surface failure modes (planning errors, tool‑use brittleness, hallucination, unsafe actions) and turn them into measurable targets.
Help align on rigorous evaluation for agents (task success, reliability, latency, safety, regressions).
Stand up offline suites and online A/B tests; favor simple, controlled experiments that generalize.
Collaborate and ship: work shoulder‑to‑shoulder with product, design, safety, and platform to land research as reliable features—then iterate.
Share and elevate: mentor teammates, present findings internally, and contribute back to the community when it helps the field and our users.

You’re likely a match if you have:

Depth in implementing and post-training MoEs/LLMs/VLMs/Diffusion models, with a track record of shipped research or publications in MoEs, RL or agents.
Experience modifying and adapting open‑source models.
Strong experience with experimental design: tight baselines, clean ablations, reproducibility, and clear, data‑backed conclusions.
Fluency in Python and PyTorch; you’re comfortable in large ML codebases and can profile, debug, and optimize training and inference.
Practical experience building agent loops (planning, tool invocation, retrieval, memory) and evaluating multi‑step reasoning quality.
Hands‑on experience with policy optimization, reward modeling, and preference learning (e.g., RLHF/RLAIF, DPO/IPO, actor‑critic/PPO, offline RL).
Experience with large‑scale training (distributed training, experiment tracking, evaluation harnesses) and cloud multimodal tooling.
Experience with RL for MoE architectures.

Nice to have:

Experience with video and audio modelling.
Experience with multi‑agent settings.
Strength in alignment and safety evaluations, including red‑teaming and risk mitigation for tool‑using agents.
Contributions to open‑source, benchmarks, or shared evaluation suites for agents.

Additional Information

What’s in it for you? Achieving our crazy big goals motivates us to work hard - and we do - but you’ll experience lots of moments of magic, connectivity and fun woven throughout life at Canva, too. We also offer a range of benefits to set you up for every success in and outside of work. Here’s a taste of what’s on offer:

Equity packages - we want our success to be yours too.
Inclusive parental leave policy that supports all parents & carers.
An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more.
Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally.

Check out lifeatcanva.com for more info.

Other stuff to know

We make hiring decisions based on your experience, skills and passion, as well as how you can enhance Canva and our culture. When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process. We celebrate all types of skills and backgrounds at Canva so even if you don’t feel like your skills quite match what’s listed above - we still want to hear from you! Please note that interviews are conducted virtually.

Senior Research Scientist - Reinforcement Learning, MoEs employer: black.ai

At Canva, we pride ourselves on fostering a vibrant and inclusive work culture that empowers our Canvanauts to thrive both personally and professionally. Located in the heart of Shoreditch, our London campus offers a dynamic environment with flexible working options, exceptional benefits including equity packages and a supportive parental leave policy, and ample opportunities for growth through collaboration and mentorship. Join us in redefining design while enjoying a workplace that values creativity, well-being, and community.

Contact Details:

black.ai Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land Senior Research Scientist - Reinforcement Learning, MoEs

✨Tip Number 1

Network like a pro! Reach out to people in the industry, especially those at Canva. A friendly chat can open doors and give you insights that a job description just can't.

✨Tip Number 2

Prepare for your interview by diving deep into reinforcement learning and agentic systems. Brush up on your knowledge and be ready to discuss your past projects and how they relate to what Canva is doing.

✨Tip Number 3

Showcase your passion! When you get the chance to speak with the team, let your enthusiasm for design and AI shine through. They want to see that you’re not just qualified, but genuinely excited about the work.

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen. Plus, it shows you’re serious about joining the Canva family.

We think you need these skills to ace Senior Research Scientist - Reinforcement Learning, MoEs

Reinforcement Learning (RL)

Mixture-of-Experts Models (MoEs)

Post-training Techniques

Experimental Design

Python

PyTorch

Policy Optimization

Reward Modeling

Multi-step Reasoning Evaluation

Distributed Systems

Data Analysis

Collaboration Skills

Mentoring

Simulation Development

Safety Evaluations

Some tips for your application 🫡

Be Yourself: When you're writing your application, let your personality shine through! We want to get to know the real you, so don’t be afraid to show your passion for reinforcement learning and how it aligns with our mission at Canva.

Tailor Your Application: Make sure to customise your application to highlight your experience with MoEs and RL. Use specific examples from your past work that demonstrate your skills and how they can contribute to our team’s goals.

Showcase Your Achievements: Don’t hold back on sharing your successes! Whether it's a project you led or a paper you published, we want to see what you've accomplished in the field of AI and how it can benefit our innovative environment.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy!

How to prepare for a job interview at black.ai

✨Know Your Reinforcement Learning Inside Out

Make sure you brush up on your knowledge of reinforcement learning, especially in the context of mixture-of-experts models. Be ready to discuss your past experiences and how they relate to the role. Prepare to explain complex concepts in simple terms, as this shows your depth of understanding.

✨Showcase Your Experimental Design Skills

Be prepared to talk about your experience with experimental design, including how you've set up tight baselines and clean ablations. Bring examples of your work that demonstrate your ability to draw clear, data-backed conclusions. This will highlight your analytical skills and attention to detail.

✨Familiarise Yourself with Their Tech Stack

Since the role involves working with Python and PyTorch, make sure you're comfortable discussing your experience with these technologies. If you’ve worked on large ML codebases or have experience with distributed training, be ready to share specific examples of your contributions.

✨Prepare for Collaboration Questions

Canva values collaboration, so think about times when you've worked closely with product, design, or safety teams. Be ready to discuss how you’ve turned research into reliable features and how you handle feedback. This will show that you can thrive in a team-oriented environment.
