At a Glance
- Tasks: Drive research in multimodal agents and develop innovative AI systems for design.
- Company: Join Canva, a leading design platform with a vibrant culture.
- Benefits: Equity packages, flexible leave, and a wellbeing allowance to support your lifestyle.
- Why this job: Make a real impact in AI while collaborating with passionate teams.
- Qualifications: Experience in reinforcement learning and agentic systems is essential.
- Other info: Dynamic work environment with opportunities for personal and professional growth.
The predicted salary is between £36,000 and £60,000 per year.
Join the team redefining how the world experiences design.
Where and How You Can Work
The buzzing Canva London campus features several buildings around beautiful leafy Hoxton Square in Shoreditch. While our global headquarters is in Sydney, Australia, London is our HQ for Europe, with all kinds of teams based here, plus event spaces to gather our team and communities. You'll experience a warm welcome from our Vibe team at front of house, amazing home-cooked food from our Head Chef and a variety of workspaces to hang out with your teammates or get solo work done. That said, we trust our Canvanauts to choose the balance that empowers them and their team to achieve their goals, so you have a choice in where and how you work.
At Canva, our mission is to empower the world to design. We're building AI that feels magical and lands real impact for millions of people, helping anyone create with confidence. We're looking for a senior research scientist who lives and breathes reinforcement learning and agentic systems to push the frontier of reasoning, tool use, and reliability, and ship it to users.
About The Team
We are a cutting-edge post-training team developing new multimodal agentic systems. We work across multimodal modelling, post-training, and design agents: we explore multimodal agentic architectures, build scalable training and evaluation loops, and partner closely with product and platform teams to turn breakthroughs into delightful product features. We are looking for someone with experience in post-training and reinforcement learning (RL).
About The Role
You'll drive research directions and play a leading role in hands-on work across the agent stack, from reward design and policy optimization to planning, memory, tool orchestration, dataset construction, post-training, and the development of novel post-training approaches. You'll design tight experiments, iterate quickly, and land trustworthy conclusions. Most importantly, you'll help convert research into reliable, safe, and high-quality product experiences.
What You'll Be Doing In This Role
- Develop agent systems (planning, multimodal tool use, retrieval, novel training approaches, modeling ablations) for real tasks in design, vision, and language.
- Scale post-training and RL across distributed systems (PyTorch) with efficient data loaders, tracing/telemetry, stable training of mixture-of-experts (MoE) architectures, and reproducible pipelines; profile, debug, and optimize.
- Contribute to the research agenda for RL/agentic systems aligned with Canva's product goals; identify high-leverage bets and retire dead ends quickly.
- Build reward models and learning loops: RLHF/RLAIF, preference modeling, DPO/IPO-style objectives, offline/online RL, curriculum learning, and credit assignment for multi-step reasoning.
- Develop simulation and sandbox tasks that surface failure modes (planning errors, tool-use brittleness, hallucination, unsafe actions) and turn them into measurable targets.
- Help align on rigorous evaluation for agents (task success, reliability, latency, safety, regressions). Stand up offline suites and online A/B tests; favor simple, controlled experiments that generalize.
- Collaborate and ship: work shoulder-to-shoulder with product, design, safety, and platform to land research as reliable features, then iterate.
- Share and elevate: mentor teammates, present findings internally, and contribute back to the community when it helps the field and our users.
You're Likely a Match If You Have
- Depth in implementing and post-training LLMs/VLMs/Diffusion models, with a track record of shipped research or publications in agents/RL.
- Experience modifying and adapting open-source models.
- Strong experience with experimental design: tight baselines, clean ablations, reproducibility, and clear, data-backed conclusions.
- Fluency in Python and PyTorch; you're comfortable in large ML codebases and can profile, debug, and optimize training and inference.
- Practical experience building agent loops (planning, tool invocation, retrieval, memory) and evaluating multi-step reasoning quality.
- Hands-on experience with policy optimization, reward modeling, and preference learning (e.g., RLHF/RLAIF, DPO/IPO, actor-critic/PPO, offline RL).
- Experience with large-scale training (distributed training, experiment tracking, evaluation harnesses) and cloud multimodal tooling.
- Experience with RL for MoE architectures.
Nice to Have
- Experience with video and audio modelling.
- Experience with multi-agent settings.
- Strength in alignment and safety evaluations, including red-teaming and risk mitigation for tool-using agents.
- Contributions to open-source, benchmarks, or shared evaluation suites for agents.
What's in it for you?
Achieving our crazy big goals motivates us to work hard, and we do, but you'll experience lots of moments of magic, connectivity and fun woven throughout life at Canva, too. We also offer a range of benefits to set you up for every success in and outside of work.
- Equity packages - we want our success to be yours too.
- Inclusive parental leave policy that supports all parents & carers.
- An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more.
- Flexible leave options that empower you to be a force for good, take time to recharge and support you personally.
Other Stuff To Know
We make hiring decisions based on your experience, skills and passion, as well as how you can enhance Canva and our culture. When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process.
We celebrate all types of skills and backgrounds at Canva, so even if you don't feel like your skills quite match what's listed above, we still want to hear from you!
Please note that interviews are conducted virtually.
Senior Research Scientist - Multimodal Agents in London employer: Canva
Contact Detail:
Canva Recruiting Team
StudySmarter Expert Advice
We think this is how you could land the Senior Research Scientist - Multimodal Agents role in London
Tip Number 1
Network like a pro! Reach out to current or former employees at Canva on LinkedIn. A friendly chat can give you insider info and maybe even a referral, which can really boost your chances.
Tip Number 2
Prepare for the interview by diving deep into Canva's mission and values. Show us how your experience in reinforcement learning and agentic systems aligns with what we do. Tailor your examples to highlight your relevant skills!
Tip Number 3
Practice makes perfect! Set up mock interviews with friends or use online platforms. Focus on articulating your thought process when tackling complex problems, especially around multimodal agents and RL.
Tip Number 4
Don't forget to apply through our website! It's the best way to ensure your application gets seen. Plus, it shows you're genuinely interested in joining our team at Canva.
Some tips for your application
Tailor Your Application: Make sure to customise your CV and cover letter for the Senior Research Scientist role. Highlight your experience with reinforcement learning and agentic systems, as this is what we're really looking for!
Showcase Your Projects: Include specific examples of your past work that relate to multimodal modelling and post-training. We love seeing how you've tackled challenges and turned research into real-world applications.
Be Clear and Concise: When writing your application, keep it straightforward. Use clear language to explain your skills and experiences, so we can easily see how you fit into our team at Canva.
Apply Through Our Website: Don't forget to submit your application through our official website! It's the best way for us to receive your details and get you in the loop for this exciting opportunity.
How to prepare for a job interview at Canva
Know Your Stuff
Make sure you brush up on reinforcement learning and agentic systems. Familiarise yourself with the latest research and developments in these areas, as well as how they relate to Canva's mission. Being able to discuss your past experiences and how they align with the role will show that you're genuinely interested.
Prepare for Technical Questions
Expect to dive deep into technical discussions about post-training, policy optimisation, and reward modelling. Practise explaining complex concepts clearly and concisely, as you'll need to demonstrate your expertise in Python and PyTorch. Consider doing mock interviews with peers to sharpen your responses.
Show Your Collaborative Spirit
Canva values teamwork, so be ready to share examples of how you've successfully collaborated with product, design, and safety teams in the past. Highlight any experiences where you turned research into practical applications, as this will resonate well with their focus on delivering reliable features.
Ask Insightful Questions
Prepare thoughtful questions that show your interest in Canva's projects and culture. Inquire about their approach to scaling multimodal agentic systems or how they evaluate the success of their agents. This not only demonstrates your enthusiasm but also helps you gauge if the company is the right fit for you.