At a Glance
- Tasks: Join a team of experts to create cutting-edge synthetic voices and enhance AI communication.
- Company: Synthesia, the leading AI video platform, valued at $4 billion.
- Benefits: Competitive salary, remote work options, and opportunities for professional growth.
- Other info: Dynamic R&D environment with a focus on innovation and collaboration.
- Why this job: Make a global impact in the exciting field of generative AI and voice synthesis.
- Qualifications: Expertise in ML, LLMs, and speech generation required.
The predicted salary is between £70,000 and £90,000 per year.
Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London, with offices and teams across Europe and the US. As AI continues to shape the way we live and work, Synthesia develops products to enhance visual communication and enterprise skill development, helping people work better and stay at the center of successful organizations.
Following our recent Series E funding round, where we raised $200 million, our valuation stands at $4 billion. Our total funding exceeds $530 million from premier investors including Accel, NVentures (Nvidia's VC arm), Kleiner Perkins, GV, and Evantic Capital, alongside the founders and operators of Stripe, Datadog, Miro, and Webflow.
As a Research Engineer, you will join a team of 40+ Researchers and Engineers within the R&D Department working on cutting-edge challenges in the Generative AI space, with a focus on creating high-quality, expressive, and real-time synthetic voices. Within the team, you’ll work on the applied side of our research efforts and directly impact solutions that are used worldwide by over 60,000 businesses.
If you are an expert in ML, LLMs, speech generation, and conversational models, this is your chance to make a global impact. You will join our Audio Post-Training Team, which works on generative speech and voice synthesis, ensuring our in-house voice models reach production-level quality, speed, and robustness.
Typical projects include:
- Develop and evaluate streaming and speech-to-speech systems, enabling low-latency, interactive voice synthesis.
- Adapt models for new conditioning inputs (emotion, speed, prosody, speaker control, etc.).
- Implement post-training optimization techniques (quantization, pruning, distillation) to improve efficiency and latency in real-time speech generation.
- Integrate and test novel architectures, such as neural codecs, diffusion, or flow-matching models, to enhance realism and responsiveness.
- Contribute to defining new evaluation metrics for conversational speech, including latency-aware and online MOS prediction systems.
- Stay updated with the latest research in audio diffusion, autoregressive models, neural codecs, and multimodal LLMs.
- Apply DPO (Direct Preference Optimization) and distillation to fine-tune large-scale speech models.
What we're looking for:
- Strong understanding of generative modeling, ideally applied to sequential or multimodal data.
- Hands-on experience with large language models (LLMs) or similar transformer-based architectures.
- High proficiency in PyTorch, including experience with distributed training and model optimization.
- Solid grasp of time-series modeling and tokenization, preferably in the context of audio or speech.
- Demonstrated ability to prototype quickly, test hypotheses, and iterate efficiently.
- Proven experience in training deep learning models end-to-end, from data preparation to evaluation.
- Strong general software engineering skills, enabling contributions to a large, shared research infrastructure.
Nice to have:
- Experience with real-time or streaming architectures is a big plus.
- Familiarity with state-of-the-art architectures in audio and speech generation (e.g., diffusion models, neural codecs, flow-matching models, autoregressive decoders).
- Experience with speech-to-speech or text-to-speech (TTS) systems.
- Evidence of original research contributions, such as publications in top-tier venues (e.g., ICASSP, Interspeech, NeurIPS, ICML) or open-source work.
Senior Research Engineer - Voice employer: Synthesia
At Synthesia, we pride ourselves on being at the forefront of AI innovation, offering a dynamic work environment in London that fosters creativity and collaboration. Our commitment to employee growth is evident through our investment in cutting-edge projects and access to premier resources, ensuring that you can make a meaningful impact while developing your skills alongside industry leaders. Join us and be part of a culture that values diversity, encourages experimentation, and rewards excellence in the rapidly evolving field of generative AI.
StudySmarter Expert Advice🤫
We think this is how you could land the Senior Research Engineer - Voice role
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, especially those at Synthesia. A friendly chat can open doors and give you insights that a job description just can't.
✨Tip Number 2
Show off your skills! If you've got projects or research that align with what Synthesia is doing, make sure to highlight them in conversations. Real-world examples can really set you apart.
✨Tip Number 3
Prepare for the interview by diving deep into their tech stack. Brush up on PyTorch and generative models, and be ready to discuss how your experience fits into their vision for AI voice synthesis.
✨Tip Number 4
Don't forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you're genuinely interested in joining the team.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the role of Senior Research Engineer - Voice. Highlight your experience with generative modelling, LLMs, and any relevant projects that showcase your skills in speech generation and conversational models.
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about AI and voice synthesis. Mention specific projects or experiences that align with what Synthesia is doing, and show us how you can contribute to the team.
Showcase Your Projects: If you've worked on any relevant projects, whether personal or professional, make sure to include them. We love seeing practical applications of your skills, especially if they involve real-time speech generation or innovative architectures.
Apply Through Our Website: We encourage you to apply through our website for the best chance of getting noticed. It’s straightforward and ensures your application goes directly to us, so we can review it promptly!
How to prepare for a job interview at Synthesia
✨Know Your Stuff
Make sure you brush up on generative modelling and large language models. Be ready to discuss your hands-on experience with PyTorch and any projects you've worked on that relate to speech generation or conversational models.
✨Showcase Your Projects
Prepare to talk about specific projects where you've implemented real-time speech synthesis or optimised models. Highlight any challenges you faced and how you overcame them, as this will show your problem-solving skills.
✨Stay Current
Familiarise yourself with the latest research in audio diffusion and neural codecs. Being able to reference recent advancements or papers during your interview can demonstrate your passion and commitment to the field.
✨Ask Insightful Questions
Prepare thoughtful questions about Synthesia's current projects or future directions in AI voice technology. This shows your genuine interest in the role and helps you assess if the company is the right fit for you.