Senior Voice AI Research Engineer — Real-Time Speech Synthesis

Full-Time | £70,000 - £90,000 / year (est.) | Home office (partial)

At a Glance

Tasks: Join a team to create cutting-edge real-time synthetic voices and enhance AI communication.
Company: Synthesia, a leading AI video platform valued at $4 billion.
Benefits: Competitive salary, remote work options, and opportunities for professional growth.
Other info: Dynamic R&D environment with a focus on innovation and collaboration.
Why this job: Make a global impact in the exciting field of generative AI and voice synthesis.
Qualifications: Expertise in ML, LLMs, and speech generation required.

The predicted salary is between £70,000 and £90,000 per year.

Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London, with offices and teams across Europe and the US. As AI continues to shape the way we live and work, Synthesia develops products to enhance visual communication and enterprise skill development, helping people work better and stay at the center of successful organizations.

Following our recent Series E funding round, where we raised $200 million, our valuation stands at $4 billion. Our total funding exceeds $530 million from premier investors including Accel, NVentures (Nvidia's VC arm), Kleiner Perkins, GV, and Evantic Capital, alongside the founders and operators of Stripe, Datadog, Miro, and Webflow.

As a Research Engineer, you will join a team of 40+ Researchers and Engineers within the R&D Department working on cutting-edge challenges in the Generative AI space, with a focus on creating high-quality, expressive, real-time synthetic voices. Within the team, you’ll work on the applied side of our research efforts and directly shape solutions used worldwide by over 60,000 businesses.

If you are an expert in ML, LLMs, speech generation, or conversational models, this is your chance to make a global impact. You will join our Audio Post-Training Team, which works on generative speech and voice synthesis, ensuring our in-house voice models reach production-level quality, speed, and robustness.

Typical projects include:

Develop and evaluate streaming and speech-to-speech systems, enabling low-latency, interactive voice synthesis.
Adapt models for new conditioning inputs (emotion, speed, prosody, speaker control, etc.).
Implement post-training optimization techniques (quantization, pruning, distillation) to improve efficiency and latency in real-time speech generation.
Integrate and test novel architectures, such as neural codecs, diffusion, or flow-matching models, to enhance realism and responsiveness.
Contribute to defining new evaluation metrics for conversational speech, including latency-aware and online MOS prediction systems.
Stay updated with the latest research in audio diffusion, autoregressive models, neural codecs, and multimodal LLMs.
Apply DPO (Direct Preference Optimization) and distillation to fine-tune large-scale speech models.

What we're looking for:

Strong understanding of generative modeling, ideally applied to sequential or multimodal data.
Hands-on experience with large language models (LLMs) or similar transformer-based architectures.
High proficiency in PyTorch, including experience with distributed training and model optimization.
Solid grasp of time-series modeling and tokenization, preferably in the context of audio or speech.
Demonstrated ability to prototype quickly, test hypotheses, and iterate efficiently.
Proven experience in training deep learning models end-to-end, from data preparation to evaluation.
Strong general software engineering skills, enabling contributions to a large, shared research infrastructure.

Nice to have:

Experience with real-time or streaming architectures is a big plus.
Familiarity with state-of-the-art architectures in audio and speech generation (e.g., diffusion models, neural codecs, flow-matching models, autoregressive decoders).
Experience with speech-to-speech or text-to-speech (TTS) systems.
Evidence of original research contributions, such as publications in top-tier venues (e.g., ICASSP, Interspeech, NeurIPS, ICML) or open-source work.

Senior Voice AI Research Engineer — Real-Time Speech Synthesis employer: Synthesia

At Synthesia, we pride ourselves on being at the forefront of AI innovation, offering a dynamic work environment that fosters creativity and collaboration. Our London headquarters is not just a place to work; it's a hub for growth, where employees are encouraged to push boundaries and develop their skills in cutting-edge technologies. With competitive benefits, a strong emphasis on employee well-being, and the opportunity to make a significant impact in the rapidly evolving field of AI, Synthesia is an exceptional employer for those looking to shape the future of communication.

Contact Details:

Synthesia Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land Senior Voice AI Research Engineer — Real-Time Speech Synthesis

✨Tip Number 1

Network like a pro! Reach out to folks in the AI and speech synthesis space on LinkedIn or at industry events. A friendly chat can open doors that a CV just can't.

✨Tip Number 2

Show off your skills! Create a portfolio showcasing your projects, especially those related to generative AI or speech models. This gives potential employers a taste of what you can do.

✨Tip Number 3

Prepare for interviews by brushing up on the latest trends in voice AI and real-time speech synthesis. Being knowledgeable about current research can really impress your interviewers.

✨Tip Number 4

Don't forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are proactive!

We think you need these skills to ace Senior Voice AI Research Engineer — Real-Time Speech Synthesis

Generative Modeling

Large Language Models (LLMs)

Transformer-based Architectures

PyTorch

Distributed Training

Model Optimization

Time-Series Modeling

Tokenization

Deep Learning Model Training

Software Engineering

Real-Time Architectures

Audio and Speech Generation

Speech-to-Speech Systems

Text-to-Speech (TTS) Systems

Research Contributions

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the role of Senior Voice AI Research Engineer. Highlight your experience with generative modelling, LLMs, and any relevant projects you've worked on. We want to see how your skills align with what we're looking for!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about real-time speech synthesis and how you can contribute to our team. Be sure to mention specific projects or experiences that relate to the job description.

Showcase Your Projects: If you've got any personal projects or research that demonstrate your expertise in speech generation or AI, make sure to include them. We love seeing practical applications of your skills, so don't hold back!

Apply Through Our Website: We encourage you to apply through our website for the best chance of getting noticed. It helps us keep track of applications and ensures you’re considered for the role. Plus, it’s super easy to do!

How to prepare for a job interview at Synthesia

✨Know Your Stuff

Make sure you brush up on the latest trends in generative AI, especially around speech synthesis and large language models. Familiarise yourself with the specific technologies mentioned in the job description, like PyTorch and real-time architectures, so you can speak confidently about your experience.

✨Showcase Your Projects

Prepare to discuss your previous projects that align with the role. Highlight any hands-on experience you've had with deep learning models, especially in audio or speech contexts. If you have publications or open-source contributions, be ready to share those as well!

✨Ask Smart Questions

Interviews are a two-way street! Prepare insightful questions about Synthesia's current projects or future directions in voice AI. This shows your genuine interest in the company and helps you gauge if it’s the right fit for you.

✨Practice Problem-Solving

Expect technical questions or case studies during the interview. Practice explaining your thought process when tackling complex problems, especially those related to model optimisation or real-time systems. This will demonstrate your analytical skills and ability to think on your feet.
