AI Engineer - Speech & Voice Intelligence in London

London | Full-Time | £60,000 - £80,000 / year (est.) | Home office (partial)

At a Glance

  • Tasks: Develop cutting-edge voice AI systems for Arabic speech synthesis and recognition.
  • Company: CNTXT, a pioneering tech company focused on Arabic voice AI.
  • Benefits: Competitive pay, remote work options, and a chance to shape impactful products.
  • Other info: Work at the forefront of an underserved market with excellent growth potential.
  • Why this job: Join a small team making a real difference in Arabic voice technology.
  • Qualifications: Strong machine learning skills and experience with Python and neural models.

The predicted salary is between £60,000 and £80,000 per year.

CNTXT is building voice AI infrastructure for the Arabic-speaking world. We work on the hard problems (natural speech synthesis, real-time transcription, and conversational voice systems) with a focus on Arabic language quality that actually serves the region's speakers.

We're looking for an AI engineer or researcher who is passionate about voice and speech technology. You'll work directly on the models and systems that power our speech products: evaluating architectures, running fine-tuning experiments, and shipping improvements to production. This is a hands-on role that sits at the intersection of research and engineering.

What Our Team Works On

  • Speech Synthesis (TTS) - We build and fine-tune Arabic TTS systems based on state-of-the-art generative architectures: both autoregressive models that generate speech token by token and non-autoregressive models that produce full utterances in parallel. This includes working with neural vocoders (HiFi-GAN, MelGAN, WaveGlow), audio codecs and tokenizers (EnCodec, DAC, RVQ-based systems), acoustic encoders (HuBERT, wav2vec), and diffusion-based audio decoders. A significant focus is voice cloning and zero-shot speaker adaptation for Arabic voices.
  • Speech Recognition (ASR) - We work with encoder-decoder and CTC-based ASR models (Whisper, Conformer, wav2vec 2.0) to build accurate, low-latency Arabic transcription. This includes streaming inference, domain adaptation, and language model integration for Arabic dialect robustness.
  • Speech-to-Speech - We are building end-to-end voice interaction pipelines that chain ASR, language understanding, and TTS under hard latency constraints. This involves voice activity detection (VAD), speaker diarization, speech enhancement, and optimizing the full stack for real-time performance.
  • Arabic Language Challenges - Arabic presents unique challenges across the whole stack: diacritization (tashkil) is critical for TTS pronunciation accuracy, dialect variation (MSA, Gulf, Levantine, Egyptian, Maghrebi) affects both synthesis and recognition quality, and training data for many dialects remains scarce. A big part of our work is closing these gaps.
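
To make the diacritization point concrete, here is a minimal Python sketch that removes tashkil marks from a string, so the same sentence can be prepared with and without diacritics. The function names and the exact set of code points treated as diacritics (U+064B to U+0652, plus the superscript alef U+0670) are assumptions for illustration, not a description of CNTXT's actual preprocessing.

```python
import re

# Assumption: "tashkil" here means the range U+064B..U+0652 (tanwin, short
# vowels, shadda, sukun) plus the superscript alef U+0670. Real pipelines
# may use a wider or narrower set.
_TASHKIL = re.compile("[\u064B-\u0652\u0670]")


def strip_tashkil(text: str) -> str:
    """Return text with Arabic diacritic marks removed."""
    return _TASHKIL.sub("", text)


def diacritic_ratio(text: str) -> float:
    """Fraction of characters in text that are diacritic marks."""
    if not text:
        return 0.0
    return len(_TASHKIL.findall(text)) / len(text)


# "kataba" (he wrote): kaf, ta, ba, each followed by a fatha.
diacritized = "\u0643\u064E\u062A\u064E\u0628\u064E"
print(strip_tashkil(diacritized))    # the three base letters only
print(diacritic_ratio(diacritized))  # 3 marks out of 6 characters
```

A ratio like this could serve as a rough filter for telling diacritized from undiacritized text in a corpus, though any threshold would need to be chosen empirically.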

What You'll Work On

  • Benchmark and evaluate TTS and ASR models on Arabic test sets, measuring WER, speaker similarity (SIM), naturalness, and dialect coverage across MSA and regional varieties.
  • Fine-tune pretrained TTS models on curated Arabic data, including ablations on diacritized vs. undiacritized input, dialect-specific training splits, and voice prompt conditioning.
  • Experiment with audio tokenizer and codec configurations, comparing discrete RVQ representations against continuous latent approaches and their effect on Arabic phoneme accuracy.
  • Build and maintain Arabic speech data pipelines: audio sourcing, normalization, diacritization, quality filtering, and manifest generation for model training.
  • Optimize models for production serving: streaming chunk generation, KV cache tuning, quantization, and batched inference for low-latency Arabic TTS and ASR.
  • Evaluate and adapt speech-to-speech pipelines, integrating ASR, LLM, and TTS components with attention to end-to-end latency and Arabic conversational quality.
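
For readers less familiar with WER, the first metric in the list above: it is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The Python sketch below is a minimal illustration, assuming whitespace tokenization and no text normalization (both simplifications); the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))
```

For Arabic, the score depends heavily on how text is normalized before comparison (for example, whether diacritics are kept), so the normalization choice should be reported alongside the number.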

What We're Looking For

  • Strong foundations in machine learning and deep learning.
  • Hands-on experience training or fine-tuning neural models; domain matters less than depth.
  • Comfortable with Python, PyTorch, and the HuggingFace ecosystem.
  • Able to read research papers and translate ideas into experiments independently.
  • Clear communicator who can work across research and engineering.

Nice to Have

  • Native or fluent Arabic speaker, a real advantage when evaluating synthesis naturalness and dialect quality.
  • Prior work with speech or audio models (ASR, TTS, speaker verification, codec, VAD, enhancement, or similar).
  • Familiarity with Arabic linguistic structure, diacritization tools, and NLP preprocessing for Arabic.
  • Experience with inference optimization: quantization, speculative decoding, CUDA kernels, or serving frameworks (vLLM, TensorRT).
  • Publications or open-source contributions in speech or audio.

What We Offer

  • Work at the frontier of Arabic voice AI, a genuinely underserved, high-impact area.
  • Direct influence on product and research direction.
  • Small, focused team: your work ships and matters.
  • Competitive compensation and remote flexibility.

AI Engineer - Speech & Voice Intelligence in London employer: CNTXT AI

At CNTXT, we pride ourselves on being an exceptional employer, offering a unique opportunity to work at the forefront of Arabic voice AI technology. Our remote-friendly and hybrid work culture fosters collaboration and innovation within a small, dedicated team, ensuring that your contributions directly impact our products and research direction. With competitive compensation and a strong focus on employee growth, CNTXT is committed to supporting your professional development while tackling meaningful challenges in the field of speech and voice intelligence.

Contact Details:

CNTXT AI Recruitment Team

StudySmarter Expert Advice 🤫

We think this is how you could land AI Engineer - Speech & Voice Intelligence in London

✨ Tip Number 1

Network like a pro! Reach out to folks in the AI and speech tech space on LinkedIn or at industry events. A friendly chat can open doors that a CV just can't.

✨ Tip Number 2

Show off your skills! Create a portfolio showcasing your projects related to voice AI, TTS, or ASR. This gives potential employers a taste of what you can do beyond the written application.

✨ Tip Number 3

Prepare for interviews by brushing up on the latest trends in Arabic speech technology. Being knowledgeable about the challenges and innovations in the field will impress your interviewers.

✨ Tip Number 4

Don't forget to apply through our website! It's the best way to ensure your application gets seen by the right people at CNTXT. Plus, it shows you're genuinely interested in joining our team.

We think you need these skills to ace AI Engineer - Speech & Voice Intelligence in London

Machine Learning
Deep Learning
Neural Model Training
Python
PyTorch
HuggingFace Ecosystem
Speech Synthesis (TTS)

Some tips for your application 🫡

Show Your Passion: When writing your application, let your enthusiasm for voice and speech technology shine through. We want to see that you're genuinely excited about tackling the challenges in Arabic voice AI!

Tailor Your Experience: Make sure to highlight your hands-on experience with machine learning and deep learning. We're looking for specific examples of how you've trained or fine-tuned models, so don't hold back on the details!

Communicate Clearly: Since this role involves collaboration between research and engineering, clarity is key. Use straightforward language to explain your past projects and how they relate to the work we do at CNTXT.

Apply Through Our Website: We encourage you to apply directly through our website. It's the best way for us to receive your application and ensures you're considered for this exciting opportunity in Arabic voice AI!

How to prepare for a job interview at CNTXT AI

✨ Know Your Stuff

Make sure you brush up on the latest advancements in speech and voice technology, especially related to Arabic language processing. Familiarise yourself with key models like Whisper and HiFi-GAN, and be ready to discuss how you've applied similar technologies in your past work.

✨ Showcase Your Projects

Prepare to talk about specific projects where you've trained or fine-tuned neural models. Highlight any challenges you faced, particularly with dialect variations or diacritization, and how you overcame them. Real-world examples will make you stand out!

✨ Communicate Clearly

Since this role sits at the intersection of research and engineering, practice explaining complex concepts in simple terms. Be ready to demonstrate how you can bridge the gap between technical details and practical applications, especially when discussing your experience with Python and PyTorch.

✨ Ask Insightful Questions

Prepare thoughtful questions about CNTXT's approach to tackling unique Arabic language challenges. This shows your genuine interest in their work and helps you understand how you can contribute effectively to their team.