At a Glance
- Tasks: Design and optimise datasets for training multimodal models using audio, text, video, and images.
- Company: Join Cartesia, a pioneering AI company building real-time multimodal intelligence.
- Benefits: Enjoy free meals, comprehensive health insurance, pension plans, and relocation support.
- Why this job: Be part of a cutting-edge team in a collaborative, fast-paced environment focused on innovation.
- Qualifications: Expertise in multimodal data curation, Python programming, and familiarity with tools like OpenCV and Hugging Face.
- Other info: Work in our new London office and collaborate with top experts in AI.
The predicted salary is between £36,000 and £60,000 per year.
About Cartesia
Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text—1B text tokens, 10B audio tokens and 1T video tokens—let alone do this on-device.
We’re pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team combines deep expertise in model innovation and systems engineering, paired with a design-minded product engineering team, to build and ship cutting-edge models and experiences.
We’re funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We’re fortunate to have the support of many amazing advisors and 90+ angels across many industries, including the world’s foremost experts in AI.
The Role
We’re opening our first-ever office in Europe and looking to hire incredible talent in London to advance our mission of building real-time multimodal intelligence. In this role, you’ll:
• Lead the design, creation, and optimization of datasets for training and evaluating multimodal models across diverse modalities, including audio, text, video, and images.
• Develop strategies for curating, aligning, and augmenting multimodal datasets to address challenges in synchronization, variability, and scalability.
• Design innovative methods for data augmentation, synthetic data generation, and cross-modal sampling to enhance the diversity and robustness of datasets.
• Create datasets tailored for specific multimodal tasks, such as audio-visual speech recognition, text-to-video generation, or cross-modal retrieval, with attention to real-world deployment needs.
• Collaborate closely with researchers and engineers to ensure datasets are optimized for target architectures, training pipelines, and task objectives.
• Build scalable pipelines for multimodal data processing, annotation, and validation to support research and production workflows.
What We’re Looking For
• Expertise in multimodal data curation and processing, with a deep understanding of challenges in combining diverse data types like audio, text, images, and video.
• Proficiency in tools and libraries for handling specific modalities, such as librosa (audio), OpenCV (video), and Hugging Face (text).
• Familiarity with data alignment techniques, including time synchronization for audio and video, embedding alignment for cross-modal learning, and temporal consistency checks.
• Strong understanding of multimodal dataset design principles, including methods for ensuring data diversity, sufficiency, and relevance for targeted applications.
• Programming expertise in Python and experience with frameworks like PyTorch or TensorFlow for building multimodal data pipelines.
• Comfortable with large-scale data processing and distributed systems for multimodal dataset storage, processing, and management.
• A collaborative mindset with the ability to work cross-functionally with researchers, engineers, and product teams to align data strategies with project goals.
Nice-to-Haves
• Experience in creating synthetic multimodal datasets using generative models, simulation environments, or advanced augmentation techniques.
• Background in annotating and aligning multimodal datasets for tasks such as audio-visual speech recognition, video captioning, or multimodal reasoning.
• Early-stage startup experience or a proven track record of building datasets for cutting-edge research in fast-paced environments.
Our culture
We’re an in-person team based out of San Francisco, Bangalore & London. We love being in the office, hanging out together and learning from each other every day.
We ship fast. All of our work is novel and cutting-edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality or design along the way.
We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.
Our perks
Lunch, dinner and snacks at the office.
Fully covered medical, dental, and vision insurance for employees.
Pension Plan.
Relocation and immigration support.
Your own personal Yoshi.
Researcher: Multimodal (Data), UK
Employer: Cartesia
Contact Detail:
Cartesia Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land Researcher: Multimodal (Data), UK
✨Tip Number 1
Familiarise yourself with the latest advancements in multimodal AI. Understanding the current trends and challenges in combining audio, text, video, and images will help you stand out during discussions with our team.
✨Tip Number 2
Showcase your experience with specific tools like librosa, OpenCV, and Hugging Face. Being able to discuss how you've used these libraries in past projects can demonstrate your hands-on expertise.
✨Tip Number 3
Prepare to discuss your approach to data augmentation and synthetic data generation. We value innovative thinking, so having examples of how you've tackled these challenges will be beneficial.
✨Tip Number 4
Emphasise your collaborative mindset. Be ready to share experiences where you've worked cross-functionally with researchers and engineers, as teamwork is crucial in our fast-paced environment.
Some tips for your application 🫡
Understand the Role: Before applying, make sure you fully understand the responsibilities and requirements of the Researcher: Multimodal (Data) position. Familiarise yourself with the specific skills needed, such as expertise in multimodal data curation and processing.
Tailor Your CV: Customise your CV to highlight relevant experience and skills that align with the job description. Emphasise your proficiency in tools like librosa, OpenCV, and Hugging Face, as well as your programming expertise in Python.
Craft a Compelling Cover Letter: Write a cover letter that showcases your passion for AI and your understanding of multimodal datasets. Mention any relevant projects or experiences that demonstrate your ability to tackle challenges in data curation and processing.
Showcase Collaborative Experience: In your application, highlight any previous collaborative work with researchers or engineers. This role requires a collaborative mindset, so providing examples of successful teamwork will strengthen your application.
How to prepare for a job interview at Cartesia
✨Showcase Your Multimodal Expertise
Make sure to highlight your experience with multimodal data curation and processing. Be prepared to discuss specific challenges you've faced when combining audio, text, images, and video, and how you overcame them.
✨Familiarity with Tools is Key
Demonstrate your proficiency in relevant tools and libraries such as librosa, OpenCV, and Hugging Face. You might be asked to explain how you've used these tools in past projects, so have examples ready.
✨Understand Dataset Design Principles
Be ready to discuss the principles of multimodal dataset design, including ensuring data diversity and relevance. Think about how you can apply these principles to real-world applications, as this will show your practical understanding.
✨Collaboration is Crucial
Since the role involves working closely with researchers and engineers, emphasise your collaborative mindset. Share examples of how you've successfully worked cross-functionally in previous roles to align data strategies with project goals.