At a Glance
- Tasks: Design and build scalable data pipelines for advanced AI models.
- Company: Join a leading AI company dedicated to transforming data into intelligence.
- Benefits: Enjoy remote flexibility, generous vacation, health benefits, and personal enrichment perks.
- Why this job: Make a real impact in AI by optimising data for cutting-edge language models.
- Qualifications: Strong Python skills and experience with data processing frameworks required.
- Other info: Collaborate with top talent in a diverse and inclusive environment.
The predicted salary is between £36,000 and £60,000 per year.
Who we are
Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI. We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers. Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products. Join us on our mission and shape the future!
Why this role
As a Data Engineer specializing in pretraining data, you will play a pivotal role in developing the data pipeline that underpins Cohere’s advanced language models. Your responsibilities will encompass the end‑to‑end management of training data, including ingestion, cleaning, filtering, and optimization, as well as data modeling to ensure datasets are structured and formatted for optimal model performance. You will work with diverse data sources, such as web data, code data, and multilingual corpora, to ensure their quality, diversity, and reliability. By combining research and engineering, you will bridge the gap between raw data and cutting‑edge AI models, directly contributing to improvements in critical training metrics like throughput and accelerator utilization. Your work will be essential to Cohere’s mission of delivering efficient and reliable language understanding and generation capabilities, driving innovation in natural language processing. If you are passionate about transforming data into the foundation of AI systems, this role offers a unique opportunity to make a meaningful impact.
We have offices in London, Paris, Toronto, San Francisco and New York, but we also embrace being remote‑friendly! For this role, you can be located anywhere between the EST and EU time zones.
Responsibilities
- Design and build scalable data pipelines to ingest, parse, filter, and optimize diverse web datasets.
- Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance.
- Develop robust data modeling techniques to ensure datasets are structured and formatted for optimal training efficiency.
- Research and implement innovative data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing.
- Collaborate with cross‑functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting‑edge language models.
Qualifications
- Strong software engineering skills, with proficiency in Python and experience building data pipelines.
- Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools.
- Experience working with large‑scale web datasets like CommonCrawl.
- A passion for bridging research and engineering to solve complex data‑related challenges in AI model training.
Bonus: papers published at top‑tier venues (such as NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP).
We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.
Perks
- An open and inclusive culture and work environment
- Work closely with a team on the cutting edge of AI research
- Weekly lunch stipend, in‑office lunches & snacks
- Full health and dental benefits, including a separate budget to take care of your mental health
- 100% Parental Leave top‑up for up to 6 months
- Personal enrichment benefits towards arts and culture, fitness and well‑being, quality time, and workspace improvement
- Remote‑flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co‑working stipend
- 6 weeks of vacation (30 working days!)
Member of Technical Staff, Data Engineering employer: Cohere Inc.
Contact Details:
Cohere Inc. Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Member of Technical Staff, Data Engineering role
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, attend meetups, and connect with people on LinkedIn. You never know who might have the inside scoop on job openings or can refer you directly.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your data engineering projects. This gives potential employers a taste of what you can do and sets you apart from the crowd.
✨Tip Number 3
Prepare for interviews by brushing up on common data engineering questions and challenges. Practice coding problems and be ready to discuss your past projects in detail. Confidence is key!
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who are genuinely interested in joining our mission.
Some tips for your application 🫡
Show Your Passion for Data: When writing your application, let us see your enthusiasm for data engineering! Share specific examples of how you've transformed data into valuable insights or improved processes. We love candidates who are genuinely excited about bridging research and engineering.
Tailor Your Experience: Make sure to highlight your relevant experience with data pipelines and the tools you’ve used, like Python or Apache Spark. We want to know how your skills align with our mission, so don’t hold back on showcasing your achievements in this area!
Be Clear and Concise: Keep your application straightforward and to the point. Use clear language to describe your past projects and contributions. We appreciate well-structured applications that make it easy for us to see your qualifications at a glance.
Apply Through Our Website: We encourage you to apply directly through our website! It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re keen on joining our team at Cohere!
How to prepare for a job interview at Cohere Inc.
✨Know Your Data Inside Out
Make sure you’re well-versed in the types of data you'll be working with, especially large-scale web datasets like CommonCrawl. Familiarise yourself with data ingestion, cleaning, and filtering processes, as these will be crucial in your role.
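If you'd like something concrete to practise with, here is a minimal Python sketch of a cleaning and filtering step. It is purely illustrative and not Cohere's pipeline: the function names, the `min_words` threshold, and the hash-based exact deduplication are all assumptions chosen for the example, and real web-scale pipelines typically add language identification, quality classifiers, and near-duplicate detection on a distributed framework.

```python
import hashlib
import re
from typing import Iterable, Iterator


def clean(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def filter_and_dedup(docs: Iterable[str], min_words: int = 5) -> Iterator[str]:
    """Yield cleaned documents that are long enough and not exact duplicates.

    `min_words` is a hypothetical threshold picked for illustration only.
    """
    seen = set()
    for raw in docs:
        text = clean(raw)
        if len(text.split()) < min_words:
            continue  # drop very short documents
        # Hash the lowercased text so duplicates differing only in case are caught.
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # drop exact (case-insensitive) duplicates
        seen.add(digest)
        yield text


sample = [
    "  The quick   brown fox jumps over the lazy dog.  ",
    "the quick brown fox jumps over the lazy dog.",
    "Too short.",
    "Data pipelines ingest, clean, and filter raw text.",
]
kept = list(filter_and_dedup(sample))
```

Being able to explain why each step exists (and where a simple approach like this would break down on CommonCrawl-sized data) is a useful way to frame your answers.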
✨Showcase Your Engineering Skills
Be prepared to discuss your experience with Python and any data processing frameworks like Apache Spark or Pandas. Bring examples of past projects where you built data pipelines or optimised data for model performance to demonstrate your technical prowess.
✨Emphasise Collaboration
Cohere values teamwork, so highlight your experience working with cross-functional teams. Share specific instances where you collaborated with researchers or engineers to solve complex data challenges, showcasing your ability to bridge research and engineering.
✨Stay Updated on AI Trends
Familiarise yourself with the latest advancements in natural language processing and AI. Being able to discuss current trends or innovative data curation methods will show your passion for the field and your commitment to contributing to Cohere’s mission.