At a Glance
- Tasks: Join a dynamic team to develop NLP solutions for drug safety and toxicology.
- Company: EMBL-EBI, a leading research centre in biological data.
- Benefits: Competitive salary, flexible working, generous leave, and family-friendly perks.
- Why this job: Make a real impact in drug discovery while working with cutting-edge technology.
- Qualifications: PhD, Master's or equivalent experience in computational linguistics or a related field.
- Other info: Hybrid working model with excellent career growth opportunities.
The predicted salary is between £3,303 and £3,695 per month.
Location: EMBL-EBI, Cambridge, United Kingdom. Contract length: 3 years (project-based). Salary: Grade 5–6 (£3,303–£3,695 per month after tax, excluding pension and insurances). Closing date: 11/01/2026.
About the team
Safety and toxicology concerns remain one of the most persistent challenges in drug discovery. This role joins a multi‑disciplinary team to develop a comprehensive open‑source side‑effect resource for the scientific and pharmaceutical community, and to provide structured and standardised training sets for AI/ML applications that improve early identification of safety liabilities.
Role overview
The position is embedded within the Chemical Biology Services team at EMBL‑EBI and the Open Targets Safety 2.0 project. You will work closely with safety scientists from the Open Targets pharmaceutical partners (MSD, Genentech, GSK, Pfizer, Sanofi) to ensure delivery of work packages and seamless integration of pipelines into ChEMBL and the Open Targets Platform.
Key responsibilities
- Develop machine learning pipelines for extracting drug side effects from drug labels, clinical trials, publications and other documents.
- Investigate modern NLP methodologies and propose ideas for the implementation of data extraction methods and pipelines.
- Apply language models to extract and map drug‑related information from unstructured text, e.g. the scientific literature and ClinicalTrials.gov.
- Implement and/or fine‑tune different NLP models, e.g. NER models, transformer models, LLMs.
- Integrate project workflows with existing infrastructure in the EBI Chemical Biology Services and Open Targets teams.
- Prepare and evaluate benchmark datasets from the open domain as training sets for NLP models.
- Work with domain experts to develop new gold standards for NLP tasks where needed.
- Assist with and/or perform data curation to prepare clean and reliable training sets.
- Apply and/or adapt existing methods for mapping extracted entities (e.g. drugs, side effects/phenotypes and diseases) to biomedical ontologies.
- Work closely with members of the Safety 2.0 project group, which bridges the ChEMBL and Open Targets teams.
- Work closely with the Open Targets Core team to ensure seamless integration of data and workflows into the Open Targets Platform and long‑term sustainability.
- Collaborate with the Open Targets Partners to assess, prioritise, validate and refine the developed methods.
- Disseminate the outcomes of the project to the scientific community and stakeholders through presentations and publications.
Required qualifications
- PhD, Master's or equivalent experience in computational linguistics, computer science, bioinformatics or cheminformatics.
- Experience with language models (e.g. transformer models, LLMs, AI agents) for information extraction.
- Experience with document and text preprocessing, cleaning and transformation techniques including mapping to ontologies.
- Experience with data structures, data models and databases.
- Knowledge of cheminformatics resources and/or bioinformatics databases.
- Knowledge of data analysis and machine learning.
- Proficiency in Python.
- Knowledge of data frameworks, e.g. PySpark, pandas, Polars.
- Excellent attention to detail.
- Strong communication skills, both verbal and in presentations.
- Experience of collaborative working in a team‑oriented environment.
- Able to work independently, manage your time and work to deadlines.
Preferred experience
- Experience with the application of NLP methods to cheminformatics and/or biomedical domains.
- Experience with version control.
- Experience in safety/toxicology in industry or research.
Other helpful information
- Hybrid Working: At EMBL‑EBI we embrace a hybrid approach – team members are typically on site at least three days a week, with a desk always available.
- Interviews: Introductory meetings will be held remotely starting in February 2026.
Why join us: EMBL‑EBI, part of the European Molecular Biology Laboratory, is a world‑leading research centre for large biological data. Enjoy a collaborative, inclusive culture, flexible working and a wide range of on‑site and remote facilities.
Benefits
- Financial incentives: monthly family, child and non‑resident allowances, annual salary review, pension scheme, death benefit, long‑term care, accident‑at‑work and unemployment insurances.
- Flexible working arrangements – including hybrid patterns.
- Private medical insurance for you and your immediate family (including prescriptions, dental and optical cover).
- Generous time off: 30 days of annual leave plus public holidays.
- Relocation package including installation grant (if required).
- Campus life: free shuttle bus, on‑site library, subsidised gym and cafeteria, casual dress code, sports and social club activities (on campus or remotely).
- Family benefits: on‑site nursery, 10 days child sick leave, generous parental leave, holiday clubs on campus and monthly family and child allowances.
- Benefits for non‑UK residents: visa exemption, education grant for private schooling, financial support to travel back home every second year and a monthly non‑resident allowance.
Additional information
- International applicants: we recruit internationally and successful candidates are offered visa exemptions.
- EMBL is a signatory of DORA (the San Francisco Declaration on Research Assessment) and applies its principles to recruitment and performance assessment.
- Diversity and inclusion: we strongly believe that inclusive and diverse teams benefit from higher levels of innovation and creative thought. We encourage applications from women, LGBTQ+ individuals and people of all nationalities.
How to apply: submit a cover letter and CV through our online system. Applications will close at 23:59 CET on the date shown above (11/01/2026). We aim to respond within two weeks after the closing date.
NLP Data Scientist/Scientific Data Engineer employer: European Bioinformatics Institute | EMBL-EBI
Contact details:
European Bioinformatics Institute | EMBL-EBI Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the NLP Data Scientist/Scientific Data Engineer role
✨Tip Number 1
Network like a pro! Reach out to people in the industry, especially those at EMBL-EBI or similar organisations. A friendly chat can open doors and give you insights that a job description just can't.
✨Tip Number 2
Prepare for interviews by brushing up on your NLP knowledge and practical skills. Be ready to discuss your experience with language models and data extraction methods, as these are key for the role.
✨Tip Number 3
Showcase your projects! If you've worked on relevant NLP or data science projects, make sure to highlight them during interviews. Real-world examples can really impress hiring managers.
✨Tip Number 4
Don't forget to apply through our website! It’s the best way to ensure your application gets seen. Plus, it shows you're serious about joining our team at EMBL-EBI.
Some tips for your application 🫡
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Make sure to highlight your relevant experience and how it aligns with the role. We want to see your passion for NLP and how you can contribute to our team at EMBL-EBI.
Tailor Your CV: Don’t just send a generic CV! Tailor it to showcase your skills in machine learning, NLP methodologies, and any relevant projects you've worked on. We love seeing how your background fits with what we do!
Showcase Your Projects: If you've worked on any cool projects related to NLP or data engineering, make sure to mention them! We’re keen to see practical examples of your work and how you’ve tackled challenges in the past.
Apply Through Our Website: Remember to submit your application through our online system. It’s the easiest way for us to keep track of your application and ensures you don’t miss out on any important updates from us!
How to prepare for a job interview at European Bioinformatics Institute | EMBL-EBI
✨Know Your NLP Stuff
Make sure you brush up on the latest NLP methodologies and language models, especially transformers and LLMs. Be ready to discuss how you've applied these in your previous work or projects, as this will show your expertise and passion for the field.
✨Showcase Your Collaboration Skills
Since this role involves working closely with safety scientists and other teams, be prepared to share examples of how you've successfully collaborated in the past. Highlight any experiences where you bridged gaps between different teams or disciplines.
✨Prepare for Technical Questions
Expect some technical questions related to data structures, document preprocessing, and machine learning pipelines. Brush up on your Python skills and be ready to explain your thought process when tackling complex problems.
✨Demonstrate Attention to Detail
Given the importance of clean and reliable training sets, be ready to discuss how you ensure accuracy in your work. Share specific examples of how you've maintained high standards in data curation or project delivery.