Junior/Mid Computational Biologist | ML Engineer, BioDiscovery Platform in Cambridge

SEQUENTIAL

Cambridge | Full-Time | No working from home possible

Sequential is building a next-generation AI-driven discovery platform to identify and design novel functional actives, including peptides and complex ingredient systems. The platform integrates large-scale biological datasets (>50,000 samples and measurements) spanning multi-omics data, microbiome sequencing, clinical and real-world outcomes. Our goal is to translate biological signals into actionable compound discovery and optimisation, powering a pipeline across:

We are currently prioritising bringing a compound-mixture product to market and broadening the range of claims we can support, while the longer-horizon peptide platform matures.

Reliable, well-characterised biological data is the foundation of everything the platform predicts, and that is where this role sits. We are looking for a junior-to-mid-level Computational Biologist to own the curation of the platform’s biological data, with an initial focus on building a comprehensive, skin-specific cell-line library. This is not pure data wrangling: you will make biologically informed decisions about what data to include, how to structure it, and how to prioritise it against clinical and commercial objectives.

The Data You Will Work With

Compound, mechanism-of-action, and cell-line performance data (e.g., ChEMBL and comparable public or commercial sources)
Cell-line gene-expression and phenotypic data spanning the epidermal, dermal, immune, and vascular compartments of skin
Compound property data relevant to screening, such as solubility and skin penetration / diffusion
Bioactivity and toxicity readouts used to characterise compound effects (e.g., IC50, LC50) and dose-response behaviour

Key Responsibilities

Build and curate a skin-specific cell-line library
- Identify, source, and curate cell lines representing the epidermis, dermis, immune, and vascular compartments of skin
- Define and apply quality, provenance, and metadata standards (passage number, authentication, expression profiles, contamination status)
- Structure curated data for ingestion by the Target ID and compound-effect prediction models
Prioritise data against clinical and commercial objectives
- Prioritise cell lines and indications by objective
- Factor practical constraints such as compound solubility and skin-layer penetration into curation and screening decisions
- Sequence curation work to support near-term claim coverage and product milestones
Support Target Identification and compound-effect prediction
- Curate gene-to-compound mappings and mechanism-of-action data that feed the biological discovery engine
- Support analysis of how gene expression in specific cell lines influences compound effects
- Provide curated inputs to mixture-ratio estimation (e.g., effectiveness weighted by LC50) and downstream statistical modelling
Establish data quality, documentation, and reproducibility
- Implement QC checks, documented schemas, and reproducible curation pipelines
- Maintain clear provenance and versioning so datasets are auditable and trustworthy
- Track work and deliverables through the team’s project system (e.g., Jira)
Collaborate cross-functionally
- Work with ML engineers, compound scientists, and biology/formulation colleagues to keep data fit for purpose
- Translate biological context for technical teammates and flag gaps or risks in the data
- Help transition the current prototype into a robust in vitro screening tool

What we’re looking for

Degree (BSc/MSc) or equivalent experience in computational biology, bioinformatics, or a related quantitative life-science field
Hands‑on experience curating biological datasets — sourcing, cleaning, structuring, and quality‑controlling real‑world data
Working knowledge of biological or chemical databases (e.g., ChEMBL, GEO, Ensembl, cell‑line repositories) and their limitations
Understanding of skin or human tissue biology — cell types, tissue layers, and how tissue context shapes gene expression and compound response
Proficiency in Python and standard data‑handling tools (e.g., pandas), with reproducible, well‑documented workflows
Sound data hygiene: provenance tracking, versioning, QC, and sensible handling of messy or incomplete data
Clear communication and the ability to make and explain prioritisation decisions

Strongly Preferred

Familiarity with cell‑line characterisation and quality concepts (authentication / STR profiling, passage effects, contamination, expression profiling)
PhD preferred, but not essential for candidates with sufficient demonstrated experience
Exposure to transcriptomics / gene‑expression data and cell‑line performance metrics
Understanding of bioactivity and toxicity measures (e.g., IC50, LC50) and dose‑response / therapeutic‑window concepts
Experience working to commercial or translational timelines, not academia alone
Familiarity with project‑tracking tools (e.g., Jira)

Nice to Have

Awareness of compound properties affecting screening, such as solubility and skin penetration / diffusion
Exposure to Bayesian or statistical modelling concepts
Experience supporting ML / AI pipelines with curated training data

Contact Details:

SEQUENTIAL Recruitment Team