ML/AI Engineer in Manchester

Manchester | Full-Time | No working from home possible
Lloyd's

Responsibilities



  • Exciting opportunity for a hands‑on ML/AI Engineer to join our Data & AI Engineering team

  • You’ll build, automate, and maintain scalable systems that support the full machine learning lifecycle

  • You will lead Kubernetes orchestration, CI/CD automation (including Harness), GPU optimisation, and large‑scale model deployment, owning the path from code commit to reliable, monitored production services

  • This is a unique opportunity to shape the future of AI by embedding fairness, transparency, and accountability at the heart of innovation

  • You’ll join us at an exciting time, as we move into the next phase of our transformation

  • The investments we’re making in our people, data, and technology are leading to innovative projects, fresh possibilities, and countless new ways for our people to work, learn, and thrive

  • Compose, build, and operate production‑grade Kubernetes clusters for high‑volume model inference and scheduled training jobs

  • Configure autoscaling, resource quotas, GPU/CPU node pools, service mesh, Helm charts, and custom operators to meet reliability and efficiency targets

  • Implement GitOps workflows for environment configuration and application releases

  • Build CI/CD pipelines in Harness (or equivalent) to automate build, test, model packaging, and deployment across environments (dev / pre‑prod / prod)

  • Enable progressive delivery (blue/green, canary) and rollback strategies, integrating quality gates, unit/integration tests, and model‑evaluation checks

  • Standardise pipelines for continuous training (CT) and continuous monitoring (CM) to keep models fresh and safe in production

  • Deploy and tune GPU‑backed inference services (e.g., A100), optimise CUDA environments, and leverage TensorRT where appropriate

  • Operate scalable serving frameworks (NVIDIA Triton, TorchServe) with attention to latency, efficiency, resilience, and cost

  • Implement end‑to‑end observability for models and pipelines: drift, data quality, fairness signals, latency, GPU utilisation, error budgets, and SLOs/SLIs via Prometheus, Grafana, and Dynatrace

  • Establish actionable alerting and runbooks for on‑call operations; drive incident reviews and reliability improvements

  • Operate a model registry (e.g., MLflow) with experiment tracking, versioning, lineage, and environment‑specific artefacts

  • Enforce audit readiness: model cards, reproducible builds, provenance, and controlled promotion between stages

  • Hours: Full-time, 35 hours


Benefits



  • A generous holiday allowance: You’ll be eligible for a minimum of 22 days holiday (excluding bank holidays), rising to 30 days based on length of service and grade.

  • A flexible way of working: Whether you want flexibility over your location or when you log on, together we can create an approach that works for you and for the business.

  • Family leave: Up to 63 weeks of maternity or adoption leave. Statutory maternity or adoption pay is available for 39 weeks, and 20 weeks will be enhanced to the equivalent of full pay. Partners can have six weeks of fully paid paternity leave.

  • Flex cash: This is 4% of your basic salary, which you can spend on the benefits of your choice or take as a cash top-up in your monthly salary.

  • Health insurance: Our company-funded Private Medical Benefit provides all colleagues with access to good quality medical care, including accommodation, nursing care and specialist advice.

  • Colleague Offers: Get discounts on everything from electrical items to cinema tickets and weekly food shopping. You can share this benefit with up to ten family members or friends.

  • Financial products: Take advantage of our great financial products, some at a discounted rate, including current accounts, home and car insurance and loans.

  • Share plans: Participate in Sharematch and receive matching shares of up to £45 a month from the company, and you can choose to participate in Sharesave, our combined savings and share option plan.

  • Pension: We offer a generous pension plan, with all joiners being automatically enrolled in our ‘Your Tomorrow’ scheme. You can decide how much you save and get a say in where your contributions are invested.


Qualifications


We’re looking for curious, passionate engineers who thrive on innovation and want to make a real impact

  • Expert use of Git, branching models, protected merges, and code‑review workflows

  • Practical experience with CUDA, TensorRT, Triton, TorchServe, and GPU scheduling/optimisation

  • Experience operating MLflow (or equivalent) for experiment tracking, model bundling, and deployments

  • CI/CD expertise having hands‑on experience with Harness (or similar) building multi‑stage pipelines; experience with GitOps, artefact repositories, and environment promotion

  • Proficiency in Prometheus, Grafana, Dynatrace defining SLIs/SLOs and alert thresholds for ML systems

  • Strong Python for automation, tooling, and service development

  • Deep expertise in Kubernetes, Docker, Helm, operators, node‑pool management, and autoscaling

  • Experience with GCP (e.g., GKE, Cloud Run, Pub/Sub, BigQuery) and Vertex AI (Endpoints, Pipelines, Model Monitoring, Feature Store)

  • Hooks for prompt/version management, offline/online evaluation, and human‑in‑the‑loop workflows (e.g., RLHF) to enable continuous improvement

  • Familiarity with Model Context Protocol (MCP) for tool interoperability, plus Google ADK, LangGraph/LangChain for agent orchestration and multi‑agent patterns

  • Ray, Kubeflow, or similar frameworks

  • Experience embedding controls, audit evidence, and governance in regulated environments

  • Experience with GPU efficiency, autoscaling strategies, and workload right‑sizing

Lloyd's

Contact Details:

Lloyd's Recruitment Team