Senior/Staff DevOps HPC Engineer in London

London | Full-Time | £70,000 - £90,000 / year (est.) | No working from home possible

At a Glance

  • Tasks: Design and manage HPC systems for groundbreaking drug discovery using machine learning.
  • Company: Join Recursion, a pioneering TechBio company revolutionising drug discovery.
  • Benefits: Competitive salary, professional development, and opportunities to attend conferences.
  • Other info: Collaborative team environment with excellent growth opportunities.
  • Why this job: Make a real impact in healthcare by optimising drug discovery processes.
  • Qualifications: 10+ years in HPC infrastructure with strong DevOps and cloud experience.

The predicted salary is between £70,000 and £90,000 per year.

Recursion is revolutionizing the field of drug discovery by integrating Science and Machine Learning, and we are looking for a Senior/Staff DevOps HPC Engineer to join our pioneering team. You will play a crucial role in developing and maintaining our HPC systems that power our cutting‑edge drug discovery research. You will be responsible for designing, implementing, and managing the infrastructure that supports our machine learning and scientific computing workloads.

Your day‑to‑day tasks will include building robust and scalable infrastructure, deploying and managing HPC resources, and automating operational processes. You'll apply your deep understanding of DevOps principles and HPC systems to solve complex computational challenges. This means you'll be actively involved in executing high‑level computational strategies, tracking crucial processing information, and ensuring high data integrity.

Furthermore, you will collaborate with a diverse team of scientists, machine learning experts, and other engineers to develop a world‑class data platform that facilitates the generation and management of petabytes of data, enabling the rapid deployment of new deep learning models into the production data pipeline. Your contributions will directly impact the efficiency and effectiveness of our drug discovery efforts. You can expect to work on multiple projects at the same time in a fast‑paced and stimulating environment.

Your responsibilities will not just be limited to maintaining systems and infrastructure, but will also include proactive troubleshooting, routine system maintenance, ensuring the security of our computing environment, and creating detailed documentation for all processes and procedures. Join us, and make a significant impact on the future of drug discovery.

In this role:

  • You’ll design, implement, maintain and optimize our scientific compute, network, and data storage infrastructure and services using an Infrastructure as Code approach across both on‑premises and public cloud environments.
  • Your technical expertise and leadership will drive innovation across all layers of the HPC/AI infrastructure, ensuring that we provide an effective, scalable platform to support our dynamic scientific workloads.
  • By developing scripts and workflows, you'll automate and verify infrastructure provisioning, dynamic reconfiguration, and various repetitive tasks, enhancing our support of the HPC environments.
  • Your attention to detail will be critical in performance analysis, benchmarking, and tuning of our systems and applications.
  • Your troubleshooting skills will be invaluable as you resolve application, system, and other technical problems, alongside addressing user tickets swiftly.
  • Your role involves researching, deploying, and optimizing workload and resource scheduling, security, and data lifecycle management policies.
  • You will regularly assess the health and operational performance of the platform against established metrics, with a view to meeting and improving the platform's operational service targets.
  • Lastly, as a lead in technical communication and collaboration with our customers, you will ensure a high level of customer satisfaction.

It's your opportunity to make a significant impact in our organization and the wider scientific community.

The Team You’ll Join:

As a Senior/Staff DevOps HPC Engineer, you will be a part of our dedicated HPC Engineering team, reporting directly to the Associate Director. This dynamic team includes two experienced Senior Engineers, and with the addition of two new roles, including this position, you'll be part of an empowered, cross‑functional unit. Our HPC team works in a fast‑paced, collaborative environment, handling a broad spectrum of computational projects. These range from developing advanced, scalable infrastructure to deploying and managing HPC resources and automating operational processes. The team also plays a crucial role in the curation of our vast data platform, which caters to a diverse set of professionals, including biologists, data scientists, and automation engineers.

The HPC team is constantly pushing the boundaries in the field of supercomputing in the TechBio industry. As part of this team, you will collaborate on projects that streamline and optimize our machine learning workflows and scientific computing tasks, driving efficient and transformative solutions within the company. This is a unique opportunity to join a team that thrives on innovation, collaboration, and inclusivity in a role that is pivotal to our mission.

The Experience You’ll Need:

  • A minimum of 10 years of experience with HPC infrastructure, preferably in global BioPharma organizations.
  • Solid experience with software‑defined infrastructure and cloud computing platforms such as Kubernetes, GCP, AWS, and others.
  • Extensive experience in designing, deploying, supporting, and troubleshooting in complex Linux‑based computing environments.
  • In‑depth hands‑on experience with the provisioning, configuration, and management of infrastructure through Infrastructure as Code (IaC) and cloud automation principles.
  • Python programming and bash scripting experience.
  • Proficiency with source control, continuous integration, configuration management, monitoring, and systems tools.
  • Practical knowledge of resource management and job scheduling using Slurm and Kubernetes.
  • Experience with RDMA‑capable high‑speed networking.
  • Familiarity with parallel file systems and multi‑tier file and object storage.
  • Proficiency in container technology including Apptainer and Docker.
  • Experience in building, installing, and supporting user‑requested software.
  • Strong verbal and written skills for effective communication and documentation.
  • Prior experience mentoring, guiding, and cross‑training team members.

How You’ll be Supported:

The onboarding process will include peer knowledge transfer sessions, introductions to key stakeholders, and comprehensive exposure to our company culture and processes. You'll have the chance to learn from your colleagues during our regular lunch & learn and tech talk sessions. We offer the opportunity to attend courses for certification in new skills or technologies relevant to your role. If you're keen to hone your leadership skills, you'll have the option to participate in coaching programs such as BetterUp. To ensure you're always at the forefront of your field, we offer the opportunity to attend conferences.

The Values That We Hope You Share:

  • We Care: We care about our drug candidates, our Recursionauts, their families, each other, our communities, the patients we aim to serve and their loved ones. We also care about our work.
  • We Learn: Learning from the diverse perspectives of our fellow Recursionauts, and from failure, is an essential part of how we make progress.
  • We Deliver: We are unapologetic that our expectations for delivery are extraordinarily high. There is urgency to our existence: we sprint at maximum engagement, making time and space to recover.
  • Act Boldly with Integrity: No company changes the world or reinvents an industry without being bold. It must be balanced; not by timidity, but by doing the right thing even when no one is looking.
  • We are One Recursion: We operate with a 'company first, team second' mentality. Our success comes from working as one interdisciplinary team.

Recursion is a clinical-stage TechBio company leading the space by decoding biology to industrialize drug discovery. Enabling its mission is the Recursion OS, a platform built across diverse technologies that continuously expands one of the world’s largest proprietary biological and chemical datasets. Recursion leverages sophisticated machine‑learning algorithms to distill from its dataset a collection of trillions of searchable relationships across biology and chemistry unconstrained by human bias. By commanding massive experimental scale (up to millions of wet lab experiments weekly) and massive computational scale (owning and operating one of the most powerful supercomputers in the world), Recursion is uniting technology, biology and chemistry to advance the future of medicine. Recursion is headquartered in Salt Lake City, where it is a founding member of BioHive, the Utah life sciences industry collective. Recursion also has offices in London, Toronto, Montreal and the San Francisco Bay Area.

Senior/Staff DevOps HPC Engineer in London employer: Recursion Pharmaceuticals

Recursion is an exceptional employer, offering a dynamic and collaborative work culture that fosters innovation in the TechBio industry. The company is strongly committed to employee growth: you will have access to continuous learning opportunities, mentorship, and the chance to attend conferences, all while working from its London office. Join us to make a meaningful impact on drug discovery and be part of a team that values care, learning, and integrity.

Contact Details:

Recursion Pharmaceuticals Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land the Senior/Staff DevOps HPC Engineer role in London

Tip Number 1

Network like a pro! Reach out to folks in the industry on LinkedIn or at meetups. A friendly chat can open doors that applications alone can't.

Tip Number 2

Show off your skills! Create a portfolio or GitHub repo showcasing your projects and contributions. This gives potential employers a taste of what you can do.

Tip Number 3

Prepare for interviews by practising common questions and scenarios related to HPC and DevOps. The more you rehearse, the more confident you'll feel when it counts.

Tip Number 4

Apply through our website! It’s the best way to ensure your application gets seen. Plus, we love seeing candidates who are genuinely interested in joining our team.

We think you need these skills to land the Senior/Staff DevOps HPC Engineer role in London

HPC Infrastructure Management
Cloud Computing (AWS, GCP)
Kubernetes
Linux-based Computing Environments
Infrastructure as Code (IaC)
Python Programming
Bash Scripting

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to the role of Senior/Staff DevOps HPC Engineer. Highlight your experience with HPC infrastructure, cloud platforms, and any relevant projects that showcase your skills in automation and troubleshooting.

Craft a Compelling Cover Letter: Your cover letter should tell us why you're passionate about drug discovery and how your background aligns with our mission. Share specific examples of your work that demonstrate your expertise in DevOps principles and HPC systems.

Showcase Your Technical Skills: Don’t hold back on showcasing your technical skills! Mention your experience with tools like Kubernetes, AWS, and Python scripting. We want to see how you’ve applied these skills in real-world scenarios to solve complex problems.

Apply Through Our Website: We encourage you to apply through our website for a smoother application process. This way, you can ensure all your details are captured correctly, and it helps us keep track of your application efficiently!

How to prepare for a job interview at Recursion Pharmaceuticals

Know Your HPC Inside Out

Make sure you brush up on your knowledge of high-performance computing systems. Be ready to discuss your experience with specific technologies like Kubernetes, AWS, and Slurm. Highlight any projects where you've designed or optimised HPC infrastructure, as this will show you're the right fit for the role.

Show Off Your Automation Skills

Since automation is key in this role, prepare examples of how you've used Infrastructure as Code (IaC) to streamline processes. Discuss any scripts you've developed in Python or bash that have improved efficiency or reduced errors in your previous roles.

Collaboration is Key

This position involves working closely with scientists and engineers, so be ready to talk about your teamwork experiences. Share specific instances where your collaboration led to successful outcomes, especially in fast-paced environments. This will demonstrate your ability to thrive in their dynamic team.

Prepare for Technical Challenges

Expect technical questions that test your troubleshooting skills. Think of complex problems you've solved in the past and be prepared to explain your thought process. This will showcase your analytical abilities and your deep understanding of HPC systems, which is crucial for this role.