Senior/Staff DevOps HPC Engineer

Full-Time | £70,000 - £90,000 / year (est.) | No working from home possible

At a Glance

  • Tasks: Design and manage HPC systems for groundbreaking drug discovery using machine learning.
  • Company: Join Recursion, a pioneering TechBio company transforming drug discovery.
  • Benefits: Competitive salary, professional development, and opportunities to attend conferences.
  • Other info: Collaborative team environment with excellent growth opportunities and a focus on innovation.
  • Why this job: Make a real impact in healthcare by optimising cutting-edge technology for drug discovery.
  • Qualifications: 10+ years in HPC infrastructure, strong DevOps skills, and experience with cloud platforms.

The predicted salary is between £70,000 and £90,000 per year.

Recursion is revolutionizing the field of drug discovery by integrating Science and Machine Learning, and we are looking for a Senior/Staff DevOps HPC Engineer to join our pioneering team. You will play a crucial role in developing and maintaining our HPC systems that power our cutting‑edge drug discovery research. You will be responsible for designing, implementing, and managing the infrastructure that supports our machine learning and scientific computing workloads. Your day‑to‑day tasks will include building robust and scalable infrastructure, deploying and managing HPC resources, and automating operational processes. You'll apply your deep understanding of DevOps principles and HPC systems to solve complex computational challenges.

This means you'll be actively involved in executing high‑level computational strategies, tracking crucial processing information, and ensuring high data integrity. Furthermore, you will collaborate with a diverse team of scientists, machine learning experts, and other engineers to develop a world‑class data platform that facilitates the generation and management of petabytes of data, enabling the rapid deployment of new deep learning models into the production data pipeline. Your contributions will directly impact the efficiency and effectiveness of our drug discovery efforts. You can expect to work on multiple projects at the same time in a fast‑paced and stimulating environment.

Your responsibilities will not just be limited to maintaining systems and infrastructure, but will also include proactive troubleshooting, routine system maintenance, ensuring the security of our computing environment, and creating detailed documentation for all processes and procedures. Join us, and make a significant impact on the future of drug discovery.

In this role:

  • You’ll design, implement, maintain and optimize our Scientific compute, network, and data storage infrastructure and services using an Infrastructure as Code approach across both on‑premises and public cloud environments.
  • Your technical expertise and leadership will drive innovation across all layers of the HPC/AI infrastructure, ensuring that we provide an effective, scalable platform to support our dynamic scientific workloads.
  • By developing scripts and workflows, you'll automate and verify infrastructure provisioning, dynamic reconfiguration, and various repetitive tasks, enhancing our support of the HPC environments.
  • Your attention to detail will be critical in performance analysis, benchmarking, and tuning of our systems and applications.
  • Your troubleshooting skills will be invaluable as you resolve application, system, and other technical problems, alongside addressing user tickets swiftly.
  • You'll research, deploy, and optimize policies for workload and resource scheduling, security, and data lifecycle management.
  • You'll regularly assess the health and operational performance of the platform against established metrics, with a view to meeting and improving its operational service targets.
  • Lastly, as a lead in technical communication and collaboration with our customers, your efforts will ensure a high level of customer satisfaction.

It's your opportunity to make a significant impact in our organization and the wider scientific community.

The Team You’ll Join:

As a Senior/Staff DevOps HPC Engineer, you will be a part of our dedicated HPC Engineering team, reporting directly to the Associate Director. This dynamic team includes two experienced Senior Engineers, and with the addition of two new roles, including this position, you'll be part of an empowered, cross‑functional unit. Our HPC team works in a fast‑paced, collaborative environment, handling a broad spectrum of computational projects. These range from developing advanced, scalable infrastructure to deploying and managing HPC resources and automating operational processes. The team also plays a crucial role in the curation of our vast data platform, which caters to a diverse set of professionals, including biologists, data scientists, and automation engineers. The HPC team is constantly pushing the boundaries in the field of supercomputing in the TechBio industry. As part of this team, you will collaborate on projects that streamline and optimize our machine learning workflows and scientific computing tasks, driving efficient and transformative solutions within the company. This is a unique opportunity to join a team that thrives on innovation, collaboration, and inclusivity in a role that is pivotal to our mission.

The Experience You’ll Need:

  • A minimum of 10 years of experience in dealing with HPC infrastructure, preferably in global BioPharma organizations.
  • Solid experience with software‑defined Infrastructure and cloud computing platforms such as Kubernetes, GCP, AWS, and others.
  • Extensive experience in designing, deploying, supporting, and troubleshooting in complex Linux‑based computing environments.
  • In‑depth hands‑on experience with the provisioning, configuration, and management of infrastructure through Infrastructure as Code (IaC) and cloud automation principles.
  • Python programming and bash scripting experience.
  • Proficiency with source control, continuous integration, configuration management, monitoring, and systems tools.
  • Practical knowledge of resource management and job scheduling using Slurm and Kubernetes.
  • Experience with RDMA‑capable high‑speed networking.
  • Familiarity with parallel file systems and multi‑tier file and object storage.
  • Proficiency in container technology including Apptainer and Docker.
  • Experience in building, installing, and supporting user‑requested software.
  • Strong verbal and written skills for effective communication and documentation.
  • Prior experience mentoring, guiding, and cross‑training team members.

How You’ll be Supported:

The onboarding process will include peer knowledge transfer sessions, introductions to key stakeholders, and comprehensive exposure to our company culture and processes. You'll have the chance to learn from your colleagues during our regular lunch & learn and tech talk sessions. We offer the opportunity to attend courses for certification in new skills or technologies relevant to your role. If you're keen to hone your leadership skills, you'll have the option to participate in coaching sessions, such as BetterUp. To ensure you're always at the forefront of your field, we offer the opportunity to attend conferences.

The Values That We Hope You Share:

  • We Care: We care about our drug candidates, our Recursionauts, their families, each other, our communities, the patients we aim to serve and their loved ones. We also care about our work.
  • We Learn: Learning from the diverse perspectives of our fellow Recursionauts, and from failure, is an essential part of how we make progress.
  • We Deliver: We are unapologetic that our expectations for delivery are extraordinarily high. There is urgency to our existence: we sprint at maximum engagement, making time and space to recover.
  • Act Boldly with Integrity: No company changes the world or reinvents an industry without being bold. It must be balanced; not by timidity, but by doing the right thing even when no one is looking.
  • We are One Recursion: We operate with a 'company first, team second' mentality. Our success comes from working as one interdisciplinary team.

Recursion is a clinical-stage TechBio company leading the space by decoding biology to industrialize drug discovery. Enabling its mission is the Recursion OS, a platform built across diverse technologies that continuously expands one of the world’s largest proprietary biological and chemical datasets. Recursion leverages sophisticated machine‑learning algorithms to distill from its dataset a collection of trillions of searchable relationships across biology and chemistry unconstrained by human bias. By commanding massive experimental scale (up to millions of wet lab experiments weekly) and massive computational scale (owning and operating one of the most powerful supercomputers in the world), Recursion is uniting technology, biology and chemistry to advance the future of medicine. Recursion is headquartered in Salt Lake City, where it is a founding member of BioHive, the Utah life sciences industry collective. Recursion also has offices in London, Toronto, Montreal and the San Francisco Bay Area. Learn more at www.Recursion.com, or connect on X (formerly Twitter) and LinkedIn.

Senior/Staff DevOps HPC Engineer employer: Recursion Pharmaceuticals

Recursion is an exceptional employer, offering a dynamic and collaborative work environment in the heart of Salt Lake City. With a strong focus on innovation and employee growth, we provide opportunities for continuous learning through peer knowledge transfer, tech talks, and access to industry conferences. Our commitment to inclusivity and high standards ensures that every team member can make a meaningful impact on the future of drug discovery while enjoying a supportive culture that values care, integrity, and teamwork.

Contact Details:

Recursion Pharmaceuticals Recruitment Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Senior/Staff DevOps HPC Engineer role

Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with Recursion employees on LinkedIn. A personal touch can make all the difference when it comes to landing that interview.

Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to HPC and DevOps. This gives potential employers a taste of what you can bring to the table.

Tip Number 3

Prepare for the technical interview! Brush up on your knowledge of cloud platforms, container technologies, and scripting languages. Practising common interview questions can help you feel more confident and ready to impress.

Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our innovative team at Recursion.

We think you need these skills to ace the Senior/Staff DevOps HPC Engineer role

HPC Infrastructure Management
Infrastructure as Code (IaC)
Cloud Computing (AWS, GCP)
Kubernetes
Linux-based Computing Environments
Python Programming
Bash Scripting

Some tips for your application 🫡

Tailor Your CV: Make sure your CV is tailored to highlight your experience with HPC infrastructure and DevOps principles. Use keywords from the job description to show that you understand what we're looking for.

Showcase Your Projects: Include specific projects where you've designed or managed HPC systems. We want to see how you've tackled complex computational challenges, so don't hold back on the details!

Craft a Compelling Cover Letter: Your cover letter should tell us why you're passionate about drug discovery and how your skills align with our mission. Be genuine and let your personality shine through!

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way to ensure your application gets into the right hands and shows your enthusiasm for joining our team!

How to prepare for a job interview at Recursion Pharmaceuticals

Know Your HPC Inside Out

Make sure you brush up on your knowledge of high-performance computing systems. Be ready to discuss your experience with specific technologies like Kubernetes, AWS, and Slurm. Highlight any projects where you've designed or optimised HPC infrastructure, as this will show you're the right fit for the role.

Show Off Your Automation Skills

Since automation is key in this role, prepare examples of how you've used Infrastructure as Code (IaC) to streamline processes. Discuss any scripts you've developed in Python or bash that have improved efficiency or reduced errors in your previous roles.

Collaboration is Key

This position involves working closely with scientists and engineers, so be ready to talk about your teamwork experiences. Share specific instances where your collaboration led to successful outcomes, especially in fast-paced environments. This will demonstrate your ability to thrive in a dynamic team setting.

Prepare for Technical Challenges

Expect to face technical questions or scenarios during the interview. Brush up on troubleshooting techniques and be prepared to discuss how you've resolved complex issues in the past. Showing your problem-solving skills will be crucial in proving your capability for this role.