HPC Platform Management Engineer

London | Full-Time | £43,200 - £72,000 per year (est.) | No home office possible

At a Glance

  • Tasks: Join our team to develop and maintain cutting-edge HPC platforms for scalable computing.
  • Company: Qube Research & Technologies is a global leader in quantitative investment, driven by data and technology.
  • Benefits: Enjoy a flexible work environment with initiatives for a healthy work-life balance.
  • Why this job: Be part of an innovative culture that values collaboration and tackles complex challenges in tech.
  • Qualifications: Experience with HPC schedulers and large-scale systems is essential; Python and AWS knowledge preferred.
  • Other info: We celebrate diversity and encourage a respectful workplace for all employees.

The predicted salary is between £43,200 and £72,000 per year.

Distributed Computing Application Engineer
Qube Research & Technologies (QRT) is a global quantitative and systematic investment manager, operating in all liquid asset classes across the world. We are a technology and data driven group implementing a scientific approach to investing. Combining data, research, technology, and trading expertise has shaped QRT’s collaborative mindset which enables us to solve the most complex challenges. QRT’s culture of innovation continuously drives our ambition to deliver high quality returns for our investors.
Join QRT as a technologist within our Workload Scheduling (WLS) team. This key role supports both business and technology groups in integrating High Performance Computing (HPC) solutions, enabling scalable and efficient compute capabilities. You will be instrumental in developing, deploying, and maintaining HPC platforms that leverage YellowDog and Ray schedulers across cloud and on-premises infrastructures.
Your Future Role within QRT:

  • Develop and support scalable workload scheduling solutions for HPC environments
  • Collaborate with internal teams to adopt and optimize HPC platforms
  • Improve the performance, resilience, and observability of compute infrastructure
  • Contribute to infrastructure automation and continuous improvement initiatives
  • Share expertise and support team development through coaching and collaboration

Your Present Skillset:

  • Experience of engineering and supporting at least one HPC scheduler, such as YellowDog, Ray, Slurm or IBM Symphony
  • Good understanding of both loosely coupled and tightly coupled HPC workloads
  • Experience of developing and supporting large-scale systems (5000+ nodes) and high levels of concurrency (100k+ tasks)
  • Experience of monitoring and visualisation of large-scale systems
  • Performance tuning of compute, network and storage components
  • Good understanding of the challenges of user authorisation in large scale distributed environments using AWS IAM and identity providers such as Okta
  • Good understanding of core AWS services:
      • VPC security and networking
      • EC2 configuration and scaling
      • Storage services: S3, EFS, EBS and FSx
      • CloudWatch / CloudTrail / OpenSearch / Athena
  • Experience of developing Python applications and tools
  • Experience with infrastructure-as-code using configuration languages and tools, particularly Terraform and Ansible
  • Solid Linux administration skills
  • Good understanding of various storage solutions and their applicability for different use cases
  • Able to work in a fast-paced environment with multiple conflicting demands and changing priorities
  • Effective communicator, able to describe complex issues at the appropriate level for a given audience
  • Happy to coach colleagues and eager to learn from them

QRT is an equal opportunity employer. We welcome diversity as essential to our success. QRT empowers employees to work openly and respectfully to achieve collective success. In addition to professional achievement, we offer initiatives and programs to enable employees to achieve a healthy work-life balance.

HPC Platform Management Engineer employer: Qube Research & Technologies

Qube Research & Technologies (QRT) is an exceptional employer that fosters a culture of innovation and collaboration, making it an ideal place for an HPC Platform Management Engineer. With a strong emphasis on employee growth, QRT offers opportunities for professional development through coaching and collaboration, while also prioritising a healthy work-life balance. Located in a dynamic environment, employees benefit from cutting-edge technology and the chance to tackle complex challenges in a supportive and diverse workplace.

Contact Detail:

Qube Research & Technologies Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the HPC Platform Management Engineer role

✨Tip Number 1

Familiarise yourself with the specific HPC schedulers mentioned in the job description, such as YellowDog and Ray. Having hands-on experience or even personal projects using these tools can set you apart from other candidates.

✨Tip Number 2

Showcase your understanding of AWS services, particularly around VPC security and networking. Consider preparing examples of how you've implemented these services in past projects to demonstrate your practical knowledge.

✨Tip Number 3

Highlight any experience you have with infrastructure-as-code tools like Terraform and Ansible. Being able to discuss specific scenarios where you've used these tools effectively will illustrate your capability in automating infrastructure.

✨Tip Number 4

Prepare to discuss your approach to performance tuning in large-scale systems. Be ready to share insights on challenges you've faced and how you overcame them, as this will demonstrate your problem-solving skills and technical expertise.

We think you need these skills to ace the HPC Platform Management Engineer role

HPC Scheduler Experience (YellowDog, Ray, Slurm, IBM Symphony)
Understanding of Loosely Coupled and Tightly Coupled HPC Workloads
Large-Scale Systems Development and Support (5000+ nodes)
High Concurrency Management (100k+ tasks)
Performance Tuning for Compute, Network, and Storage
Monitoring and Visualisation of Large-Scale Systems
User Authorisation in Distributed Environments (AWS IAM, Okta)
Core AWS Services Knowledge (VPC, EC2, S3, EFS, EBS, FSx)
CloudWatch, CloudTrail, OpenSearch, Athena Proficiency
Python Application Development
Infrastructure-as-Code (Terraform, Ansible)
Linux Administration Skills
Storage Solutions Understanding
Ability to Work in Fast-Paced Environments
Effective Communication Skills
Coaching and Mentoring Abilities

Some tips for your application 🫡

Understand the Role: Before applying, make sure you fully understand the responsibilities and requirements of the HPC Platform Management Engineer position. Familiarise yourself with the technologies mentioned, such as the YellowDog and Ray schedulers, to tailor your application effectively.

Highlight Relevant Experience: In your CV and cover letter, emphasise your experience with HPC schedulers and large-scale systems. Be specific about your achievements in performance tuning and infrastructure automation, as these are key aspects of the role.

Showcase Your Skills: Make sure to include your technical skills, particularly in Python development, AWS services, and infrastructure-as-code tools like Terraform and Ansible. Use examples to demonstrate how you've applied these skills in previous roles.

Craft a Compelling Cover Letter: Write a cover letter that not only outlines your qualifications but also reflects your enthusiasm for the role and the company culture at QRT. Mention your eagerness to contribute to their innovative environment and your commitment to continuous improvement.

How to prepare for a job interview at Qube Research & Technologies

✨Showcase Your HPC Knowledge

Make sure to highlight your experience with HPC schedulers like YellowDog or Ray. Be prepared to discuss specific projects where you've implemented these technologies and the impact they had on performance and efficiency.

✨Demonstrate Problem-Solving Skills

Prepare examples of how you've tackled complex challenges in large-scale systems. Discuss your approach to performance tuning and how you’ve improved resilience and observability in previous roles.

✨Familiarise Yourself with AWS Services

Since the role involves cloud infrastructure, brush up on your knowledge of AWS services such as EC2, S3, and IAM. Be ready to explain how you've used these services in past projects and how they relate to HPC environments.

✨Emphasise Collaboration and Communication

QRT values teamwork, so be prepared to discuss how you've collaborated with others in your previous roles. Share examples of how you've coached colleagues or contributed to team development, showcasing your effective communication skills.
