Fleet Reliability Operations Engineer

London | Full-Time | £40,000 - £55,000 / year (est.) | No home office possible

CoreWeave is the essential cloud for AI™, delivering a platform that enables innovators to build and scale AI with confidence. Founded in 2017 and traded on Nasdaq as CRWV, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs.

We are proud to be a Living Wage accredited employer. The Fleet Reliability Operations team manages the day-to-day provisioning, management and uptime of CoreWeave's expanding fleet of server nodes. The role focuses on configuration, updates, remote troubleshooting and ensuring that the highest-tier supercomputing clusters operate at maximum capacity.

There are two shifts per day, covering 7 am to 9 pm. Successful candidates will attend up to two weeks of onboarding training at our US headquarters within the first month of employment.

Key Responsibilities
• Configure and maintain large-scale, high-performance supercomputing clusters running state-of-the-art GPUs.
• Troubleshoot hardware and software issues; coordinate with data center, network, hardware and platform teams to drive resolution.
• Monitor and analyze system performance and take remediation actions to maintain cloud health.
• Create and maintain documentation of team processes, knowledge and best practices for system management.
• Collaborate with the team to improve processes and efficiency.
• Participate in on-call rotations, including after-hours and weekend work.

Required Skills & Experience
• At least 2 years of experience troubleshooting or administering data center or on-prem infrastructure (servers, storage, network).
• Strong Linux system administration and networking fundamentals.
• Ability to perform consistent, reliable system maintenance and hardware/software troubleshooting.
• Bachelor's degree in a related field or equivalent experience.

Preferred (but not required)
• Experience with Bash, Python, PowerShell or similar scripting languages.
• Knowledge of observability tools such as Grafana, Prometheus and PromQL.
• Familiarity with data center environments, including HVAC and fiber trays.
• Kubernetes administration skills.
• Experience with GPU-based HPC workloads.

Short-notice business travel to the United States may be required. Applicants must be able to travel lawfully on short notice, holding the necessary U.S. authorization (e.g., ESTA or a B-1 visa).

Benefits
• Family-level medical, dental and vision insurance.
• Generous pension contribution.
• Life assurance at 4× salary.
• Critical illness cover.
• Employee assistance programme.
• Tuition reimbursement.
• Work culture focused on innovative disruption.

CoreWeave is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status or genetic information.

Legal compliance: This position requires access to export-controlled information, so the successful candidate must be a U.S. person or otherwise eligible to access such information. Applicants must satisfy U.S. Government export regulations or obtain the required authorization.


Fleet Reliability Operations Engineer employer: CoreWeave

CoreWeave is an exceptional employer, offering a dynamic work culture that thrives on innovation and adaptability. With a commitment to employee growth, we provide comprehensive benefits including family-level medical insurance and tuition reimbursement, all while fostering a hybrid workplace that values flexibility and collaboration. Join us in our mission to power the next wave of AI, where your contributions will make a significant impact in a rapidly evolving industry.

Contact Detail:

CoreWeave Recruiting Team
