Data Center Operations & Reliability Manager

Full-time · £60,000 - £80,000 per year (est.) · No working from home possible

At a Glance

  • Tasks: Manage data centre operations, ensuring reliability and safety while leading a dedicated team.
  • Company: Join Verda, a pioneering AI cloud company with a vibrant, low-hierarchy culture.
  • Benefits: Enjoy competitive pay, equity options, healthcare, and wellness perks.
  • Other info: Dynamic work environment in Helsinki with opportunities for career growth.
  • Why this job: Be part of an ambitious team shaping the future of AI with renewable energy.
  • Qualifications: 5+ years in data centre operations and strong incident management skills required.

The predicted salary is between £60,000 and £80,000 per year.

Imagine a future where everyone has instant, low-cost access to intelligence. We’re building a fully featured European AI cloud - with everything one needs to train, experiment with, and deploy AI models. In addition, our GPUs run on 100% renewable energy. We’re ambitious, curious, and gutsy doers. We keep hierarchy low across the company and morale high in our teams. We’ve already achieved a lot, yet we’re only getting started. Now it’s your chance to join the ride. We offer more than just the job - we offer a career-defining opportunity to be part of building something big! Join Verda while it’s still being built - not once it’s finished.

Why Verda

  • Cash + equity compensation along with various fringe benefits (e.g., healthcare, lunch, and wellbeing).
  • Profitable operations with rapid, sustained growth.
  • 31 nationalities, with 6 different ones on the management team.
  • An opportunity to make a clear impact and work alongside world-class engineers, researchers, and partners across the global AI ecosystem.

Practicalities

  • Work mode: Onsite in Helsinki.
  • Employment type: Full-time and permanent.

About The Role

Verda's customers run AI workloads that cannot afford to go down. Behind every SLA we sign is a data center that has to deliver it around the clock, every day of the year. We are looking for a Data Center Operations & Reliability Manager to own that promise. You will be accountable for the operational reliability of our data center sites: committing to and following up on our SLAs, tracking and mitigating equipment downtime, running the 24/7 shift coverage of our support engineers, enforcing safety and security guidelines, and owning the incident reporting loop from first alert to closed follow-up.

What You Will Do

  • Own SLA commitments and performance.
  • Define, monitor, and report on service levels, and drive corrective action when targets are at risk.
  • Track equipment downtime across sites, analyze failure patterns, and lead mitigation: root cause analysis, preventive measures, and escalation with vendors where needed.
  • Plan and manage 24/7 shift schedules for support engineers, ensuring continuous coverage, fair rotation, and adequate staffing for planned maintenance and peak periods.
  • Enforce and continuously improve safety and security guidelines, ensuring all on-site work follows established protocols and compliance requirements.
  • Oversee incident reports end-to-end: ensure incidents are documented, communicated, followed up, and closed with root cause and prevention actions.
  • Report regularly to management on reliability metrics, incident trends, and operational risks.

What We Are Looking For

  • 5+ years of experience in data center operations, critical facilities, or mission-critical infrastructure environments.
  • Proven experience managing or scheduling teams in a 24/7 shift-based operation.
  • Hands-on understanding of data center infrastructure: power, cooling, networking, and their common failure modes.
  • Experience with SLA management and operational reporting in a customer-facing infrastructure business.
  • Strong incident management skills: structured response, root cause analysis, and disciplined follow-up.
  • Familiarity with safety and security protocols in critical environments.
  • Strong written and verbal English.

Strong Plus

  • Experience in GPU, HPC, or hyperscale cloud environments, including high-density racks and liquid cooling.
  • Experience with monitoring, ticketing, and maintenance management systems (e.g., DCIM, CMMS).
  • Data center certifications such as CDCP or equivalent.
  • Experience building reliability processes from scratch in a fast-growing company.

What's Next

We’re building fast and this role needs the right person behind it. There's no artificial deadline, but when we find who we're looking for, we move. If this sounds like your next move, apply now. Please submit your application through our Careers page. We don’t accept applications sent by email.

Data Center Operations & Reliability Manager employer: Verda

At Verda, we are not just building a cutting-edge European AI cloud; we are cultivating a vibrant work culture that thrives on ambition and collaboration. Our employees enjoy competitive cash and equity compensation, comprehensive benefits, and the chance to make a significant impact while working alongside world-class professionals in a diverse environment. Join us in Helsinki for a career-defining opportunity where your contributions will shape the future of AI technology.

Contact Details:

Verda Recruitment Team
