Data Center Operations & Reliability Manager

Full-time | £60,000 - £80,000 per year (est.) | No working from home possible

At a Glance

Tasks: Manage data centre operations, ensuring reliability and safety while leading a dedicated team.
Company: Join Verda, a pioneering AI cloud company with a vibrant, low-hierarchy culture.
Benefits: Enjoy competitive pay, equity options, healthcare, and wellness perks.
Other info: Dynamic work environment in Helsinki with opportunities for career growth.
Why this job: Be part of an ambitious team shaping the future of AI with renewable energy.
Qualifications: 5+ years in data centre operations and strong incident management skills required.

The predicted salary is between £60,000 and £80,000 per year.

Imagine a future where everyone has instant, low-cost access to intelligence. We’re building a fully featured European AI cloud - with everything one needs to train, experiment with, and deploy AI models. In addition, our GPUs run on 100% renewable energy. We’re ambitious, curious, and gutsy doers. We practice a low hierarchy across the company and high morale in our teams. We’ve already achieved a lot, yet we’re only getting started. Now it’s your chance to join the ride. We offer more than just the job - we offer a career-defining opportunity to be part of building something big! Join Verda while it’s still being built - not once it’s finished.

Why Verda

Cash + equity compensation along with various fringe benefits (e.g., healthcare, lunch, wellbeing).
Profitable operations with rapid, sustained growth.
31 nationalities across the company, with 6 represented on the management team.
An opportunity to make a clear impact and work alongside world-class engineers, researchers, and partners across the global AI ecosystem.

Practicalities

Work mode: Onsite in Helsinki.
Employment type: Full-time and permanent.

About The Role

Verda's customers run AI workloads that cannot afford to go down. Behind every SLA we sign is a data center that has to deliver it around the clock, every day of the year. We are looking for a Data Center Operations & Reliability Manager to own that promise. You will be accountable for the operational reliability of our data center sites: committing to and following up on our SLAs, tracking and mitigating equipment downtime, running the 24/7 shift coverage of our support engineers, enforcing safety and security guidelines, and owning the incident reporting loop from first alert to closed follow-up.

What You Will Do

Own SLA commitments and performance: define, monitor, and report on service levels, and drive corrective action when targets are at risk.
Track equipment downtime across sites, analyze failure patterns, and lead mitigation: root cause analysis, preventive measures, and escalation with vendors where needed.
Plan and manage 24/7 shift schedules for support engineers, ensuring continuous coverage, fair rotation, and adequate staffing for planned maintenance and peak periods.
Enforce and continuously improve safety and security guidelines, ensuring all on-site work follows established protocols and compliance requirements.
Oversee incident reports end-to-end: ensure incidents are documented, communicated, followed up, and closed with root cause and prevention actions.
Report regularly to management on reliability metrics, incident trends, and operational risks.

What We Are Looking For

5+ years of experience in data center operations, critical facilities, or mission-critical infrastructure environments.
Proven experience managing or scheduling teams in a 24/7 shift-based operation.
Hands-on understanding of data center infrastructure: power, cooling, networking, and their common failure modes.
Experience with SLA management and operational reporting in a customer-facing infrastructure business.
Strong incident management skills: structured response, root cause analysis, and disciplined follow-up.
Familiarity with safety and security protocols in critical environments.
Strong written and verbal English.

Strong Plus

Experience in GPU, HPC, or hyperscale cloud environments, including high-density racks and liquid cooling.
Experience with monitoring, ticketing, and maintenance management systems (e.g., DCIM, CMMS).
Data center certifications such as CDCP or equivalent.
Experience building reliability processes from scratch in a fast-growing company.

What's Next

We’re building fast and this role needs the right person behind it. There's no artificial deadline, but when we find who we're looking for, we move. If this sounds like your next move, apply now. Please submit your application through our Careers page. We don’t accept applications sent by email.

Data Center Operations & Reliability Manager employer: Verda

At Verda, we are not just building a cutting-edge European AI cloud; we are cultivating a vibrant work culture that thrives on ambition and collaboration. Our employees enjoy competitive cash and equity compensation, comprehensive benefits, and the chance to make a significant impact while working alongside world-class professionals in a diverse environment. Join us in Helsinki for a career-defining opportunity where your contributions will shape the future of AI technology.

Contact Details:

Verda Recruitment Team

We think you need these skills to ace Data Center Operations & Reliability Manager

SLA Management

Operational Reporting

Incident Management

Root Cause Analysis

Preventive Measures

Data Centre Infrastructure Knowledge

Power and Cooling Systems

Networking

Safety and Security Protocols

24/7 Shift Scheduling

Team Management

Communication Skills

Analytical Skills

Experience with Monitoring Systems

Data Centre Certifications
