Network Engineer, Reliability & Observability

Full-Time, £120,000 - £200,000 / year (est.), Home office (partial)
FluidStack

At a Glance

  • Tasks: Champion reliability engineering and improve AI network quality through innovative processes and data metrics.
  • Company: Fluidstack, a tech company focused on expanding human freedom through powerful AI infrastructure.
  • Benefits: Competitive salary, equity options, health insurance, generous PTO, and retirement plans.
  • Other info: Collaborative environment with opportunities for growth and travel to local offices.
  • Why this job: Join us in building civilization-scale infrastructure for AI and make a real impact.
  • Qualifications: 5+ years in network engineering with strong operational experience and software development skills.

The predicted salary is between £120,000 and £200,000 per year.

About Fluidstack

We exist to make humanity more free. Technology gave people more time for the things they wanted to do, instead of things they had to do. Powerful AI will be the biggest lever for human choice we've ever built - but only if models are aligned with what humanity actually wants. We acquire power, design and build data centers, and operate them - with teams spanning hardware and software. Speed and scale are our key differentiators. Come be a part of building civilization-scale infrastructure for AI.

About The Role

Fluidstack is seeking a Network Engineer, Reliability & Observability, to champion and build the processes, data collection, and reliability metrics that improve the quality and reliability of AI networks from deployment through the full lifecycle of operations. The role focuses on developing processes, systems, tools, data pipelines, and observability to improve network quality and to deliver automated 24x7 metrics as well as periodic reliability reports for both internal and external customers.

This role is ideal for experienced network operators who are passionate about reliability and have experience designing and building full-lifecycle software for tasks such as Quality Assurance audits, circuit audits, periodic audits, failure-rate tracking, and failure analysis. You are passionate about hardware (electronics and optics) and software development, and you value and promote the use of data to make informed decisions in deployment, operations, and strategic sourcing. Experienced Site Reliability Engineers (SREs) with a passion for networking are encouraged to apply.

Focus

  • Ownership of Quality Assurance: Design, develop, and support the QA process for network hardware and networks.
  • Pipelines: Develop and deploy serverless, server-based, and manually triggered data pipelines that produce network quality and reliability observability for internal and external customers.
  • Deployment and Operations Support: Support full-lifecycle data collection and analysis, partnering with Deployment, Operations, DC hardware, and logistics teams to produce data that drives process improvements and delivers on SLAs and SLOs.
  • Process Engineering: Develop, pilot, and deploy process improvements for deployment and repair that both produce data and consume it with Machine Learning to fulfill our mission.
  • Cross-Team Collaboration: Own without ego and execute in a collaborative team with design, deployment, and operations engineers and software developers.
  • Subject Matter Expert: Deep expertise in at least two subjects such as IP routing, optics, optical transport, Ethernet, RDMA/RoCE, or electrical power.

About You

  • Strong Operations Background: 5+ years in network engineering, including at least 3 years in operations with significant hands‑on operational experience.
  • Software Development: You have experience with ITIL, Agile (XP), and TDD, including developing and leading programs and projects.
  • Datacenter Fabric Expertise: Deep experience operating modern datacenter networks including EVPN/VXLAN, BGP, Clos topologies, and high‑radix switching.
  • Incident Response Excellence: Proven ability to lead incident response, perform systematic troubleshooting, and drive issues to resolution.
  • Matrix Leadership Experience: You understand how to build relationships with onsite teams, coordinate physical infrastructure work, and represent network engineering in a field environment.
  • Operational Pragmatism: You balance perfection with progress.
  • Self Driven: You embrace complex challenges with undefined processes and key results.
  • Travel: You are willing and able to travel to spend time with the team at our local offices or data center locations, up to 20% of the time.

Nice to Haves

  • AI/HPC Fabric Operations: Experience operating AI/ML or HPC fabrics with RDMA (RoCEv2), lossless Ethernet (PFC, ECN), or high‑performance networking.
  • Reliability Engineering: You have experience with observability and reliability engineering, gained in network operations or in manufacturing quality.
  • Hardware Repair Experience: Hands‑on experience coordinating hardware repairs, RMAs, and physical infrastructure work.
  • Observability & Monitoring: Familiarity with network monitoring platforms, alerting systems, and telemetry collection.

Salary & Benefits

Competitive total compensation package (salary + equity). Retirement or pension plan, in line with local norms. Health, dental, and vision insurance. Generous PTO policy, in line with local norms.

The base salary range for this position is $150,000 - $250,000 per year, depending on experience, skills, qualifications, and location. This range represents our good faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options. We are committed to pay equity and transparency. Fluidstack is an Equal Employment Opportunity Employer.

Network Engineer, Reliability & Observability employer: FluidStack

Fluidstack is an exceptional employer that champions innovation and collaboration in the rapidly evolving field of AI infrastructure. With a strong focus on employee growth, we offer competitive compensation, generous benefits, and a culture that values passion and expertise in network engineering. Join us in a dynamic environment where your contributions directly impact the future of technology and human freedom.

Contact Details:

FluidStack Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land the Network Engineer, Reliability & Observability role

Tip Number 1

Network with industry professionals! Attend meetups, webinars, or conferences related to network engineering. Engaging with others in the field can lead to valuable connections and potential job opportunities.

Tip Number 2

Show off your skills! Create a portfolio showcasing your projects, especially those related to reliability and observability. This gives you a chance to demonstrate your expertise and passion for the role.

Tip Number 3

Prepare for interviews by practising common technical questions and scenarios. We recommend simulating real-life problems you might face as a Network Engineer, so you can showcase your problem-solving skills under pressure.

Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets noticed. Plus, it shows you’re genuinely interested in being part of our mission at Fluidstack.

We think you need these skills to ace the Network Engineer, Reliability & Observability role

Network Engineering
Reliability Engineering
Data Collection and Analysis
Quality Assurance
Incident Response
Golang
Python

Some tips for your application 🫡

Tailor Your Application: Make sure to customise your CV and cover letter for the Network Engineer role. Highlight your experience in network operations and reliability engineering, and show us how your skills align with our mission at Fluidstack.

Show Your Passion: We want to see your enthusiasm for AI and networking! Share any personal projects or experiences that demonstrate your commitment to improving network reliability and observability. Let your passion shine through!

Be Clear and Concise: When writing your application, keep it straightforward. Use clear language and avoid jargon where possible. We appreciate a well-structured application that gets straight to the point while showcasing your qualifications.

Apply Through Our Website: Don’t forget to submit your application through our website! It’s the best way for us to receive your details and ensures you’re considered for the role. We can’t wait to hear from you!

How to prepare for a job interview at FluidStack

Know Your Stuff

Make sure you brush up on your technical knowledge, especially around IP routing, Ethernet, and datacentre operations. Fluidstack is looking for someone who can demonstrate deep expertise in these areas, so be ready to discuss your hands-on experience and any challenges you've faced.

Show Your Passion for Reliability

Fluidstack values candidates who are genuinely passionate about reliability and observability. Be prepared to share examples of how you've improved network reliability in past roles, and discuss any processes or tools you've developed to enhance quality assurance.

Collaboration is Key

This role requires working closely with various teams, so highlight your experience in cross-team collaboration. Share specific instances where you've successfully partnered with deployment, operations, or software development teams to achieve a common goal.

Be Ready for Incident Scenarios

Expect questions about incident response and troubleshooting. Prepare to discuss how you've handled outages or complex failures under pressure, including your approach to communication and escalation during such incidents.