Senior Site Reliability Engineer Remote, Contract in City of London

City of London | Full-Time | £70,000 - £90,000 / year (est.) | No home office possible
Realm

At a Glance

  • Tasks: Ensure reliability and performance of a global compute platform while resolving complex production issues.
  • Company: Join a high-growth infrastructure company at the forefront of advanced machine learning solutions.
  • Benefits: Enjoy competitive salary, equity, health coverage, and generous paid time off.
  • Other info: Collaborate closely with teams to design and operate high-demand computational systems.
  • Why this job: Make a real impact in a fast-paced environment with ownership and execution speed.
  • Qualifications: 5+ years in site reliability engineering or DevOps, with strong systems expertise.

The predicted salary is between £70,000 and £90,000 per year.

High-growth infrastructure company focused on delivering large-scale compute, data centre capacity, and power solutions for advanced machine learning workloads. Platforms support leading research and industry teams requiring high-performance computing at significant scale. Fast-paced environment with emphasis on ownership, execution speed, and quality.

Position operating across software, infrastructure, and operations to ensure reliability, scalability, and performance of a globally distributed compute platform. Close collaboration with networking, platform engineering, and physical infrastructure teams to design and operate systems supporting high-demand computational workloads.

Hands-on engineering role requiring strong systems expertise, with responsibility for resolving complex production issues, improving system resilience, and enhancing platform observability.

  • Deployment and management of large-scale compute clusters using automation tooling, with adaptation to customer requirements.
  • Validation and optimisation of compute, storage, and networking systems in coordination with internal teams and vendors.
  • Execution of large-scale data migrations between cloud and on-premise environments with focus on efficiency and cost.
  • Troubleshooting across the full stack, including hardware, networking, and distributed systems.
  • Development of internal tooling and automation to improve deployment speed, reliability, and operational efficiency.

Experience building maintainable, well-documented systems in complex environments.

  • 5+ years of experience in site reliability engineering, DevOps, systems administration, or high-performance computing.
  • Strong written and verbal communication skills in English.
  • Programming or scripting experience in Go, Python, or Bash.
  • Strong technical foundation in computing or related discipline.
  • Experience operating large-scale machine learning or AI-compute workloads.
  • Hands-on experience with data centre or bare-metal infrastructure.
  • Knowledge of high-performance networking technologies.
  • Experience managing large-scale storage systems (commercial or open-source).

Compensation & Benefits:

  • Competitive salary and equity package.
  • Retirement or pension contributions aligned with local standards.
  • Health coverage including medical, dental, and vision.
  • Generous paid time off policy.

Senior Site Reliability Engineer Remote, Contract in City of London employer: Realm

Join a high-growth infrastructure company that prioritises innovation and collaboration, offering a dynamic work culture where your contributions directly impact the success of advanced machine learning workloads. With competitive salaries, generous benefits including health coverage and retirement contributions, and ample opportunities for professional growth, this remote role as a Senior Site Reliability Engineer allows you to thrive in a fast-paced environment while working alongside leading experts in the field.

Contact Detail:

Realm Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Senior Site Reliability Engineer (Remote, Contract) role in City of London

✨Tip Number 1

Network like a pro! Reach out to your connections in the industry, attend meetups, and engage in online forums. The more people you know, the better your chances of landing that Senior Site Reliability Engineer role.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to high-performance computing or automation tooling. This will give potential employers a taste of what you can bring to the table.

✨Tip Number 3

Prepare for technical interviews by brushing up on your systems expertise and troubleshooting skills. Practice common scenarios you might face in a fast-paced environment, and be ready to demonstrate your problem-solving abilities.

✨Tip Number 4

Don’t forget to apply through our website! We’re always on the lookout for talented individuals who can thrive in a collaborative environment. Your next big opportunity could be just a click away!

We think you need these skills to ace the Senior Site Reliability Engineer (Remote, Contract) role in City of London

Site Reliability Engineering
DevOps
Systems Administration
High-Performance Computing
Automation Tooling
Cloud and On-Premise Data Migrations
Troubleshooting
Programming in Go
Python
Bash
Data Centre Infrastructure
High-Performance Networking Technologies
Large-Scale Storage Systems Management
System Resilience
Platform Observability

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience in site reliability engineering and any relevant projects you've worked on. We want to see how your skills align with our focus on high-performance computing and large-scale systems.

Showcase Your Technical Skills: Don’t forget to mention your programming or scripting experience, especially in Go, Python, or Bash. We’re looking for someone who can hit the ground running, so let us know how you’ve used these skills in past roles.

Be Clear and Concise: When writing your cover letter, keep it straightforward. We appreciate clarity and directness, so get to the point about why you’re a great fit for this role and how you can contribute to our fast-paced environment.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you don’t miss out on any important updates during the process!

How to prepare for a job interview at Realm

✨Know Your Tech Inside Out

Make sure you brush up on your technical skills, especially in areas like Go, Python, and Bash. Be ready to discuss your experience with large-scale compute clusters and how you've tackled complex production issues in the past.

✨Showcase Your Problem-Solving Skills

Prepare examples of how you've resolved challenging problems in high-performance computing environments. Highlight your hands-on experience with troubleshooting across hardware, networking, and distributed systems to demonstrate your expertise.

✨Collaboration is Key

Since this role involves close collaboration with various teams, be prepared to discuss how you've worked with networking, platform engineering, and infrastructure teams in the past. Share specific instances where teamwork led to successful project outcomes.

✨Be Ready for Scenario Questions

Expect scenario-based questions that test your ability to manage large-scale data migrations or improve system resilience. Think through your approach to these challenges and be ready to articulate your thought process clearly.

