Senior Site Reliability Engineer Remote, Contract in London

London Freelance 70000 - 90000 € / year (est.) Home office possible
Realm

At a Glance

  • Tasks: Ensure reliability and performance of a global compute platform while resolving complex production issues.
  • Company: Join a high-growth infrastructure company at the forefront of advanced machine learning solutions.
  • Benefits: Enjoy competitive salary, equity, health coverage, and generous paid time off.
  • Other info: Collaborate closely with teams to design and operate high-demand computational systems.
  • Why this job: Make a real impact in a fast-paced environment with ownership and execution speed.
  • Qualifications: 5+ years in site reliability engineering or DevOps, with strong systems expertise.

The predicted salary is between 70000 and 90000 € per year.

High-growth infrastructure company focused on delivering large-scale compute, data centre capacity, and power solutions for advanced machine learning workloads. Platforms support leading research and industry teams requiring high-performance computing at significant scale. Fast-paced environment with emphasis on ownership, execution speed, and quality.

Position operating across software, infrastructure, and operations to ensure reliability, scalability, and performance of a globally distributed compute platform. Close collaboration with networking, platform engineering, and physical infrastructure teams to design and operate systems supporting high-demand computational workloads.

Hands-on engineering role requiring strong systems expertise, with responsibility for resolving complex production issues, improving system resilience, and enhancing platform observability.

  • Deployment and management of large-scale compute clusters using automation tooling, with adaptation to customer requirements.
  • Validation and optimisation of compute, storage, and networking systems in coordination with internal teams and vendors.
  • Execution of large-scale data migrations between cloud and on-premise environments with focus on efficiency and cost.
  • Troubleshooting across the full stack, including hardware, networking, and distributed systems.
  • Development of internal tooling and automation to improve deployment speed, reliability, and operational efficiency.

Experience building maintainable, well-documented systems in complex environments.

  • 5+ years of experience in site reliability engineering, DevOps, systems administration, or high-performance computing.
  • Strong written and verbal communication skills in English.
  • Programming or scripting experience in Go, Python, or Bash.
  • Strong technical foundation in computing or related discipline.
  • Experience operating large-scale machine learning or AI-compute workloads.
  • Hands-on experience with data centre or bare-metal infrastructure.
  • Knowledge of high-performance networking technologies.
  • Experience managing large-scale storage systems (commercial or open-source).

Compensation & Benefits:

  • Competitive salary and equity package.
  • Retirement or pension contributions aligned with local standards.
  • Health coverage including medical, dental, and vision.
  • Generous paid time off policy.

Senior Site Reliability Engineer Remote, Contract in London employer: Realm

Join a high-growth infrastructure company that prioritises innovation and collaboration, offering a dynamic work environment where your expertise in site reliability engineering will directly impact the performance of cutting-edge machine learning workloads. With competitive salaries, generous benefits including health coverage and retirement contributions, and a strong emphasis on employee growth and development, this remote role provides an exceptional opportunity to thrive in a fast-paced setting while working alongside industry leaders.

Realm

Contact Detail:

Realm Recruiting Team

StudySmarter Expert Advice

We think this is how you could land Senior Site Reliability Engineer Remote, Contract in London

✨Tip Number 1

Network like a pro! Reach out to folks in the industry, attend meetups, and connect with potential colleagues on LinkedIn. We all know that sometimes it’s not just what you know, but who you know that can land you that dream job.

✨Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those related to site reliability engineering or high-performance computing. We want to see your hands-on experience and how you tackle complex problems.

✨Tip Number 3

Prepare for technical interviews by brushing up on your systems expertise and troubleshooting skills. We recommend practising common scenarios you might face in a fast-paced environment like ours. The more prepared you are, the more confident you'll feel!

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, we love seeing candidates who take the initiative to engage directly with us.

We think you need these skills to ace Senior Site Reliability Engineer Remote, Contract in London

Site Reliability Engineering
DevOps
Systems Administration
High-Performance Computing
Automation Tooling
Cloud and On-Premise Data Migrations
Troubleshooting

Some tips for your application

Tailor Your CV: Make sure your CV reflects the skills and experiences that match the job description. Highlight your expertise in site reliability engineering, DevOps, and any relevant programming languages like Go or Python. We want to see how you can contribute to our high-performance computing environment!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about the role and how your background aligns with our mission at StudySmarter. Be sure to mention any hands-on experience with large-scale compute clusters or data centre operations.

Showcase Your Problem-Solving Skills: In your application, don’t shy away from sharing examples of complex production issues you've resolved or how you've improved system resilience. We love seeing candidates who can demonstrate their troubleshooting prowess across hardware, networking, and distributed systems.

Apply Through Our Website: We encourage you to apply directly through our website for the best chance of getting noticed. It’s the easiest way for us to keep track of your application and ensure it reaches the right team. Plus, we’re excited to see what you bring to the table!

How to prepare for a job interview at Realm

✨Know Your Tech Inside Out

Make sure you brush up on your technical skills, especially in areas like systems administration, high-performance computing, and the programming languages mentioned in the job description. Be ready to discuss your hands-on experience with large-scale compute clusters and any complex production issues you've resolved.

✨Showcase Your Problem-Solving Skills

Prepare examples of how you've tackled challenging problems in previous roles. Think about specific instances where you improved system resilience or enhanced platform observability. This will demonstrate your ability to think critically and act decisively in a fast-paced environment.

✨Collaboration is Key

Since this role involves close collaboration with various teams, be ready to talk about your experience working with networking, platform engineering, and physical infrastructure teams. Highlight any successful projects where teamwork played a crucial role in achieving results.

✨Ask Insightful Questions

Prepare thoughtful questions about the company's infrastructure, their approach to automation, and how they handle large-scale data migrations. This shows your genuine interest in the role and helps you assess if the company aligns with your career goals.