Site Reliability Engineer

Full-Time | £70,000 - £90,000 per year (est.) | No working from home possible
Locus Robotics

At a Glance

  • Tasks: Ensure stability and security of our Autonomous Mobile Robots and manage thousands of edge devices globally.
  • Company: Join Locus Robotics, a leader in warehouse automation and IoT solutions.
  • Benefits: Competitive salary, flexible working hours, and opportunities for professional growth.
  • Other info: Dynamic team environment with a focus on innovation and reliability.
  • Why this job: Make a real impact on cutting-edge technology in the robotics and IoT space.
  • Qualifications: Master’s degree (or equivalent experience) and 7+ years in SRE, DevOps, or Systems Engineering required.

Requirements

  • Master’s degree in Computer Science, Software Engineering, Systems Engineering, Robotics, or equivalent experience
  • 7+ years of experience: Proven track record in SRE, DevOps, or Systems Engineering with a focus on IoT, remote devices, or distributed edge hardware
  • Deep proficiency in Linux/Unix systems (Debian/Ubuntu preferred), including kernel tuning, scripting (Python, Bash), and networking protocols (TCP/IP, MQTT, CoAP, HTTPS/REST, DNS)
  • Knowledge of security best practices for IoT and remote devices, including secure boot, encryption at rest/in transit, and certificate management
  • Expert proficiency in Python, Rust, or Go, plus configuration management tooling (Ansible/Terraform) for fleet-wide deployments
  • Strong understanding of SRE principles, including SLIs/SLOs, error budgets, and automation over manual "toil"
  • Experience with enterprise MDM or Unified Endpoint Management (UEM) platforms (such as Jamf Pro, Microsoft Intune, FleetDM, Mosyle, Esper, 42Gears SureMDM, SOTI MobiControl, VMware Workspace ONE, or Headwind MDM)
  • Experience with open-source device management solutions is a plus (such as FleetDM, Mender.io, Balena, MicroMDM, Memfault, or RAUC)
  • Experience building Linux images and containers (with tools such as Yocto, PTXdist, ubuntu-image, Packer, Debian live-build, debootstrap)
  • Experience with Linux packaging formats (such as deb, snap, Flatpak, Nix)
  • Hands-on experience troubleshooting hardware interfaces, specifically USB/Bluetooth barcode scanners and industrial touchscreen displays
  • Experience configuring and locking down browsers or native apps into dedicated kiosk environments on both Linux and mobile OSs
  • Hands-on experience with cloud infrastructure (AWS or Azure) and containerization technologies like Docker and Kubernetes
  • Experience with CI/CD pipelines tailored for edge device deployment
  • Experience with ROS (Robot Operating System) or managing hardware-in-the-loop systems is a plus
  • Background in warehouse automation, logistics, or industrial IoT

What the job involves

  • Locus Robotics is seeking a Site Reliability Engineer (SRE) with a specialized focus on Remote Device Management. As a core member of our reliability team, you will ensure the stability, security, and scalability of the LocusONE platform supporting our growing fleet of Autonomous Mobile Robots (AMRs), peripherals, and reporting devices.
  • You will bridge the gap between software development and field operations, using Linux expertise and Mobile Device Management (MDM) tools to manage thousands of edge devices globally.
  • Fleet Management at Scale: Design, implement, and maintain robust and secure device management strategies for remote devices using Unified Endpoint Management (UEM), MDM solutions, and orchestration tools.
  • Reliability & Monitoring: Develop and manage observability pipelines to track device health, connectivity, and performance metrics across diverse warehouse environments.
  • OTA & Lifecycle Management: Own the end-to-end lifecycle of device software, including secure Over-the-Air (OTA) firmware updates, rollback strategies, and OS hardening.
  • Incident Response: Participate in on-call rotations to troubleshoot complex system failures, performing root cause analysis (RCA) to drive long-term reliability improvements.
  • Self-Healing Infrastructure: Develop automated remediation scripts that detect and fix common edge issues such as hung scanning processes or display driver freezes without manual intervention.
  • Zero-Touch Scalability: Architect and maintain remote provisioning and management workflows for a global fleet of Linux devices, iPads, and Android devices using secure remote management strategies.
  • Secure Remote Access: Implement and manage secure remote access protocols such as SSH, VPNs, and private APNs to enable out-of-band troubleshooting and real-time device control without physical site visits.
  • SLO/SLI Frameworks: Define and enforce Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for device availability, connectivity, and peripheral performance.
  • Error Budget Management: Use error budgets to balance the pace of innovation with fleet reliability, ensuring data-driven decisions for feature releases versus stability fixes.
  • Security Governance: Align fleet operations with industry standards such as the NIST Cybersecurity Framework (CSF), ISO/IEC 27001, and CIS Controls.
  • Vulnerability Management: Drive continuous monitoring and automated patching schedules to mitigate risks and ensure regulatory compliance across all managed device platforms.

Site Reliability Engineer employer: Locus Robotics

At Locus Robotics, we pride ourselves on being an exceptional employer that fosters a culture of innovation and collaboration. Our Site Reliability Engineers play a crucial role in ensuring the stability and security of our cutting-edge Autonomous Mobile Robots, and they have ample opportunities for professional growth and development in a dynamic environment. Located in a vibrant tech hub, we offer competitive benefits, a commitment to work-life balance, and the chance to work with the latest technologies in IoT and remote device management.

Locus Robotics

Contact Details:

Locus Robotics Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land the Site Reliability Engineer role

Tip Number 1

Network, network, network! Get out there and connect with folks in the industry. Attend meetups, webinars, or even online forums related to Site Reliability Engineering. You never know who might have a lead on your dream job!

Tip Number 2

Show off your skills! Create a portfolio or GitHub repository showcasing your projects, especially those involving Linux, Python, or any of the tools mentioned in the job description. This gives potential employers a taste of what you can do.

Tip Number 3

Don’t just apply blindly! Tailor your approach for each application. Research the company and mention specific projects or values that resonate with you. This shows you’re genuinely interested and not just sending out mass applications.

Tip Number 4

Apply through our website! We love seeing candidates who take the initiative to engage directly with us. Plus, it’s a great way to ensure your application gets the attention it deserves.

We think you need these skills to ace the Site Reliability Engineer role

Linux/Unix Systems
Kernel Tuning
Scripting (Python, Bash)
Networking Protocols (TCP/IP, MQTT, CoAP, HTTPS/REST, DNS)
Security Best Practices for IoT
Configuration Management (Ansible/Terraform)
SRE Principles (SLIs/SLOs, Error Budgets)

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your experience in SRE, DevOps, or Systems Engineering. We want to see how your skills align with our focus on IoT and remote devices, so don’t hold back on showcasing relevant projects!

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you’re passionate about Site Reliability Engineering and how your background makes you a perfect fit for our team at Locus Robotics. Keep it engaging and personal!

Show Off Your Technical Skills: We love seeing hands-on experience! Be sure to mention your proficiency in Linux systems, Python, and any MDM tools you've worked with. Highlight specific projects where you’ve implemented solutions that improved reliability or performance.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows you’re keen on joining our team!

How to prepare for a job interview at Locus Robotics

Know Your Tech Inside Out

Make sure you’re well-versed in the technologies mentioned in the job description. Brush up on your Linux/Unix skills, especially Debian/Ubuntu, and be ready to discuss kernel tuning and scripting in Python or Bash. They’ll likely ask you about networking protocols such as TCP/IP, MQTT, and DNS too, so have some examples ready.

Showcase Your Problem-Solving Skills

Prepare to discuss specific challenges you've faced in previous roles, particularly around incident response and troubleshooting. Think of scenarios where you’ve had to perform root cause analysis or develop automated remediation scripts. Real-life examples will make your answers stand out.

Understand SRE Principles

Familiarise yourself with SRE concepts like SLIs, SLOs, and error budgets. Be prepared to explain how you’ve applied these principles in past projects. This shows that you not only understand the theory but can also implement it effectively in a real-world setting.

Demonstrate Your Passion for IoT

Since this role focuses on remote device management, express your enthusiasm for IoT and edge devices. Share any relevant projects or experiences you’ve had, whether it’s working with MDM solutions or building Linux images. Showing genuine interest can set you apart from other candidates.