SRE (Linux, Firmware & Server Infrastructure)

Networking People Limited

Temporary Home office (partial)

At a Glance

Tasks: Manage and resolve complex platform and hardware incidents in a high-performing team.
Company: Dynamic tech company based in Glasgow with a hybrid work culture.
Benefits: Competitive day rate, flexible working, and opportunities for professional growth.
Other info: Collaborative environment with excellent career advancement opportunities.
Why this job: Join a cutting-edge team and make a real impact on critical infrastructure reliability.
Qualifications: Strong Linux expertise and experience in server hardware and firmware management.

We are seeking a Senior Platform Reliability Engineer with deep Linux systems expertise and strong exposure to server hardware, firmware, and low-level infrastructure operations. This role sits within a high-performing enterprise infrastructure team responsible for maintaining and improving the reliability of critical platforms at scale. The position is heavily focused on resolving complex platform and hardware-related incidents, particularly those escalated from L3 support, with an emphasis on firmware lifecycle management, disk encryption, logging, and server configuration (BIOS-level controls) across multi-vendor environments. This is a hands-off role with respect to hardware: it requires strong remote troubleshooting capabilities, excellent communication skills, and the ability to work closely with internal teams and external vendors to drive issues through to resolution.

Key Responsibilities

Own and manage end-to-end incident resolution for platform and hardware-related issues, including triage, mitigation, escalation, and post-incident review
Diagnose and troubleshoot Linux OS-level issues arising from hardware faults, firmware changes, or configuration inconsistencies
Manage and support firmware lifecycle processes, including upgrades, validation, and issue remediation
Work with disk encryption technologies and logging frameworks, ensuring system integrity and auditability
Maintain and troubleshoot server configuration settings, including BIOS-level parameters across multiple hardware vendors (strong Dell focus)
Use out-of-band management tools (e.g., iDRAC, iLO, RACADM, Redfish APIs) for remote diagnostics and recovery
Analyse vendor logs, support bundles, and telemetry data to identify root causes and remediation paths
Engage directly with hardware vendors and engineering teams, managing escalations and driving timely resolutions
Contribute to continuous improvement initiatives, reducing incident recurrence and operational toil
Produce and maintain high-quality documentation, including runbooks, troubleshooting guides, and knowledge base articles
Participate in post-incident reviews (RCA) and support improvements in reliability metrics (MTTR, MTTD, SLOs)

Essential Skills & Experience

Strong Linux administration and troubleshooting expertise, including process and service management, system logs and diagnostics, networking fundamentals, and package and configuration management
Solid understanding of server hardware and infrastructure, including disks, RAID/HBA controllers, NICs and firmware interactions, and hardware failure modes and OS-level symptoms
Proven experience with firmware management and upgrades, disk encryption and secure configurations, and BIOS/server configuration management
Hands-on experience with remote management and lights-out technologies, such as iDRAC, iLO, RACADM, Redfish or similar APIs
Strong track record of incident ownership, including triage and mitigation, cross-team coordination, stakeholder communication, and driving issues through to resolution
Experience working with vendor diagnostics, logs, and support bundles, as well as vendor escalation processes and engineering engagement
Excellent communication skills (written and verbal), with the ability to clearly articulate technical issues to both technical and non-technical stakeholders
Strong documentation skills, including creation of runbooks, procedures, and RCA reports

Desirable Skills

Scripting and automation experience (e.g., Python, Bash, Ansible)
Familiarity with configuration management and automation frameworks
Exposure to virtualisation and containerisation technologies (VMware, KVM, Docker, Kubernetes)
Experience with monitoring, observability, and alerting systems, including log analysis and alert tuning
Understanding of SRE principles and metrics, including SLOs, SLIs, error budgets, MTTR/MTTD

Key Attributes

Methodical and detail-oriented approach to troubleshooting
Strong sense of ownership and accountability
Comfortable working in high-pressure, incident-driven environments
Collaborative mindset with the ability to work across global teams and vendors
Proactive approach to continuous improvement and operational excellence

SRE (Linux, Firmware & Server Infrastructure) employer: Networking People Limited

Join a dynamic and innovative team in Glasgow as a Senior Platform Reliability Engineer, where you will have the opportunity to work on critical infrastructure projects in a hybrid environment. Our company fosters a collaborative culture that values continuous improvement and professional growth, offering competitive day rates and the chance to engage with cutting-edge technologies. With a strong focus on employee development and a commitment to operational excellence, we provide a rewarding workplace for those looking to make a meaningful impact.

Contact Details:

Networking People Limited Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the SRE (Linux, Firmware & Server Infrastructure) role

✨Tip Number 1

Get your networking game on! Reach out to folks in the industry, especially those already working as SREs. LinkedIn is a goldmine for this – connect, engage, and don’t be shy to ask for advice or insights about their experiences.

✨Tip Number 2

Show off your skills! If you’ve got experience with Linux, firmware, or server infrastructure, make sure to highlight that in conversations. Share specific examples of how you’ve tackled complex issues or improved system reliability in past roles.

✨Tip Number 3

Practice makes perfect! Prepare for technical interviews by brushing up on your troubleshooting skills. Think through common scenarios you might face as an SRE and how you’d resolve them. This will help you feel more confident when it’s time to shine.

✨Tip Number 4

Don’t forget to apply through our website! We’re always on the lookout for talented individuals like you. Plus, applying directly can sometimes give you a leg up in the process, so take that step and get your application in!

We think you need these skills to ace the SRE (Linux, Firmware & Server Infrastructure) role

Linux Administration

Troubleshooting Expertise

Firmware Management

Disk Encryption

Server Configuration Management

Remote Management Tools (iDRAC, iLO, RACADM, Redfish)

Incident Resolution

Cross-Team Coordination

Vendor Diagnostics

Documentation Skills

Scripting and Automation (Python, Bash, Ansible)

Configuration Management

Virtualisation and Containerisation Technologies (VMware, KVM, Docker, Kubernetes)

Monitoring and Observability Systems

SRE Principles and Metrics

Some tips for your application 🫡

Tailor Your CV: Make sure your CV highlights your Linux expertise and experience with server hardware. We want to see how your skills match the job description, so don’t be shy about showcasing relevant projects or roles you’ve had.

Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you’re the perfect fit for the Senior Platform Reliability Engineer role. Share specific examples of how you’ve tackled complex incidents or improved platform reliability in the past.

Show Off Your Communication Skills: Since this role involves working with both technical and non-technical teams, make sure your application reflects your ability to communicate clearly. We love candidates who can articulate their thoughts well, so don’t hold back!

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it’s super easy!

How to prepare for a job interview at Networking People Limited

✨Know Your Linux Inside Out

Make sure you brush up on your Linux administration skills. Be ready to discuss troubleshooting techniques, system logs, and diagnostics. They’ll likely ask you about specific scenarios where you've resolved OS-level issues, so have some examples in mind.

✨Familiarise Yourself with Hardware and Firmware

Since this role involves a lot of hardware interaction, it’s crucial to understand server components like disks, RAID controllers, and BIOS settings. Prepare to talk about your experience with firmware management and how you've handled upgrades or issues in the past.

✨Show Off Your Communication Skills

You’ll need to articulate complex technical issues clearly, so practice explaining your past experiences to someone who isn’t technical. Think about how you’ve communicated with vendors or cross-team members during incidents and be ready to share those stories.

✨Prepare for Incident Management Questions

Expect questions around incident ownership and resolution processes. Have examples ready that showcase your ability to triage, mitigate, and drive issues to resolution. Highlight any continuous improvement initiatives you've contributed to, as they’ll want to see your proactive approach.
