At a Glance
- Tasks: Diagnose and resolve complex network issues while maintaining operational reliability in a cloud environment.
- Company: Join the UK Government's tech team, making a difference in public service.
- Benefits: Competitive salary, job security, and opportunities for professional growth.
- Why this job: Be at the forefront of technology, ensuring critical services run smoothly for the nation.
- Qualifications: Strong Linux skills, experience with cloud networks, and scripting knowledge required.
- Other info: Dynamic role with a focus on security, scalability, and performance in a 24/7 environment.
The predicted salary is between £48,000 and £72,000 per year.
UK Government role — candidates must be eligible for security clearance. We are seeking an experienced Site Reliability Engineer with strong Linux troubleshooting skills and deep knowledge of virtual cloud networks and access technologies. The ideal candidate will have proven experience resolving complex issues across large-scale network infrastructure and cloud services in real time.
Responsibilities include diagnosing and resolving production incidents, writing Python and Bash scripts on the fly to support live troubleshooting and automation, and maintaining operational reliability across cloud networking environments. Candidates should have hands-on expertise with remote access technologies such as FastConnect, IPsec, and BGP for secure and scalable route distribution. A strong understanding of Linux system processes, memory utilisation, disk and log management, network functionality, containerisation, and the TCP/IP stack is essential.
The role involves triaging and resolving Severity 1 and 2 incidents using logs, metrics, and CLI tools under pressure, including failed changes or system and process failures that directly impact customers in a 24/7 operational environment.
- Work with the Virtual Networking team to share full-stack ownership of a collection of services and technology areas, providing operational support as part of an on-call rotation.
- Understand the end-to-end configuration, technical dependencies, and overall behavioural characteristics of production services.
- Take responsibility for the delivery of the mission-critical stack with a strong focus on security, resiliency, scalability, and performance.
- Hold authority for end-to-end performance and operability.
- Partner with global development teams to define and implement improvements in service architecture.
- Clearly articulate the technical characteristics of services and technology areas, guiding development teams to engineer and deliver premier capabilities within the Oracle Cloud service portfolio.
- Develop and communicate a clear understanding of the scale, capacity, security, and performance attributes and requirements of the service and technology stack.
- Demonstrate a solid grasp of automation and orchestration principles.
- Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs).
- Apply a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations.
- Understand and explain the impact of product architecture decisions on distributed systems.
- Exhibit professional curiosity and a desire to develop a deep technical understanding of services and technologies.
- Ensure that high-quality, accurate, and timely technical documentation of incidents, problems, changes, and standard operating procedures is maintained using tools such as Jira and Confluence.
Work is non-routine and highly complex, involving the application of advanced technical and business skills within the Virtual Networking specialisation of Oracle Cloud Infrastructure (OCI).
Qualifications
- Strong understanding of virtual network architecture, security, and automation.
- Understanding of the TCP/IP stack and routing concepts in Linux systems and networking environments, specifically IPsec, VPNs, and BGP.
- Experience with containerisation technologies and orchestration platforms.
- Solid understanding of Virtual Cloud Networks (VCNs) in public cloud environments.
- Experience with CI/CD systems and release automation tools.
- Experience in scripting languages such as Python or Shell.
- Familiarity with infrastructure automation tools such as Terraform and Chef.
- Leadership experience, ensuring that appropriate changes, upgrades, and enhancements are made based on technical analysis.
- Ability to support network segmentation (e.g., security lists, network security groups, or firewalls).
- Deep understanding of manipulating telemetry data (traffic flows, health status) using Grafana dashboards and MQL.
- Experience with major public cloud providers (e.g., Oracle Cloud Infrastructure (OCI) or equivalent).
- Experience using Jira and Confluence for incident tracking, knowledge management, and ongoing technical documentation.
Senior Site Reliability Developer (UKGOV) in Stoke-on-Trent. Employer: Oracle
Contact Details:
Oracle Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Senior Site Reliability Developer (UKGOV) role in Stoke-on-Trent
✨Tip Number 1
Network with professionals in the field! Join online forums, attend meetups, or connect with folks on LinkedIn. Engaging with others can lead to insider info about job openings and even referrals.
✨Tip Number 2
Show off your skills in real-time! Consider setting up a GitHub repository where you can showcase your Python and Bash scripts. This gives potential employers a taste of what you can do and how you tackle problems.
✨Tip Number 3
Prepare for technical interviews by practising common troubleshooting scenarios. Brush up on your Linux skills and be ready to demonstrate your knowledge of virtual cloud networks and access technologies under pressure.
✨Tip Number 4
Don’t forget to apply through our website! We’ve got loads of opportunities that might just be the perfect fit for you. Plus, it’s a great way to ensure your application gets seen by the right people.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the role of Senior Site Reliability Developer. Highlight your experience with Linux troubleshooting, cloud networking, and any relevant scripting skills. We want to see how your background aligns with the job description!
Showcase Your Skills: In your application, don’t just list your skills—show us how you've used them in real-world scenarios. Talk about specific incidents you’ve resolved or projects where you’ve implemented automation. This helps us see your practical experience!
Be Clear and Concise: When writing your application, keep it clear and to the point. Use bullet points for easy reading and make sure to explain your technical expertise without jargon overload. We appreciate straightforward communication!
Apply Through Our Website: We encourage you to apply through our website for a smoother process. It’s the best way for us to receive your application and ensures you’re considered for the role. Don’t miss out on this opportunity!
How to prepare for a job interview at Oracle
✨Know Your Tech Inside Out
Make sure you brush up on your Linux troubleshooting skills and get familiar with virtual cloud networks. Be ready to discuss specific incidents you've resolved, especially those involving IPsec, BGP, and containerisation technologies. The more examples you can provide, the better!
✨Showcase Your Scripting Skills
Since you'll need to write Python and Bash scripts on the fly, practice some common scripting tasks before the interview. Bring along examples of scripts you've written for automation or troubleshooting, and be prepared to explain your thought process behind them.
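As one possible practice task, here is a minimal Python sketch that summarises error lines from a log by host and process. The log format, the `summarise` helper, and the sample lines are all assumptions made up for illustration; they are not taken from the job description or from any Oracle tooling.

```python
import re
from collections import Counter

# Hypothetical log format, assumed for illustration only:
#   2024-05-01T12:00:03Z host1 bgpd[812]: ERROR neighbor 10.0.0.1 down
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<proc>[\w\-]+)(?:\[\d+\])?:\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def summarise(lines, levels=("ERROR", "CRITICAL")):
    """Count log lines per (host, process) whose level is in `levels`."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines that do not fit the assumed format are skipped silently.
        if match and match["level"] in levels:
            counts[(match["host"], match["proc"])] += 1
    return counts


# Made-up sample data; a real script would read from a file or stdin.
SAMPLE = [
    "2024-05-01T12:00:03Z host1 bgpd[812]: ERROR neighbor 10.0.0.1 down",
    "2024-05-01T12:00:04Z host1 bgpd[812]: INFO keepalive sent",
    "2024-05-01T12:00:05Z host2 ipsec[77]: ERROR SA negotiation timed out",
    "2024-05-01T12:00:09Z host1 bgpd[812]: CRITICAL hold timer expired",
    "malformed line",
]

if __name__ == "__main__":
    # Print the noisiest (host, process) pairs first.
    for (host, proc), n in summarise(SAMPLE).most_common():
        print(f"{host} {proc} {n}")
```

Being able to talk through choices like these (why a regular expression, why malformed lines are skipped rather than raising an error) is the kind of thought process an interviewer may ask you to explain.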
✨Demonstrate Your Problem-Solving Ability
Prepare to discuss how you've triaged and resolved Severity 1 and 2 incidents in high-pressure situations. Think about specific challenges you've faced and how you used logs, metrics, and CLI tools to diagnose issues quickly. This will show your ability to handle real-time problems effectively.
✨Understand the Bigger Picture
Familiarise yourself with the end-to-end configuration and dependencies of production services. Be ready to articulate how architectural decisions impact distributed systems and operational reliability. This shows that you’re not just a techie but also understand the business implications of your work.