At a Glance
- Tasks: Ensure platform reliability and automate operational tasks for a shared PaaS.
- Company: Join a dynamic team focused on delivering secure digital services.
- Benefits: Competitive daily rate, flexible work environment, and opportunities for professional growth.
- Why this job: Make a real impact on service reliability while working with cutting-edge technologies.
- Qualifications: Experience in live service operations, automation, and cloud platforms required.
- Other info: Collaborative environment with a focus on continuous improvement and agile practices.
As a Site Reliability Engineer (SRE), you will support the reliability, availability, performance, and security of a shared Platform as a Service (PaaS) used by multiple delivery teams. Operating at SFIA Level 4 (Enable), you will apply established SRE practices to ensure platform stability, automate operational tasks, and improve service resilience. You will work closely with platform engineers, developers, security, and live service teams to support the safe, efficient, and timely delivery of digital services in line with DDaT and government standards.
Service reliability & operations
- Maintain and improve the availability, reliability, and performance of the PaaS.
- Support live services, including incident response, investigation, and resolution, following agreed runbooks and escalation paths.
- Participate in on-call rotas and contribute to post-incident reviews (PIRs), identifying root causes and improvement actions.
- Monitor platform health using logs, metrics, and alerts, proactively identifying and resolving issues.
Automation & continuous improvement
- Automate repeatable operational tasks to reduce toil and improve platform reliability.
- Contribute to infrastructure and configuration management using Infrastructure as Code (IaC) approaches.
- Support continuous improvement of operational processes, reliability patterns, and resilience practices.
Platform support & collaboration
- Support development teams consuming the PaaS, helping them adopt platform standards and reliability best practices.
- Work with security and compliance teams to ensure the platform meets government security, resilience, and audit requirements (JSP453).
- Contribute to platform documentation, runbooks, and knowledge sharing.
- Collaborate within multidisciplinary teams using agile and DevOps practices.
Change & release support
- Support safe deployment and release processes, including monitoring changes in live environments.
- Assist with capacity planning and performance testing activities.
- Ensure changes are implemented in line with change management and live service standards.
Skills
- Live service operations & incident management experience.
- Strong automation & scripting capability.
- Kubernetes (K8s) and cloud compute platform (e.g. AWS) experience.
- Experience supporting live digital services in a production environment.
- Practical knowledge of cloud platforms and PaaS concepts (e.g. managed compute, networking, storage, CI/CD).
- Experience with container platforms (e.g. Kubernetes) or managed PaaS offerings.
- Experience with monitoring, logging, and alerting tools (e.g. Prometheus, Grafana, Elastic).
- Ability to diagnose and resolve technical issues using established processes and tooling.
- Experience writing scripts or automation using languages such as Python, Bash, or similar.
- Understanding of reliability engineering concepts, including incident management, resilience, and failure modes.
- Ability to work independently on defined tasks and contribute effectively within a team.
- Experience using Infrastructure as Code tools (e.g. Terraform, CloudFormation).
Nice to have skills
- Experience working in a government or regulated/secure environment.
- Familiarity with SRE practices such as error budgets and blameless post-incident reviews.
- Knowledge of security and compliance controls relevant to live services.
- Experience using Jira and the wider Atlassian suite (e.g. Confluence).
Site Reliability Engineer in Portsmouth employer: Trust In SODA
Contact Detail:
Trust In SODA Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Site Reliability Engineer role in Portsmouth
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, attend meetups, and connect with other SREs on LinkedIn. You never know who might have the inside scoop on job openings or can refer you directly.
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your automation scripts, IaC projects, or any cool stuff you've built. This gives potential employers a taste of what you can do beyond just a CV.
✨Tip Number 3
Prepare for those interviews! Brush up on your incident management scenarios and be ready to discuss how you've improved service reliability in past roles. Practising common SRE interview questions can really help you stand out.
✨Tip Number 4
Don't forget to apply through our website! We’ve got loads of opportunities that might be perfect for you. Plus, applying directly shows your enthusiasm and commitment to joining our team.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV is tailored to the Site Reliability Engineer role. Highlight your experience with live service operations, automation, and any relevant cloud platforms like AWS. We want to see how your skills match what we're looking for!
Craft a Compelling Cover Letter: Your cover letter is your chance to shine! Use it to explain why you're passionate about SRE and how your background makes you a great fit for our team. Don't forget to mention your experience with incident management and resilience practices.
Showcase Your Technical Skills: When listing your technical skills, be specific! Mention your experience with scripting languages like Python or Bash, and any tools you've used for monitoring and logging. We love seeing practical examples of how you've applied these skills in real-world scenarios.
Apply Through Our Website: We encourage you to apply through our website for a smoother application process. It helps us keep track of your application and ensures you don’t miss out on any important updates. Plus, we can't wait to hear from you!
How to prepare for a job interview at Trust In SODA
✨Know Your SRE Practices
Make sure you brush up on established Site Reliability Engineering practices. Be ready to discuss how you've applied these in past roles, especially around incident management and resilience. This will show that you understand the core responsibilities of the role.
✨Demonstrate Automation Skills
Prepare to showcase your automation and scripting capabilities. Bring examples of scripts you've written or operational tasks you've automated, particularly using languages like Python or Bash. This is crucial for reducing toil and improving platform reliability.
✨Familiarise Yourself with Cloud Platforms
Since the role involves working with cloud compute platforms like AWS, make sure you're comfortable discussing your experience with them. Be ready to talk about any projects where you've used Kubernetes or other container platforms, as well as monitoring tools like Prometheus or Grafana.
✨Collaboration is Key
Highlight your experience working in multidisciplinary teams and using agile and DevOps practices. Be prepared to discuss how you've collaborated with developers, security teams, and others to ensure service reliability and compliance with government standards.