At a Glance
- Tasks: Join us as a Site Reliability Engineer, blending scripting and hands-on support for system reliability.
- Company: TEKsystems is part of the Allegis Group, a global network of companies focused on technology solutions.
- Benefits: Enjoy hybrid working in London, with opportunities for growth and learning in a dynamic environment.
- Why this job: Be at the forefront of automation and cloud infrastructure, making a real impact on system performance.
- Qualifications: Strong Python scripting skills and experience with AWS, CI/CD, and monitoring tools like Prometheus and Grafana.
- Other info: This is a contract opportunity with a focus on collaboration and innovation in tech.
The predicted salary is between £43,200 and £72,000 per year.
We are looking for a skilled and adaptable Site Reliability Engineer (SRE) to join our team. This role is a blend of scripting and operational responsibilities, ideal for someone who enjoys both building automation and engaging in hands-on support to ensure system reliability and performance.
London hybrid working - Contract Opportunity
Must-haves
- Python scripting (candidates with Go experience may also be considered)
- Automation experience
- Prometheus / Grafana / PromQL
- CI/CD
- AWS
- Splunk
Key Responsibilities
- Develop and maintain automation scripts, primarily in Python (Go experience also considered).
- Respond to and resolve incidents, manage changes, and perform problem analysis to maintain system uptime and reliability.
- Collaborate with internal teams and customers to troubleshoot and resolve infrastructure and application issues.
- Operate and enhance observability tooling, including Prometheus, Grafana, and Splunk, with a strong focus on PromQL.
- Participate in an on-call rotation to support critical production systems.
- Improve and maintain CI/CD pipelines and deployment processes.
- Work with AWS cloud infrastructure to support scalable, secure, and resilient systems.
- Operate within a GitOps workflow and support Kubernetes-based environments.
Required Skills & Experience
- Strong scripting skills in Python (Go, Bash, or SQL also beneficial).
- Proven experience with automation and infrastructure-as-code practices.
- Deep understanding of monitoring and observability, particularly with Prometheus, Grafana, and PromQL.
- Experience with CI/CD tools and modern deployment strategies.
- Solid hands-on experience with AWS services in a production environment.
- Proficiency with Splunk for log analysis and monitoring.
- Familiarity with GitHub, GitOps, and Kubernetes operations.
Location: London, UK
Python / Go - Site Reliability Engineer (SRE) employer: TEKsystems
Contact Details:
TEKsystems Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Python / Go - Site Reliability Engineer (SRE) role
✨Tip Number 1
Familiarise yourself with the specific tools mentioned in the job description, such as Prometheus, Grafana, and Splunk. Having hands-on experience or even personal projects showcasing your skills with these tools can set you apart from other candidates.
✨Tip Number 2
Engage with the SRE community online. Join forums, attend webinars, or participate in discussions related to Site Reliability Engineering. This not only helps you learn but also shows your passion for the field when you discuss relevant topics during interviews.
✨Tip Number 3
Prepare to discuss your automation experience in detail. Be ready to share specific examples of how you've implemented automation in previous roles, particularly using Python or Go, as this is a key requirement for the position.
✨Tip Number 4
Understand the principles of CI/CD and be prepared to talk about your experience with deployment processes. Highlight any projects where you've improved CI/CD pipelines, as this will demonstrate your ability to enhance operational efficiency.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights your experience with Python scripting, automation, and any relevant tools like Prometheus, Grafana, and AWS. Use specific examples to demonstrate your skills in these areas.
Craft a Strong Cover Letter: In your cover letter, express your enthusiasm for the Site Reliability Engineer role. Mention how your background aligns with the key responsibilities and required skills listed in the job description.
Showcase Relevant Projects: If you have worked on projects involving CI/CD pipelines, Kubernetes, or infrastructure-as-code, be sure to include these in your application. Detail your contributions and the impact they had on system reliability.
Highlight Problem-Solving Skills: Since the role involves incident response and problem analysis, provide examples of how you've successfully resolved technical issues in the past. This will demonstrate your ability to maintain system uptime and reliability.
How to prepare for a job interview at TEKsystems
✨Showcase Your Scripting Skills
Be prepared to discuss your experience with Python scripting in detail. Bring examples of automation scripts you've developed and be ready to explain the challenges you faced and how you overcame them.
✨Demonstrate Your Automation Experience
Highlight your experience with automation and infrastructure-as-code practices. Discuss specific tools you've used, such as CI/CD pipelines, and how they improved system reliability and performance.
✨Familiarise Yourself with Monitoring Tools
Since the role involves working with Prometheus, Grafana, and Splunk, make sure you understand how these tools work. Be ready to talk about how you've used them to enhance observability and troubleshoot issues.
✨Prepare for Technical Questions
Expect technical questions related to AWS services, GitOps workflows, and Kubernetes operations. Brush up on these topics and think of real-world scenarios where you've applied your knowledge.