At a Glance
- Tasks: Join our team to design and implement automated systems for reliability and scalability.
- Company: Be part of a top-tier investment management house in Central London.
- Benefits: Enjoy a competitive salary, guaranteed bonus, hybrid work, and a generous pension plan.
- Why this job: Make an impact by enhancing system reliability while collaborating with talented professionals.
- Qualifications: Strong experience with AWS, monitoring tools, and automation; excellent communication skills required.
- Other info: Opportunity for mentorship and continuous improvement in a dynamic environment.
The predicted salary is between £96,000 and £104,000 per year.
My client is a top-tier investment management house based in St Paul's.
The Technology Engineering team is looking for an experienced Site Reliability Engineer to join them as they reimagine production application and infrastructure management. The team is responsible for engineering scalable and resilient hybrid cloud solutions (both AWS and on-prem). You will be responsible for creating tooling and software that monitors and improves the reliability of our systems. In this role, you will research problems, evaluate modern technologies, create prototypes, develop integrated processes and automation, define standards for observability tooling, and provide SRE consulting on complex projects.
Requires specialized, in-depth knowledge of your own job discipline and of the Amazon Web Services (AWS) platform and/or other cloud-based platforms, plus deep experience in integrating knowledge from related disciplines. Works independently, receives minimal guidance. Accountable for work of yourself and others; sets standards around which others will operate. Proactively identifies problems and can present and implement solutions to these problems.
Role summary and job responsibilities:
- Design and implement highly automated systems/services that ensure the availability, reliability, and scalability of infrastructure and applications.
- Build and maintain monitoring and alerting to provide timely feedback on the performance and health of systems, network, and applications.
- Design and implement automation tools to reduce manual toil, streamline repetitive tasks, and enhance overall operational efficiency.
- Design and build Service Level Indicator (SLI) metrics and the measures built on them, including but not limited to Service Level Objectives (SLOs), error budgets, and burn-rate alerts.
- Work closely with development teams to embed reliability best practices into the software development process.
- Provide mentorship and training to cross-functional teams on SRE principles, encouraging a shared responsibility for the reliability of services.
- Collaborate with support, operations, and engineering teams to investigate and troubleshoot complex problems.
- Observe and monitor systems to ensure insight into system performance, health, availability, and internal system happenings.
- Understand what to monitor based on the system(s) you are managing, how the monitoring data is stored, and how to analyze the data for future actions.
- Participate in continuous improvement efforts that span multiple cross-functional domains and inform the generation of new standards.
- Be a part of an on-call rotation, continuously enhance automation & documentation, and mentor others on the standard methodologies of infrastructure automation to encourage adoption.
- Able to overcome differences of opinion and drive team alignment around a specific goal or solution.
- Holds associates and teams accountable for adhering to practices and policies.
- Demonstrates deep knowledge of products/flows within supported businesses.
- Decomposes the most complex problems into discrete work units.
- Identifies non-obvious relationships and anomalies often overlooked by others.
- Balances strategic and pragmatic concerns when solving problems.
- Makes sound decisions with limited facts or resources.
- Makes decisions that are cognizant of the firm’s broader business strategy.
- Articulates broader business concerns and/or the regulatory landscape, including key risks and controls (e.g., GDPR, MiFID, SOX).
- Strong experience with Monitoring and Alerting tools such as Prometheus, Grafana, New Relic.
- Experience in container orchestration solutions in AWS with ECS, Fargate.
- Skilled in building and maintaining dashboards using tools like Grafana, Prometheus, and StatsD to provide critical insights.
- Experience working with a Site Reliability Engineering team to design SLIs and SLOs for the respective applications.
- Strong experience with AWS cloud infrastructure and container orchestration operating in a GitOps framework.
- A solid core foundation in infrastructure and systems engineering including Unix/Linux compute, networking, storage, and monitoring stacks.
- Experience using automation tools such as Terraform, Ansible.
- Excellent written and oral communication skills.
- Strong interpersonal skills, adaptable and able to learn quickly.
- Off-hour implementations are required.
- Ability to build positive working relationships with business contacts, within our IT team, and other IT departments.
- Ability to identify tasks and help develop project plans for medium and large-scale projects.
If you are interested, please send your CV for immediate consideration.
Senior Site Reliability Engineer. Investment Management. £120,000 - £130,000 + 15% Guaranteed B[...] employer: CommuniTech Recruitment Group
Contact Detail:
CommuniTech Recruitment Group Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land Senior Site Reliability Engineer. Investment Management. £120,000 - £130,000 + 15% Guaranteed B[...]
✨Tip Number 1
Familiarise yourself with the specific tools mentioned in the job description, such as Prometheus, Grafana, and AWS services. Having hands-on experience or projects showcasing your skills with these technologies can set you apart during discussions.
✨Tip Number 2
Network with current or former employees of the investment management house. Engaging with them on platforms like LinkedIn can provide you with insider knowledge about the company culture and expectations, which can be invaluable during interviews.
✨Tip Number 3
Prepare to discuss your experience with automation tools like Terraform and Ansible. Be ready to share specific examples of how you've used these tools to improve operational efficiency, as this aligns closely with the responsibilities of the role.
✨Tip Number 4
Demonstrate your understanding of SRE principles by preparing to discuss how you've implemented SLIs and SLOs in past projects. This will show your ability to embed reliability best practices into software development processes, a key aspect of the role.
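As a refresher for that discussion, here is a minimal sketch of the arithmetic behind an error budget and a burn-rate alert. It is illustrative only and not taken from the job description: the function names, the example request counts, and the 14.4 paging threshold (a commonly cited value for a 30-day SLO window) are all assumptions.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail under the SLO (0.999 -> about 0.001)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error budget.

    A value of 1.0 spends the budget exactly over the SLO window;
    anything above 1.0 exhausts it early.
    """
    if total <= 0:
        return 0.0  # no traffic in the window, so nothing was burned
    return (failed / total) / error_budget(slo_target)


def should_page(short_window_rate: float, long_window_rate: float,
                threshold: float = 14.4) -> bool:
    """Multi-window check: page only if both windows exceed the threshold.

    The 14.4 default is an assumed example value, not a requirement of the role.
    """
    return short_window_rate >= threshold and long_window_rate >= threshold


if __name__ == "__main__":
    # Hypothetical numbers: 99.9% SLO, 20 failures out of 1,000 requests.
    rate = burn_rate(failed=20, total=1000, slo_target=0.999)
    print(f"burn rate: {rate:.1f}")
    print("page on-call:", should_page(rate, rate))
```

Requiring both a short and a long window to exceed the threshold is one common way to avoid paging on brief spikes; in an interview, be ready to explain why you would choose particular windows and thresholds for a given service.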
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights relevant experience in Site Reliability Engineering, particularly with AWS and container orchestration. Use specific examples of projects where you've implemented monitoring tools or automated processes.
Craft a Compelling Cover Letter: In your cover letter, express your passion for reliability engineering and how your skills align with the company's goals. Mention your experience with tools like Prometheus and Grafana, and how you can contribute to their hybrid cloud solutions.
Showcase Problem-Solving Skills: Provide examples in your application that demonstrate your ability to identify and solve complex problems. Highlight any experience you have with creating SLIs and SLOs, as well as your approach to continuous improvement.
Highlight Communication Skills: Since the role involves collaboration with various teams, emphasise your communication skills. Mention any experience you have in mentoring others or working cross-functionally, as this will show your ability to foster a shared responsibility for system reliability.
How to prepare for a job interview at CommuniTech Recruitment Group
✨Showcase Your Technical Expertise
Be prepared to discuss your experience with monitoring and alerting tools like Prometheus and Grafana. Highlight specific projects where you've implemented these technologies, as well as your familiarity with AWS and container orchestration solutions.
✨Demonstrate Problem-Solving Skills
Expect to be asked about complex problems you've encountered in previous roles. Prepare examples that illustrate how you identified issues, evaluated solutions, and implemented effective changes, especially in a hybrid cloud environment.
✨Emphasise Collaboration and Mentorship
Since the role involves working closely with development teams and mentoring others, be ready to share experiences where you've fostered collaboration or provided training on SRE principles. This will show your ability to work within a team and enhance overall reliability.
✨Understand Business Context
Familiarise yourself with the investment management sector and the specific challenges it faces regarding technology and compliance. Being able to articulate how your technical skills align with the business strategy will set you apart from other candidates.