Cognizant

Site Reliability Engineer - Azure

Glasgow, Full-Time, £43,200 - £72,000 per year (est.), no remote working

At a Glance

Tasks: Lead the adoption of SRE practices and improve system reliability and performance.
Company: Join an inclusive team committed to innovation and professional development.
Benefits: Enjoy significant stakeholder interaction and opportunities for collaboration.
Why this job: Be part of a culture that values innovation, teamwork, and continuous improvement.
Qualifications: Strong knowledge of reliability systems, coding experience, and Azure cloud expertise required.
Other info: Opportunity to coach colleagues and lead improvements in a dynamic environment.

The predicted salary is between £43,200 and £72,000 per year.

In this key role, you will improve, drive, and embed the non-functional and operational characteristics of our products and services, such as availability, performance, efficiency, change management, observability, security, incident response, and capacity planning.

You will enjoy significant stakeholder interaction, working with engineers to ensure a principled approach to delivering change safely and securely.

This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development.

What you will do

As a Site Reliability Engineer, you will lead the adoption of SRE practices as part of our SRE enablement team. You will work closely with our feature team and other colleagues to meet defined service level objectives and continually improve systems and environments. You will track and reduce toil, define service level indicators (SLIs) and service level objectives (SLOs), and set error budgets that help find the right balance between risk and reliability.

You will also bring structure to our release process, suggesting and making improvements where possible. You will scale systems sustainably through mechanisms such as automation, and evolve them by pushing for changes that improve both reliability and velocity. We will also look to you to coach and guide colleagues and the wider team, leading where required.
Proactively contribute innovative ideas to meet short-term and longer-term goals.
Continually balance and manage any potential risks.
Be accountable for the day-to-day health of both production and non-production environments and respond to any incidents as required.
Provide exceptional support to our internal and external customers by proactively managing internal and external production systems and pioneering streamlined solutions for them.
Contribute to Site Reliability Operations (production support, incident response, on-call rota, toil reduction, observability, security, application performance, and codification).
Balance feature development speed and reliability with well-defined service level objectives.
Lead and coordinate major incidents in a complex, multi-party environment.
Proactively lead improvements to release quality into production and provide highly available, high-performing, and secure production systems.
Implement proactive monitoring and alerting to enable a timely response to outages.
Be accountable for the performance of internal systems and third-party suppliers.
Provide technical expertise and input to establish the risk tolerance of products and services.
Communicate incident status updates clearly and frequently to other teams, customers, and stakeholders.

Key Skills and Experience:

Strong knowledge of reliability and systems thinking, and experience of software engineering. You will need experience of using a data-driven, scientific approach to fact-finding.
Prior experience in establishing a Site Reliability Engineering function with 24/7 support.
Coding experience, with the ability to demonstrate how to build, test, scan, and deploy .NET and JavaScript applications.
Hands-on experience of Azure cloud, IaC, JSON, Azure Bicep, Azure Policy, Azure DevOps, OpenTelemetry, Azure Monitor, Azure Sentinel, Azure Defender, Grafana, Kusto (KQL) queries, Kubernetes (AKS), Azure Arc, and Azure Functions apps.
Excellent knowledge of DevOps, Security, and IT Service Management.
Hands-on experience with Azure cloud and full-stack observability using tools such as Log Analytics and Application Insights.
Deep knowledge of Kubernetes and Prometheus.
Experience with GitOps practices.
Understanding of shift-right approaches and experience with chaos engineering.
A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
Knowledge of automating IT request fulfilment processes through orchestration (e.g. ServiceNow).
Knowledge of cloud-native and microservice architectures, including containerization and API management.
Effective communication and presentation skills.
Financial services knowledge, and the ability to identify wider business impact, risk, and opportunity, and make connections across key outputs and processes.

About the employer: Cognizant

As a Site Reliability Engineer at our company, you will be part of an inclusive and innovative team that values collaboration and professional growth. We offer a dynamic work culture that encourages continuous learning and the adoption of cutting-edge SRE practices, all while providing competitive benefits and opportunities for career advancement. Located in a vibrant area, our workplace fosters creativity and teamwork, making it an excellent environment for those seeking meaningful and rewarding employment.

Contact Details:

Cognizant Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Site Reliability Engineer - Azure role

✨Tip Number 1

Familiarize yourself with Azure services and tools mentioned in the job description, such as Azure DevOps, Kubernetes, and Azure Monitoring. Having hands-on experience with these technologies will not only boost your confidence but also demonstrate your capability to handle the responsibilities of the role.

✨Tip Number 2

Engage with the SRE community through forums, webinars, or local meetups. Networking with professionals in the field can provide you with insights into best practices and current trends, which can be beneficial during interviews.

✨Tip Number 3

Prepare to discuss specific examples of how you've implemented SRE practices in previous roles. Be ready to explain your approach to incident response, monitoring, and automation, as these are key aspects of the position.

✨Tip Number 4

Showcase your problem-solving skills by preparing for scenario-based questions. Think about potential incidents you might encounter in a production environment and how you would address them, emphasizing your proactive approach to risk management.

We think you need these skills to ace the Site Reliability Engineer - Azure role

Site Reliability Engineering (SRE)

Azure Cloud Services

DevOps Practices

Incident Response Management

Performance Monitoring and Optimization

Automation and Orchestration

Containerization and Kubernetes

JSON and Azure Bicep

Azure DevOps

OpenTelemetry

Grafana and Kusto Queries

Chaos Engineering

GitOps Practices

Effective Communication Skills

Financial Services Knowledge

Some tips for your application 🫡

Understand the Role: Make sure to thoroughly read the job description and understand the key responsibilities and skills required for the Site Reliability Engineer position. Tailor your application to highlight your relevant experience in reliability systems, Azure cloud, and DevOps practices.

Highlight Relevant Experience: In your CV and cover letter, emphasize your hands-on experience with Azure, coding in .NET and JavaScript, and any previous roles where you established SRE functions or provided 24/7 support. Use specific examples to demonstrate your expertise.

Showcase Problem-Solving Skills: Provide examples of how you've proactively identified problems and implemented solutions in past roles. Discuss your experience with incident response, performance bottlenecks, and how you've contributed to improving system reliability.

Communicate Effectively: Since the role involves significant stakeholder interaction, ensure your application reflects strong communication skills. Use clear and concise language in your cover letter and CV, and be prepared to discuss how you communicate incident status updates and collaborate with teams.

How to prepare for a job interview at Cognizant

✨Showcase Your Technical Expertise

Be prepared to discuss your hands-on experience with Azure cloud services and tools like Kubernetes, Prometheus, and Azure DevOps. Highlight specific projects where you implemented SRE practices or improved system reliability.

✨Demonstrate Problem-Solving Skills

Prepare examples of how you've proactively identified and resolved performance bottlenecks or incidents in previous roles. Use a data-driven approach to explain your thought process and the outcomes of your actions.

✨Communicate Effectively

Since this role involves significant stakeholder interaction, practice clear and concise communication. Be ready to explain complex technical concepts in a way that non-technical stakeholders can understand.

✨Emphasize Collaboration and Coaching

Discuss your experience working in teams and how you've coached colleagues in adopting SRE practices. Share examples of how you've contributed to a collaborative team environment and supported others in their professional development.

Application deadline: 2027-03-14