We’re seeking a Vice President - Site Reliability Engineer to join our team. This role is located in London.
Role Summary
BNY is seeking a Vice President - Site Reliability Engineer to design, build, deploy, and scale resilient, automated, and centrally managed engineering solutions for Production Services. This role is ideal for a strong full-stack engineer who combines application development, UI engineering, backend services, infrastructure automation, and production reliability expertise.
The successful candidate will build reusable platforms, internal tools, and automation capabilities that improve operational efficiency, reduce manual effort, strengthen resiliency, and enable Production Services teams to support critical business platforms more effectively. This role requires a hands-on engineer who can take solutions from concept and development through deployment, operationalization, and continuous improvement.
In this role, you’ll make an impact in the following ways:
Design, develop, and deploy centralized engineering solutions that improve operational efficiency, reduce toil, and enhance resiliency across Production Services.
Build full-stack applications and internal engineering tools, including backend services, APIs, automation layers, and user-facing interfaces using technologies such as Python, Java, React, or Angular.
Engineer scalable solutions that support central operational use cases such as self-service tooling, operational dashboards, alert enrichment, incident reduction, service recovery, and workflow automation.
Develop reusable frameworks and components that can be adopted broadly across Production Services teams to standardize and accelerate operational processes.
Automate infrastructure, deployment, configuration, and runtime support activities using tools such as Ansible and Kubernetes.
Define, implement, and continuously improve Service Level Indicators, Service Level Objectives, and service health measures aligned to operational and business priorities.
Build and optimize monitoring, observability, and alerting capabilities using tools such as Prometheus, Grafana, AppDynamics, and Splunk.
Apply AIOps capabilities to improve event correlation, anomaly detection, root cause analysis, predictive insights, and proactive issue prevention.
Partner with engineering, infrastructure, production support, security, and risk teams to ensure developed solutions are secure, scalable, supportable, and aligned to enterprise standards.
Identify manual, fragmented, or repetitive processes across Production Services and convert them into efficient, automated, centrally consumable solutions.
To be successful in this role, we’re seeking the following:
Required Qualifications
Bachelor's degree in Computer Science, Engineering, or a related technical discipline, or equivalent practical experience.
Strong full-stack development experience, with hands-on expertise in Python and Java for backend or service-layer engineering.
Strong working knowledge of front-end development using React or Angular, including building interfaces for operational or engineering use cases.
Proven experience designing and deploying end-to-end solutions, from application development through production deployment and operational support.
Experience in Site Reliability Engineering, Production Engineering, DevOps, Platform Engineering, or similar roles supporting business-critical applications.
Strong foundation in Linux/Unix systems administration, scripting, troubleshooting, and infrastructure concepts.
Hands-on experience with Ansible and Kubernetes in enterprise or production environments.
Demonstrated ability to define and operationalize SLIs, SLOs, dashboards, alerts, and health indicators.
Hands-on experience with enterprise monitoring and observability platforms including Prometheus, Grafana, AppDynamics, and Splunk.
Strong troubleshooting, analytical, and problem-solving skills in complex distributed or production environments.
Strong verbal and written communication skills, with the ability to collaborate effectively with technical and non-technical stakeholders.
Preferred Qualifications
Experience building centralized internal platforms or shared engineering services for operational or enterprise users.
Experience applying AIOps, machine learning, or intelligent automation within production support or reliability engineering environments.
Exposure to CI/CD pipelines, infrastructure as code, API-driven automation, and modern software delivery practices.
Experience supporting distributed systems, cloud-native platforms, or container-based architectures.
Knowledge of Agile, DevOps, and SRE operating models, including continuous improvement and blameless post-incident practices.
Ability to influence engineering standards and drive adoption of common tooling and automation patterns across teams.