Vice President, Platform Engineering - Windows SRE

London Full-Time No home office possible

Do you want your voice heard and your actions to count? Discover your opportunity with Mitsubishi UFJ Financial Group (MUFG), one of the world’s leading financial groups. Across the globe, we’re 150,000 colleagues, striving to make a difference for every client, organization, and community we serve. We stand for our values, building long-term relationships, serving society, and fostering shared and sustainable growth for a better world.

With a vision to be the world’s most trusted financial group, it’s part of our culture to put people first, listen to new and diverse ideas and collaborate toward greater innovation, speed and agility. This means investing in talent, technologies, and tools that empower you to own your career.

Join MUFG, where being inspired is expected and making a meaningful impact is rewarded.

The Site Reliability Engineering team is responsible for delivering continuous improvement, automation and self-service offerings to operational teams across Bank EMEA and Securities International.

Overview

MAIN PURPOSE OF THE ROLE

Responsible for the reliability and efficiency of infrastructure through the delivery of common, repeatable tools and processes that greatly reduce the amount of toil operations must perform

Member of the L3 Engineering team, providing subject matter expertise and acting as the final point of escalation

Key Responsibilities

Primary:

Develop software to make infrastructure services self-managing and self-service

Deliver continuous service improvement by developing Infrastructure as Code

Eliminate manual, repetitive, automatable, tactical tasks that are devoid of value

Improve system performance, make effective use of resources, distribute load and reduce latency

Identify SLOs (Service Level Objectives) to meet availability and latency objectives

Develop proactive monitoring solutions that alert on symptoms and not just on outages

Perform detailed root cause analyses (RCAs) on incidents and outages to prevent future recurrence

Partner with development teams to improve services via rigorous testing and release procedures

Identify technical debt and partner with application teams to build remediation plans

Develop standard operational procedures and produce effective documentation

Analyse workloads and devise suitable cloud migration strategies where appropriate

Ensure all project / investment workloads are delivered according to the defined plans and budget

Liaise with Infrastructure Control and IT Risk teams to satisfy internal and external audit requests

Deputise for the team lead when required and act up accordingly

Identify cost saving and optimisation opportunities across the group

Build strong working relationships across the organisation

Adhere to the core values of the bank

Secondary:

Perform daily health and compliance checks for all systems as required

Ensure all systems are backed up successfully and any issues are promptly resolved

Validate that monitoring alerts and batch job failures are detected promptly and resolved satisfactorily

Ensure sufficient capacity is available to accommodate drive growth

Respond to emails sent to the team distribution list / mailboxes in a timely manner

Handle incidents and requests with efficiency and a “customer first” mindset

Maintain infrastructure in a highly available, reliable, secure and performant manner

General Server / Database / Virtualisation Administration maintenance activities

Provide technical support to application support and development teams

Provide consultancy to application support and development teams

Take part in On-Call & weekend work rotation; triaging and addressing production issues as they arise

Skills And Experience

Essential:

Exceptional skills in Microsoft Windows Server internals and related technologies

Excellent skills in managing and maintaining Active Directory, DHCP, DNS, LDAP and Kerberos

Extensive experience in hardware performance monitoring and tuning complex low latency systems.

Agile, Site Reliability Engineering (SRE) and DevOps Principles and practices

Exceptional knowledge of scripting and programming languages such as PowerShell, Python and C#

Fluent in Backup and Recovery processes and procedures

Advanced knowledge of Clustering, High-Availability, Replication and Disaster Recovery techniques

Ability to tune Network, Storage, Server and Virtualisation layers for optimal performance and reliability

Excellent Performance Tuning skills, in-depth knowledge of system internals, performance counters and performance measurement and analysis tools.

Ability to interpret and implement CIS security hardening recommendations in a controlled manner

Acute awareness of Security and Auditing requirements in a regulated environment

“Infrastructure as Code” Principles and practices

“Continuous Integration (CI) and Continuous Delivery (CD)” Principles and practices

Git, Ansible, Terraform and TeamCity

Serena Deployment Automation (SDA) and Jenkins

Highly Desirable:

Experience writing and managing plays/playbooks on AWX / Ansible Tower

Advanced working knowledge of Kubernetes and Docker container orchestration

Microsoft SQL Server, Oracle, Sybase ASE, MongoDB and Snowflake

IBM Tivoli / Netcool

Nutanix HCI and VMware ESX

Networking Protocols (TCP/IP, DNS, DHCP, VLANs)

RHEL, Oracle Linux, Oracle Solaris and related technologies

Cloud computing – IaaS, PaaS and SaaS offerings across Azure, AWS, GCP and Oracle

Knowledge of data security governance and regulations such as GDPR and SOX

Desirable:

Dell EMC PowerStore (SAN) and Isilon (NAS)

Rubrik, EMC NetWorker, Data Domain and IBM Tivoli Storage Manager

CyberArk

Splunk

Qualys

Cisco Tetration

ServiceNow

JIRA and Confluence

Personal Requirements:

Excellent communication and interpersonal skills

Ability to handle pressure during outages and systematically resolve issues

Excellent problem-solving skills

Results driven, with a strong sense of accountability

A proactive, motivated approach

The ability to operate with urgency and prioritise work accordingly

A structured and logical approach to work

Attention to detail and accuracy

Ability to perform well in a pressurised environment

Ability to manage constructive conflict effectively

The ability to manage large workloads and tight deadlines

Able to communicate complex technical concepts to non-technical persons at all levels

We are open to considering flexible working requests in line with organisational requirements.

MUFG is committed to embracing diversity and building an inclusive culture where all employees are valued, respected and their opinions count. We support the principles of equality, diversity and inclusion in recruitment and employment, and oppose all forms of discrimination on the grounds of age, sex, gender, sexual orientation, disability, pregnancy and maternity, race, gender reassignment, religion or belief and marriage or civil partnership.

We make our recruitment decisions in a non-discriminatory manner in accordance with our commitment to identifying the right skills for the right role and our obligations under the law.

Contact Detail:

MUFG Recruiting Team
