VP of Incident Management & Operational Resilience in London

London, Full-Time, no home office possible


Why should you join dLocal?
dLocal enables the biggest companies in the world to collect payments in 40 countries in emerging markets. Global brands rely on us to increase conversion rates and simplify payment expansion effortlessly. As both a payments processor and a merchant of record where we operate, we make it possible for our merchants to make inroads into the world’s fastest-growing, emerging markets.

By joining us you will be a part of an amazing global team that makes it all happen, in a flexible, remote‐first dynamic culture with travel, health and learning benefits, among others. Being a part of dLocal means working with 1000+ teammates from 30+ different nationalities and developing an international career that impacts millions of people’s daily lives. We are builders, we never run from a challenge, we are customer‐centric, and if this sounds like you, we know you will thrive in our team.

What’s the opportunity?
We are looking for a VP of Incident Management & Operational Resilience to design, build, and lead the company‐wide incident management function at dLocal. This role owns the global framework, governance, and execution model for all critical incidents across the company – not only Technology. You will be accountable for how dLocal anticipates, responds to, and learns from major events impacting customers, operations, security, compliance, employees, and our reputation. You will work closely with business, operations, technology, and risk leaders to ensure incidents are handled in a consistent, predictable, and transparent way.

What will I be doing?

Define and own the company‐wide incident management strategy

Set the vision, strategy, and operating model for global incident management and operational resilience across dLocal, aligned with our NASDAQ‐listed public company obligations, board‐approved risk appetite, investor expectations, and growth plans

Define and maintain a unified incident management and resilience framework including policies, processes, playbooks, severity/impact matrices, roles and responsibilities, and escalation paths for business, operational, security, and technology incidents

Ensure alignment and integration with Business Continuity, Disaster Recovery, Operational Risk, Security, and Compliance frameworks, and with global regulations and standards such as DORA, PSD2, NIS2, data‐breach and outage‐reporting regimes, and central‐bank expectations across 45+ countries and regions

Build and lead the incident management function

Build and lead a distributed global incident management and resilience organization, with regional leaders and incident commanders in LATAM, EMEA, APAC, and North America, operating in a follow‐the‐sun, 24/7 model

Define clear ownership boundaries between the central function and domain teams (IT, Security, Operations, CS, Product, Finance, Legal, Compliance, Corporate Communications, etc.) and ensure everyone understands their role during incidents

Develop a network of trained Incident Commanders and functional responders across regions and time zones, with clear expectations and training paths, including participation in a structured, global on‐call rotation that guarantees 24/7 executive and technical coverage for high‐severity incidents

Govern major incident execution & 24/7 global operations

Ensure that all high‐severity incidents follow a structured lifecycle: detection, assessment, triage, containment, mitigation, recovery, and closure, with hand‐offs and ownership clearly defined between regions (LATAM, EMEA, APAC, North America) in a follow‐the‐sun model

Establish criteria for declaring major incidents/crises, and ensure rapid mobilization of the right stakeholders, including executives when needed

Define expectations and standards for real‐time decision‐making, risk trade‐offs, approvals, and business sign‐offs during incidents

Personally act as executive sponsor and, when necessary, executive Incident Commander for the most critical, company‐level events

Own incident communications and stakeholder alignment

Define and enforce standards for internal and external communications during incidents (frequency, content, channels, approvals)

Ensure effective coordination with Customer Success, Commercial, Operations, Legal, Compliance, Security, and Communications/PR on messaging to merchants, partners, employees, and regulators

Provide clear, concise updates to the Executive Team and Board‐level forums during and after major incidents

Institutionalize learning and continuous improvement

Establish and own the post‐incident review process for significant incidents, ensuring high‐quality root cause analysis, clear corrective and preventive actions, and accountable owners

Create governance to track, prioritize, and close post‐incident action items, and to escalate systemic issues that require investment or strategic decisions

Use incident data to identify structural weaknesses (technology, process, organization, capacity, controls) and feed them into roadmaps, risk registers, and investment cases

Enable teams with processes, tooling, and training

Define the strategy and requirements for incident management tooling (alerting, collaboration channels, runbook systems, ticketing/workflow integration, status pages, dashboards)

Partner with Tech/SRE, Security, and Operations to ensure observability, monitoring, and alerting support effective incident detection and response

Design and run training programs, simulations, and game days across the company so all teams know how to respond and escalate appropriately

Measure, report, and challenge performance

Define and own incident KPIs and KRIs (e.g. MTTD, MTTR, incident volume by type/severity, recurrence rate, SLA/SLO breaches, comms timeliness, business impact)

Produce regular executive‐level reporting and insights on incident trends, operational risk, and readiness, with clear proposals for improvement

Challenge teams constructively on incident quality (runbooks, communications, action plans) and champion a no‐blame, learning‐oriented culture

What skills do I need?
Experience

10+ years of experience in roles related to incident management, operations, SRE/DevOps, security operations, or crisis management, with increasing scope and complexity

Proven experience leading major, cross‐functional incidents in high‐availability, high‐risk environments (e.g. payments, fintech, financial services, large‐scale SaaS, telco, cloud)

5+ years in people leadership roles, building and leading teams and/or programs that cut across multiple departments or regions

Experience designing and implementing company‐wide processes or frameworks (incident management, BCP/DR, operational risk, or similar)

Leadership & Communication

Strong executive presence and the ability to lead under pressure, making decisions with incomplete information and bringing clarity in ambiguous situations

Excellent communication skills in English, both written and spoken, with the ability to tailor messages to ICs, managers, executives, customers, and external stakeholders

Demonstrated ability to influence without formal authority, align conflicting priorities, and work effectively with senior stakeholders (C‐level, functional heads)

Process, Risk & Analytical Skills

Structured problem‐solving skills and a data‐driven mindset, comfortable working with metrics, trends, and root‐cause analysis

Strong understanding of operational risk, business continuity, and control environments, ideally in a regulated or audited context

Ability to balance short‐term mitigation vs. long‐term resilience, making and articulating trade‐offs clearly

Domain & Tooling

Familiarity with frameworks and practices such as ITIL/ITSM, SRE, incident command, operational resilience, BCP/DR

Hands‐on experience with some of the following (or equivalent) is a plus: monitoring/observability platforms, ticketing/ITSM systems, on‐call and alerting tools, collaboration platforms, status page tools, and runbook automation

Comfort working with distributed, multi‐time‐zone teams and 24/7 operations

Nice to have

Experience interacting with regulators, auditors, or Board‐level committees on topics related to incidents, outages, operational risk, or resilience

Exposure to standards such as SOC 1/2, ISO 27001, PCI DSS, DORA, or operational resilience regulations, especially where incident management and business continuity are in scope

Background in payments, banking, or financial infrastructure

What success looks like in this role

dLocal has a clear, trusted, and widely adopted incident management framework that is understood across all departments and regions

Major incidents are handled in a predictable, well‐coordinated way, with faster time to detect and resolve, and fewer repeats

Incident communication is transparent, timely, and consistent, building trust with customers, partners, regulators, and internal teams

Post‐incident reviews consistently lead to real improvements in technology, processes, controls, and organization

Incident data and learnings are actively used to shape strategy, investments, and risk decisions at the executive level

The incident management function is seen as a key enabler of reliability and growth, not just a reactive firefighting team

What do we offer?

Remote work: work from anywhere or one of our offices around the globe!*

Flexibility: we have flexible schedules and we are driven by performance

Fintech industry: work in a dynamic and ever‐evolving environment, with plenty to build and boost your creativity

Referral bonus program: our internal talent are the best recruiters – refer someone ideal for a role and get rewarded

Learning & development: get access to a Premium Coursera subscription

Language classes: we provide free English, Spanish, or Portuguese classes

Social budget: you’ll get a monthly budget to chill out with your team (in person or remotely) and deepen your connections!

dLocal Houses: want to rent a house to spend one week anywhere in the world coworking with your team? We’ve got your back!

*For people based in Montevideo (Uruguay) applying to non‐IT roles, 55% monthly attendance at the office is required

What happens after you apply?
Our Talent Acquisition team is invested in creating the best candidate experience possible, so don’t worry, you will definitely hear from us. We will review your CV and keep you posted by email at every step of the process!

Also, you can check out our webpage, LinkedIn, Instagram, and YouTube for more about dLocal!

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.


Contact Detail:

dLocal Recruiting Team
