Major Incident and Problem Manager, Associate

Full-Time | £60,000 - £75,000 / year (est.) | No home office possible

At a Glance

  • Tasks: Lead major incidents and drive structured recovery using AI-driven workflows.
  • Company: Join a leading FinTech company focused on innovation and collaboration.
  • Benefits: Flexible Time Off, education reimbursement, and comprehensive health resources.
  • Other info: Hybrid work model promoting collaboration and continuous learning.
  • Why this job: Make a real impact by improving system stability and performance with cutting-edge technology.
  • Qualifications: 5+ years in Incident Management, strong troubleshooting skills, and a DevOps mindset.

The predicted salary is between £60,000 and £75,000 per year.

Team Overview

The Service Management team provides industry-standard Incident, Problem and Change Management, alongside infrastructure operational support for Aladdin. We operate using modern engineering practices and tooling, including ServiceNow and AI-enabled workflows, and measure outcomes through clear operational metrics.

Role

We are seeking an experienced Incident & Problem Manager (5+ years) with a strong passion for technical troubleshooting and the ability to lead multiple simultaneous incidents. This role exists to deliver rapid time to detect and time to resolve, and to eliminate repeat incidents at a system level by operating an AI-first incident delivery model. The Major Incident & Problem Manager is accountable for turning incidents into measurable stability improvements—particularly those caused by change—and for building an incident operating rhythm where AI handles correlation, classification and narrative generation by default, allowing humans to focus on decision quality, trade-offs and prevention.

In complex distributed platforms, incidents are often slowed by manual triage, fragmented ownership and time-consuming coordination. This role addresses those challenges by creating a decision-centric incident response model, powered by AI-driven signal correlation and automation-first execution, ensuring that:

  • The right responders are engaged faster
  • The most likely causes are identified sooner
  • Mitigation decisions are taken with clearer risk framing
  • Communications remain accurate and timely
  • Repeat failures are systematically removed rather than documented

The role partners closely with Engineering and SRE / DevOps teams, leveraging automation, observability tooling and emerging AI-driven insights. The successful candidate will have a DevOps mindset, be able to actively troubleshoot, and utilise and enhance AI and automation. The role also includes participation in continuous improvement initiatives aimed at improving the stability, performance and resilience of the Aladdin platform, and enhancing Service Management services.

Key Responsibilities

  • Lead major incidents as a decision authority (P1–P4)
      • Lead end-to-end management of production incidents, including investigation, recovery execution and closure
      • Run incidents as a decision system, driving clarity on what is known, what is suspected and what action is taken next
      • Manage multiple simultaneous incidents while maintaining consistent prioritisation and escalation
  • Operate an AI-first incident workflow (human-validated, human-overridden when required)
      • Triage and categorise incidents using AI-driven classification, with human validation and override where appropriate
      • Drive AI-automated ticket routing and apply risk-based escalation judgement when automation is insufficient
      • Ensure incident timelines and summaries are produced to a high standard using AI-generated artefacts, correcting them where required
      • Supervise automated remediation and agentic responders, intervening to pause, override or redirect when risk requires
      • Ensure automated remediation is safe, auditable and aligned with service ownership and operational readiness
  • Manage a robust Problem Management process to prevent incident recurrence
      • Ensure root causes and preventative actions are clearly captured and translated into an effective Problem Management process
      • Identify incident trends and repeat patterns, driving scalable remediation to reduce recurrence
      • Partner with Engineering and SRE / DevOps to embed learnings into automation, observability, runbooks and readiness controls
      • Design, build and actively maintain a Known Error Database that functions as a real-time operational asset
      • Work with product teams to design, build and deliver a meaningful process for addressing repeat incidents
  • Deliver executive-grade communications (AI-drafted, human-approved)
      • Validate, approve and issue regular communications that are concise, informative and appropriate for stakeholders
      • Ensure communications accurately reflect impact, mitigation progress, key risks and confidence-based ETAs
  • Drive continuous service improvement and regulatory alignment
      • Drive process and tooling changes that support operational resilience and regulatory requirements, including DORA and GDPR, where applicable
      • Provide input and ownership for continual service improvement initiatives, with a primary focus on Agentic AI and its application to Incident Management

Required Experience and Capabilities (Must Have)

  • 5+ years’ experience in Incident and Problem Management within a production environment supporting business-critical platforms
  • Strong technical troubleshooting capability, with the ability to engage credibly with engineers during complex failures
  • Proven ability to lead multiple simultaneous incidents and drive structured recovery under pressure
  • DevOps mindset, with comfort using observability tooling, automation and operational engineering practices
  • Ability to produce clear, high-quality communications suitable for senior stakeholders
  • Experience operating AI systems for triage, correlation and narrative generation, with sound judgement on when outputs require validation or override
  • Ability to translate repetitive incident activity into automation requirements and drive adoption with engineering partners

Advantages / Desirable Qualities

  • Experience working in or with FinTech or regulated environments
  • Knowledge of cloud platforms such as Azure and/or AWS, and understanding of IaaS / PaaS / SaaS service models
  • Experience with Microsoft Copilot and AI-enabled productivity tooling
  • Programming capability (e.g. Python) to automate common tasks or prototype improvements
  • Familiarity with configuration management, deployment and orchestration tooling (e.g. Ansible)
  • Strong data analysis skills using tools such as Splunk, Grafana, Tableau, Excel and/or Power BI
  • Strong experience with ServiceNow and operational reporting

Benefits

To help you stay energized, engaged and inspired, we offer a wide range of employee benefits including: retirement investment and tools designed to help you in building a sound financial future; access to education reimbursement; comprehensive resources to support your physical health and emotional well-being; family support programs; and Flexible Time Off (FTO) so you can relax, recharge and be there for the people you care about.

Hybrid Work Model

BlackRock’s hybrid work model is designed to enable a culture of collaboration and apprenticeship that enriches the experience of our employees, while supporting flexibility for all. Employees are currently required to work at least 4 days in the office per week, with the flexibility to work from home 1 day a week. Some business groups may require more time in the office due to their roles and responsibilities. We remain focused on increasing the impactful moments that arise when we work together in person – aligned with our commitment to performance and innovation. As a new joiner, you can count on this hybrid model to accelerate your learning and onboarding experience here at BlackRock.

Equal Opportunity Employer

BlackRock is proud to be an Equal Opportunity Employer. We evaluate qualified applicants without regard to age, disability, race, religion, sex, sexual orientation and other protected characteristics at law.

Major Incident and Problem Manager, Associate employer: BLACK ROCK FINANCIAL LTD

At BlackRock, we pride ourselves on being an exceptional employer, offering a dynamic work culture that fosters collaboration and innovation. Our hybrid work model not only supports flexibility but also enhances learning and growth opportunities for our employees, ensuring they thrive in their roles. With comprehensive benefits that prioritise your well-being and a commitment to continuous improvement, joining our team as a Major Incident and Problem Manager means becoming part of a forward-thinking organisation dedicated to making a meaningful impact.

Contact Details:

BLACK ROCK FINANCIAL LTD Recruiting Team

StudySmarter Expert Advice 🤫

We think this is how you could land the Major Incident and Problem Manager, Associate role

✨Tip Number 1

Network like a pro! Reach out to folks in your industry on LinkedIn or at events. A friendly chat can lead to opportunities that aren’t even advertised yet.

✨Tip Number 2

Prepare for interviews by practising common questions and scenarios related to incident management. We recommend role-playing with a friend to boost your confidence and refine your answers.

✨Tip Number 3

Showcase your problem-solving skills during interviews. Share specific examples of how you’ve tackled incidents in the past, especially those involving AI and automation—this is what will set you apart!

✨Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets noticed. Plus, we love seeing candidates who are proactive about their job search.

We think you need these skills to ace the Major Incident and Problem Manager, Associate role

Incident Management
Problem Management
Technical Troubleshooting
AI Systems Operation
Observability Tooling
Automation
Communication Skills
Data Analysis
ServiceNow
Cloud Platforms (Azure, AWS)
Programming (Python)
Configuration Management
Operational Engineering Practices
Continuous Improvement

Some tips for your application 🫡

Tailor Your Application: Make sure to customise your CV and cover letter to highlight your experience in Incident and Problem Management. Use keywords from the job description to show that you understand what we're looking for.

Show Off Your Technical Skills: Don’t hold back on showcasing your technical troubleshooting abilities! Mention specific tools and technologies you've used, especially those related to AI and automation, as they’re super relevant to this role.

Be Clear and Concise: When writing your application, keep it clear and to the point. We love well-structured communications, so make sure your key achievements and experiences stand out without unnecessary fluff.

Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way to ensure your application gets into the right hands and shows us you're serious about joining our team!

How to prepare for a job interview at BLACK ROCK FINANCIAL LTD

✨Know Your Tech Inside Out

Make sure you brush up on your technical troubleshooting skills. Since this role requires a strong grasp of incident management and AI systems, be prepared to discuss specific tools you've used, like ServiceNow or any observability tooling. Show them you can engage credibly with engineers during complex failures.

✨Demonstrate Your Decision-Making Skills

This position is all about leading multiple incidents and making quick decisions under pressure. Prepare examples from your past experiences where you successfully managed incidents, highlighting how you prioritised tasks and drove structured recovery. They want to see your ability to maintain clarity in chaos!

✨Communicate Like a Pro

Since you'll be delivering executive-grade communications, practice articulating complex information clearly and concisely. Think about how you would explain technical issues to non-technical stakeholders. Bring examples of your previous communications to the interview to showcase your skills.

✨Show Your Continuous Improvement Mindset

Be ready to discuss how you've contributed to continuous service improvement initiatives in the past. This could include automating repetitive tasks or embedding learnings into operational processes. They’ll appreciate your proactive approach to enhancing stability and performance.
