Lead, Site Reliability Engineer (Infrastructure operations) in Dublin

Dublin | Full-Time | £70,000 - £90,000 / year (est.) | No working from home possible

At a Glance

  • Tasks: Lead the reliability of Mastercard's critical payment systems and enhance service quality.
  • Company: Join Mastercard, a global leader in digital payments and innovation.
  • Benefits: Competitive salary, inclusive culture, and opportunities for professional growth.
  • Other info: Be part of a dynamic team ensuring 24/7 availability of essential payment systems.
  • Why this job: Make a real impact on global transactions while working with cutting-edge technology.
  • Qualifications: 5-10 years in SRE or related roles, strong troubleshooting skills, and experience with automation tools.

The predicted salary is between £70,000 and £90,000 per year.

Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.

About the Role:

Mastercard’s Program-aligned Site Reliability Engineering (SRE) teams are dedicated to delivering a seamless experience for our customers. We achieve this by maintaining every aspect of our Programs' infrastructure and technology ecosystem to the highest standards, ensuring compliance with rigorous security requirements. Within Mastercard, SRE focuses on the reliability and performance of core infrastructure, networks, and foundational services that power our applications. Our mission is to ensure these components operate with excellence, enabling applications to deliver an outstanding customer experience.

In this role, you will join our Payments Network SRE team and take ownership of continuously assessing and elevating the end-to-end service quality of our platform. You will leverage data to drive root cause analysis and deliver strategic insights to key stakeholders on resource utilization, capacity forecasting, and performance trends, ensuring the availability, scalability, and resilience of our network.

Key Responsibilities:

  • Lead continuous assessments of the application infrastructure supporting critical Mastercard applications, focusing on health, performance, monitoring and alerting, and capacity analysis.
  • Collaborate with Product and Development teams to forecast growth requirements and ensure scalability and resiliency.
  • Champion observability as a core principle for infrastructure services by assessing environments and technologies to uncover gaps in monitoring and alerting.
  • Design and implement strategies to close these gaps, ensuring all infrastructure telemetry is integrated into a unified, single-pane-of-glass view.
  • Build custom dashboards to investigate and perform root cause analysis on complex issues.
  • Lead regular incident reviews with internal support teams to ensure root causes are identified.
  • When patterns of failure or compatibility issues between software and infrastructure emerge, develop and implement strategies to remediate or mitigate risks.
  • Leverage automation and AI technologies to enhance proactive issue detection and enable self-healing capabilities, reducing Mean Time to Detect (MTTD) and Mean Time to Mitigate (MTTM).
  • Develop testing and validation plans for new environment builds, disaster recovery exercises and post-maintenance activities to certify environment readiness before customer traffic is routed to it.
  • Champion continuous learning, development, and knowledge sharing across networking and other infrastructure disciplines to strengthen multi-disciplinary SRE team capabilities.
  • Lead training initiatives for team members and Product and Development on networking aspects of the platforms.
  • Evaluate vendor hardware, firmware, and software upgrade roadmaps, and conduct proof-of-concept (POC) testing to identify potential risks and opportunities for improvement in upcoming releases.

All about you:

  • 5–10 years of experience in an SRE or SRE-related operations role, including 3+ years supporting e-commerce, financial services, or large-scale SaaS platforms.
  • Excellent infrastructure troubleshooting and analytical problem-solving skills.
  • Strong hands-on experience with observability and monitoring tools such as Splunk, Dynatrace, or equivalent, with a proven ability to triage and investigate complex issues.
  • Familiarity with network telemetry tools such as SolarWinds and NetScout.
  • Proficiency in packet-level debugging, including capturing traffic with tools like tcpdump and analyzing packets using Wireshark.
  • Broad understanding of the end-to-end infrastructure supporting payment platforms, spanning platform services, networking, databases, and storage.
  • Experience with automation and Infrastructure as Code tools such as Chef, Ansible, and Terraform, as well as structured data formats (JSON/YAML).
  • Excellent communication skills with the ability to coordinate cross-functional troubleshooting efforts and lead RCA processes to closure.
  • Demonstrated ability to troubleshoot complex production issues, perform root cause analysis, and drive long-term corrective actions.
  • Experience partnering with development teams to shape architecture, define SLIs/SLOs, and embed reliability into services from design through operation.
  • Strong understanding of monitoring and observability ecosystems, including Prometheus, Grafana, ELK/EFK, Splunk, Dynatrace, and OpenTelemetry.
  • Effective incident management skills with a structured, analytical approach to problem-solving.

The Payments Network SRE team is responsible for the runtime availability of some of Mastercard’s most critical core payment systems, which support national infrastructure and operate 24/7 year-round. As a result, this role will include periodic on-call responsibilities when required.

Corporate Security Responsibility

All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization. It is therefore expected that every person working for, or on behalf of, Mastercard is responsible for information security and must:

  • Abide by Mastercard’s security policies and practices;
  • Ensure the confidentiality and integrity of the information being accessed;
  • Report any suspected information security violation or breach; and
  • Complete all periodic mandatory security trainings in accordance with Mastercard’s guidelines.

Lead, Site Reliability Engineer (Infrastructure operations) in Dublin employer: Mastercard

Mastercard is an exceptional employer that fosters a culture of innovation and collaboration, empowering employees to drive meaningful change in the digital economy. With a commitment to professional growth, employees benefit from continuous learning opportunities and a supportive environment that values diversity and inclusion. Located in a dynamic industry, Mastercard offers competitive benefits and the chance to work on critical infrastructure that impacts millions globally, making it a rewarding place to advance your career.

Contact Details:

Mastercard Recruitment Team

StudySmarter Expert Advice🤫

We think this is how you could land Lead, Site Reliability Engineer (Infrastructure operations) in Dublin

Tip Number 1

Network with people in the industry! Reach out to current or former Mastercard employees on LinkedIn. A friendly chat can give you insight into the company culture and might even lead to a referral.

Tip Number 2

Prepare for the interview by brushing up on your technical skills. Make sure you can confidently discuss your experience with observability tools and infrastructure troubleshooting, as these are key for the SRE role.

Tip Number 3

Showcase your problem-solving skills during interviews. Be ready to share specific examples of how you've tackled complex issues in past roles, especially in high-pressure situations.

Tip Number 4

Don’t forget to apply through our website! It’s the best way to ensure your application gets noticed, and it shows that you’re genuinely interested in joining the Mastercard team.

We think you need these skills to ace Lead, Site Reliability Engineer (Infrastructure operations) in Dublin

Site Reliability Engineering (SRE)
Infrastructure Troubleshooting
Analytical Problem Solving
Observability and Monitoring Tools
Network Telemetry Tools
Packet Level Debugging
Infrastructure as Code (IaC)

Some tips for your application 🫡

Tailor Your Application: Make sure to customise your CV and cover letter for the Lead, Site Reliability Engineer role. Highlight your experience with infrastructure operations and any relevant tools you've used. We want to see how your skills align with our mission at Mastercard!

Showcase Your Problem-Solving Skills: In your application, share specific examples of how you've tackled complex issues in previous roles. We love seeing candidates who can demonstrate their analytical problem-solving abilities, especially in high-pressure environments like e-commerce or financial services.

Highlight Collaboration Experience: Since this role involves working closely with Product and Development teams, make sure to mention any collaborative projects you've been part of. We value teamwork and want to know how you’ve contributed to cross-functional efforts in the past.

Apply Through Our Website: We encourage you to submit your application directly through our website. It’s the best way to ensure your application gets into the right hands. Plus, it shows us you're serious about joining our team at Mastercard!

How to prepare for a job interview at Mastercard

Know Your Infrastructure Inside Out

Before the interview, dive deep into the infrastructure and technology ecosystem that Mastercard uses. Familiarise yourself with their core services, monitoring tools like Splunk and Dynatrace, and how they ensure reliability and performance. This knowledge will help you speak confidently about how you can contribute to maintaining and enhancing their systems.

Showcase Your Problem-Solving Skills

Prepare to discuss specific examples of complex issues you've tackled in previous roles. Highlight your analytical problem-solving skills and how you've used root cause analysis to drive long-term solutions. This is crucial for a role focused on incident management and ensuring system resilience.

Emphasise Collaboration and Communication

Mastercard values teamwork, so be ready to talk about your experience collaborating with product and development teams. Share instances where your communication skills helped coordinate troubleshooting efforts or led to successful project outcomes. This will demonstrate your ability to work effectively in a cross-functional environment.

Be Ready to Discuss Automation and AI

Given the emphasis on leveraging automation and AI technologies, come prepared to discuss your experience with tools like Chef, Ansible, and Terraform. Talk about how you've implemented automation to enhance issue detection and reduce response times. This will show that you're aligned with Mastercard's commitment to innovation and efficiency.
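If you want something concrete to talk through, it can help to show how you would measure detection and mitigation times in the first place. The snippet below is only a minimal sketch: the `Incident` record and its field names are hypothetical and not taken from any Mastercard tooling. It also assumes both metrics are measured from the moment the fault started, whereas some teams measure time to mitigate from the moment of detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; the field names are illustrative."""
    started: datetime    # when the fault began
    detected: datetime   # when monitoring or a person noticed it
    mitigated: datetime  # when customer impact stopped


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    """Average gap between fault start and detection (MTTD)."""
    seconds = mean((i.detected - i.started).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


def mean_time_to_mitigate(incidents: list[Incident]) -> timedelta:
    """Average gap between fault start and mitigation (MTTM).

    Assumption: measured from fault start, not from detection.
    """
    seconds = mean((i.mitigated - i.started).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


# Made-up sample data, purely for illustration.
sample_incidents = [
    Incident(
        started=datetime(2024, 1, 5, 10, 0),
        detected=datetime(2024, 1, 5, 10, 4),
        mitigated=datetime(2024, 1, 5, 10, 30),
    ),
    Incident(
        started=datetime(2024, 1, 9, 12, 0),
        detected=datetime(2024, 1, 9, 12, 10),
        mitigated=datetime(2024, 1, 9, 12, 50),
    ),
]

if __name__ == "__main__":
    print("MTTD:", mean_time_to_detect(sample_incidents))
    print("MTTM:", mean_time_to_mitigate(sample_incidents))
```

Being able to explain where each timestamp comes from, and how an automation change moved these averages, tends to be more convincing than quoting the numbers alone.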