At a Glance
- Tasks: Join an agile team to enhance reliability and observability for critical platforms.
- Company: JPMorgan Chase, a leader in commercial and investment banking.
- Benefits: Diverse and inclusive culture with opportunities for growth and development.
- Other info: Collaborate with diverse teams and contribute to cutting-edge technology solutions.
- Why this job: Make a significant impact on technology products while solving complex challenges.
- Qualifications: Experience in software engineering and advanced knowledge of site reliability practices.
The predicted salary is between €80,000 and €100,000 per year.
Be an integral part of an agile team that's constantly pushing the envelope to enhance, build, and deliver top-notch reliability and observability for our most critical platforms. As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Commercial & Investment Bank, you are a key member of an agile team that works to build and deliver trusted market‑leading technology products in a secure, stable, and scalable way. Drive significant business impact through your capabilities and contributions, applying deep technical expertise and problem‑solving methodologies to tackle a diverse array of reliability, observability, and performance challenges that span multiple technologies and applications.
Job Responsibilities
- Regularly provides technical guidance and direction on site reliability practices to support the business and its technical teams, contractors, and vendors.
- Develops secure and high‑quality production code for reliability tooling and telemetry pipelines, and reviews and debugs code written by others.
- Drives decisions that influence reliability design, observability architecture, application functionality, and technical operations and processes.
- Serves as a function‑wide subject matter expert in one or more areas of site reliability, observability, or telemetry engineering.
- Leads resiliency design reviews and breaks down complex reliability problems into digestible work for other engineers, acting as a technical lead for large products.
- Acts as the main point of contact during major incidents, demonstrating the skills to identify and solve issues quickly to avoid financial losses, and champions blameless postmortem culture.
- Collaborates with team members and stakeholders to define comprehensive service level indicators, service level objectives, and error budgets.
- Designs, implements, and maintains operational reliability for large‑scale OpenTelemetry pipelines on hybrid on‑prem/cloud environments, supporting telemetry ingestion, processing, and export to backends such as InfluxDB, Prometheus, Elasticsearch, and OpenSearch.
- Drives the assessment, refactoring, and incremental migration of custom legacy telemetry collection code to standardized OpenTelemetry instrumentation, reducing technical debt while maintaining system stability.
- Actively contributes to the engineering community as an advocate of firmwide frameworks, tools, and practices, and influences peers and project decision‑makers to consider the use and application of leading‑edge observability and reliability technologies.
- Adds to the team culture of diversity, opportunity, inclusion, and respect.
Required qualifications, capabilities, and skills
- Formal training or certification on software engineering concepts and advanced applied experience delivering system design, application development, testing, and operational stability.
- Advanced knowledge of reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices, with considerable in‑depth knowledge in one or more technical disciplines (e.g., cloud, observability, distributed systems).
- Advanced proficiency in one or more programming languages (e.g., Java, Python, Go).
- Advanced proficiency and experience in observability practices such as white‑box and black‑box monitoring, SLO alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and Elasticsearch.
- Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform).
- Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker).
- Hands‑on experience with the design, deployment, and operation of OpenTelemetry collectors in production environments, focusing on technical aspects such as configuring, optimizing, and troubleshooting OTLP endpoints and receivers.
- Ability to tackle reliability design and functionality problems independently with little to no oversight.
- Practical cloud native experience.
- Ability to build relationships and collaborate across different levels and stakeholder groups.
Preferred qualifications, capabilities, and skills
- Knowledge of distributed tracing, metrics, and logging best practices.
- Certification in AWS, Kubernetes, or relevant technologies.
- Proven track record in system health monitoring, capacity management, and blameless postmortems for high‑availability services.
- Deep understanding of distributed system design principles, networking (TCP/IP, DNS, load balancing), and Linux internals.
- Contributions to open‑source observability or telemetry projects.
- Experience working with agent control planes and management protocols; hands‑on knowledge of OpAMP is highly desirable.
Equal Opportunity Employer Statement
We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs.
Senior Lead Site Reliability Engineer in Glasgow (employer: J.P. Morgan)
At JPMorgan Chase, we pride ourselves on fostering a dynamic and inclusive work environment where innovation thrives. As a Senior Lead Site Reliability Engineer, you'll benefit from our commitment to employee growth through continuous learning opportunities and access to cutting-edge technology, all while contributing to impactful projects within the Commercial & Investment Bank. Our culture emphasises collaboration, diversity, and respect, making it an exceptional place for professionals seeking meaningful and rewarding careers in a leading financial institution.
StudySmarter Expert Advice🤫
We think this is how you could land the Senior Lead Site Reliability Engineer role in Glasgow
✨Tip Number 1
Network like a pro! Reach out to current employees at JPMorgan Chase on LinkedIn or other platforms. Ask them about their experiences and any tips they might have for landing the Senior Lead Site Reliability Engineer role.
✨Tip Number 2
Prepare for technical interviews by brushing up on your coding skills and reliability concepts. Practice common SRE scenarios and be ready to discuss how you've tackled complex problems in the past.
✨Tip Number 3
Showcase your passion for observability and reliability! During interviews, share specific examples of projects where you implemented these practices. This will demonstrate your expertise and commitment to the role.
✨Tip Number 4
Don't forget to apply through our website! It’s the best way to ensure your application gets noticed. Plus, it shows you're serious about joining the team at JPMorgan Chase.
Some tips for your application 🫡
Tailor Your Application: Make sure to customise your CV and cover letter to highlight your experience with reliability and observability. We want to see how your skills align with the role, so don’t hold back on showcasing your technical expertise!
Showcase Your Problem-Solving Skills: In your application, share specific examples of how you've tackled complex reliability challenges in the past. We love seeing candidates who can break down problems and come up with innovative solutions!
Be Clear and Concise: When writing your application, keep it straightforward and to the point. We appreciate clarity, so avoid jargon unless it's relevant to the role. Make it easy for us to see why you’re a great fit!
Apply Through Our Website: We encourage you to submit your application directly through our website. It’s the best way for us to receive your details and ensures you’re considered for the role. Plus, it’s super easy!
How to prepare for a job interview at J.P. Morgan
✨Know Your Tech Inside Out
Make sure you brush up on your technical skills, especially in areas like cloud technologies, observability tools, and programming languages mentioned in the job description. Be ready to discuss your experience with tools like Grafana, Prometheus, and OpenTelemetry, as well as any relevant projects you've worked on.
✨Showcase Problem-Solving Skills
Prepare to share specific examples of how you've tackled complex reliability issues in the past. Think about times when you had to act quickly during incidents and how you contributed to blameless postmortems. This will demonstrate your ability to handle high-pressure situations effectively.
✨Understand the Company Culture
Familiarise yourself with JPMorgan Chase's values around diversity, inclusion, and collaboration. Be ready to discuss how you can contribute to a positive team culture and support the company's commitment to these principles. This shows that you're not just a tech whiz but also a great team player.
✨Prepare Questions for Them
Have a few thoughtful questions ready to ask your interviewers. This could be about their current challenges in site reliability or how they measure success in their teams. It shows your genuine interest in the role and helps you assess if the company is the right fit for you.