At a Glance
- Tasks: Design and maintain monitoring solutions to ensure system reliability and performance.
- Company: Join a leading fintech transforming global equity capital markets.
- Benefits: Unlimited PTO, comprehensive benefits, and hybrid work environment.
- Other info: Collaborative culture with continuous learning and career growth opportunities.
- Why this job: Make a real impact in a fast-paced tech environment with cutting-edge tools.
- Qualifications: Experience in SRE, cloud platforms, and strong programming skills required.
The predicted salary is between £60,000 and £80,000 per year.
The Company
Capital Markets Gateway LLC (CMG) is a fintech focused on equity capital markets, transforming global ECM through data, technology, and connectivity. The CMG platform is used by nearly 150 buy‑side firms representing $40 trillion in AUM and 22 global investment banks.
CMG is looking for a Site Reliability Engineer (SRE) with a strong focus on monitoring, observability, and alerting to ensure the reliability, performance, and scalability of our infrastructure and applications. You will design, implement, and maintain monitoring solutions to provide visibility into system health and performance, proactively detect anomalies, and reduce incident response time.
Our Engineering Team consists of domain experts who work collaboratively within a culture of cross‑domain knowledge sharing.
Responsibilities
- Monitoring & Observability: Design, implement, and maintain monitoring and observability solutions using tools like Prometheus, the Grafana stack (Loki/Grafana/Tempo/Alertmanager), Datadog, and OpenTelemetry. Define and implement SLOs, SLIs, and error budgets to measure system reliability. Develop and optimize dashboards, alerts, and reports for system performance and business metrics.
- Alerting & Incident Management: Design actionable alerting strategies to minimize noise and improve MTTR. Integrate alerting systems with Jira. Establish and refine runbooks for on‑call teams to handle alerts efficiently. Empower teams to maintain observability coverage and sound incident response practices.
- Performance Optimization: Analyze system performance metrics, identify bottlenecks, and implement optimizations to improve system efficiency, scalability, and cost-effectiveness. Help conduct load testing and capacity planning to ensure systems can handle peak traffic loads.
- Automation and Tooling: Identify opportunities for automation and develop tools to streamline operational processes, such as fail‑over, configuration management, and monitoring. Implement monitoring and alerting systems within automations to detect and resolve issues proactively.
- Collaboration and Communication: Collaborate closely with cross‑functional teams, including software engineers, operations, and infrastructure teams, to understand system requirements, provide technical guidance, and drive solutions. Communicate effectively with stakeholders about system changes, incidents, and improvements. Promote SRE principles and practices across the company.
Qualifications
- Must be based in Latin America
- English level – C1 or C2
- Proven experience as a Site Reliability Engineer or a similar role.
- Proficiency in logging, metrics, and tracing frameworks (Datadog, Loki, Prometheus, OpenTelemetry).
- Experience with cloud platforms (Azure preferred) and infrastructure-as-code tools (e.g., Terraform).
- Strong programming and scripting skills (Python, Bash).
- Proficiency in containerization technologies and orchestration tools (Docker, Kubernetes).
- Understanding of Linux‑based systems, networking, and security principles related to containerized applications.
- Strong problem‑solving and troubleshooting skills, with a passion for identifying and resolving complex technical issues.
- Excellent communication and collaboration abilities.
- Ability to thrive in a fast‑paced, constantly evolving environment.
- Experience with PostgreSQL monitoring and optimization (Optional/Nice to have).
Our Tech Stack
- Azure as an infrastructure provider.
- Docker + Kubernetes for microservice orchestration using Istio service mesh.
- PostgreSQL as the relational database, Elasticsearch for indexing, Redis for caching.
- Datadog, Grafana and OpenTelemetry for observability.
- GitHub for version control and CI (with our own runners).
- CD: Harness and FluxCD.
- Terraform and Terragrunt as IaC.
- Python and Bash for scripting infrastructure.
- React – we are all in on React – we maintain multiple single‑page React apps.
- TypeScript – 99% of our codebase is TypeScript.
- Latest .NET version for our backend services.
- GraphQL – our standard for API communication is GraphQL served by our .NET back‑end.
Our Values
- We innovate with purpose
- We focus on outcomes vs. output
- We believe diverse and inclusive teams fuel innovation
- We are humble yet candid
- We do right by the customer
What We Offer
- Equity
- Unlimited PTO (28 days including bank holidays plus additional paid leave)
- Comprehensive benefits program managed by Globalization Partners
- Premium life and income protection
- Top private medical and dental insurance
- Employee Assistance Program (EAP)
- Pension contributions
- Hybrid work environment (initially remote until office setup is complete)
- Education reimbursement
- Continuous learning opportunities
- Employee referral bonus
- Parental leave
CMG embraces our ongoing commitment to building a culture reflecting the people, perspectives, and passions it represents. We will accept nothing less than equity, inclusion, and belonging for all. With the only constant in life being change, we will always listen, learn, and improve for the betterment of our teams, customers, and communities. CMG is proud to be an Equal Opportunity Employer.
Site Reliability Engineer employer: Capital Markets Gateway
At Capital Markets Gateway LLC (CMG), we pride ourselves on being an exceptional employer, offering a dynamic work culture that fosters innovation and collaboration among our engineering team. With a strong commitment to employee growth, we provide continuous learning opportunities, unlimited PTO, and comprehensive benefits, all within a hybrid work environment that values diversity and inclusion. Join us in transforming the equity capital markets while working for a fintech leader in a role based in Latin America.
StudySmarter Expert Advice🤫
We think this is how you could land the Site Reliability Engineer role
✨Tip Number 1
Network like a pro! Reach out to folks in the industry, attend meetups, and connect with current employees at CMG. A friendly chat can sometimes lead to opportunities that aren’t even advertised!
✨Tip Number 2
Show off your skills! Create a portfolio or GitHub repository showcasing your projects related to monitoring, observability, and automation. This gives you a chance to demonstrate your expertise beyond just a CV.
✨Tip Number 3
Prepare for the interview by brushing up on SRE principles and the tech stack mentioned in the job description. Be ready to discuss how you've tackled performance issues or implemented monitoring solutions in the past.
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining the CMG team.
Some tips for your application 🫡
Tailor Your Application: Make sure to customise your CV and cover letter for the Site Reliability Engineer role. Highlight your experience with monitoring tools like Prometheus and Datadog, and show us how your skills align with our needs.
Show Off Your Technical Skills: We want to see your technical prowess! Include specific examples of your work with cloud platforms, containerisation, and automation. Don’t forget to mention any relevant projects that demonstrate your problem-solving abilities.
Be Clear and Concise: When writing your application, keep it clear and to the point. Use bullet points where possible to make it easy for us to read through your qualifications and experiences quickly.
Apply Through Our Website: We encourage you to apply directly through our website. It’s the best way for us to receive your application and ensures you’re considered for the role. Plus, it shows us you’re keen on joining our team!
How to prepare for a job interview at Capital Markets Gateway
✨Know Your Tools
Familiarise yourself with the specific monitoring and observability tools mentioned in the job description, like Prometheus, Grafana, and Datadog. Be ready to discuss your experience with these tools and how you've used them to enhance system reliability.
✨Showcase Your Problem-Solving Skills
Prepare examples of complex technical issues you've resolved in the past. Highlight your troubleshooting process and how you optimised performance or reduced incident response times. This will demonstrate your capability as a Site Reliability Engineer.
✨Understand the Company Culture
Research Capital Markets Gateway LLC's values and culture. Be prepared to discuss how you align with their focus on innovation, collaboration, and customer-centricity. Showing that you fit into their culture can set you apart from other candidates.
✨Ask Insightful Questions
Prepare thoughtful questions about the team dynamics, current challenges they face, and how they measure success in the SRE role. This not only shows your interest but also helps you gauge if the company is the right fit for you.