At a Glance
- Tasks: Join us as a Site Reliability Engineer to enhance our platform's reliability and performance.
- Company: Paymentology is a global leader in payment processing, serving banks and fintechs worldwide.
- Benefits: Enjoy a full-time remote role with flexible hours and a supportive, inclusive environment.
- Why this job: Work on cutting-edge tech projects that make a real difference while growing your skills.
- Qualifications: Bachelor’s degree in IT or related field; 3+ years in SRE and 5+ years in software development required.
- Other info: Be part of a diverse team committed to advancing the world through payments.
The predicted salary is between £36,000 and £60,000 per year.
Paymentology is the first truly global issuer-processor, giving banks and fintechs the technology, team and experience to rapidly issue and process Mastercard, Visa and UnionPay cards across more than 60 countries, at scale.
Our advanced, multi-cloud platform, offering both shared and dedicated processing instances, vast global presence and richer, real-time data, set us apart as the leader in payments.
We’re on the hunt for an exceptional Site Reliability Engineer (SRE) to join our dedicated team. As an SRE at Paymentology, you’ll be the superhero responsible for maintaining, improving, and ensuring the high availability, scalability, and performance of our platform.
Tasks
Platform Reliability and Scalability:
- Build software that enhances Paymentology services’ scalability and reliability.
- Ensure platform services meet required uptime and service quality levels.
- Contribute to the design of reliable cloud infrastructure and implement reusable cloud-uptime components as code.
- Regularly review and optimise SRE practices, tools, and methodologies to enhance overall system reliability and team efficiency.
Observability and Automation:
- Contribute to the design, implementation, and maintenance of observability and monitoring solutions that track the platform's health, cost-effectiveness, reliability, and scalability, and that identify potential issues to feed back to product and platform engineering in a continuous improvement loop.
- Develop and implement automation scripts and tools to streamline operations and reduce manual interventions.
- Enable product teams to self-serve by participating in the development of a developer platform.
Production Issue Resolution:
- Play an active role with the incident response teams, diagnosing and resolving production issues quickly to minimise downtime.
Standards Compliance:
- Support product teams in building services that adhere to our security and quality standards.
Cross-team Collaboration:
- Work closely with engineering, operations, and product teams to ensure reliability is considered throughout the end-to-end software development lifecycle. We seek to achieve this through advocacy and by developing a culture of reliability.
Requirements
- Bachelor’s Degree in Computer Science, Information Technology, or related field.
- A minimum of 3 years in a dedicated SRE role, as well as 5+ years of prior software development experience.
- Comprehensive understanding of large-scale distributed platform architecture.
- Extensive hands-on cloud experience, particularly with AWS.
- Proven experience developing scalable, modular infrastructure-as-code projects using tools such as Terraform, CloudFormation, Puppet, and Ansible.
- Practical experience with Docker and container orchestrators, including AWS ECS & EKS, and Kubernetes.
- Experience in administering or integrating identity management systems for SSO, including AWS IAM, Okta, and Active Directory.
- Experience with disaster recovery and redundancy strategies in both cloud and on-premises environments.
- Proficiency with leading monitoring tools, such as Datadog, Splunk, Prometheus, Grafana, ELK Stack, and New Relic.
- Programming expertise, especially in systems programming languages (e.g., Java, Kotlin, Scala) and databases (e.g., SQL Server, PostgreSQL).
- Familiarity with industry-leading CI/CD tools such as Jenkins, GitHub Actions, GitLab CI, AWS CodePipeline, CircleCI, and Argo CD.
- Track record of achieving platform-level and end-to-end SLIs, SLOs, and SLAs, and fostering accountability.
- Ability to navigate complex situations and lead effective post-incident reviews (PIRs).
- Knowledge of implementing solutions to reduce Mean Time to Identify (MTTI) and Mean Time to Resolve (MTTR).
- Expertise in implementing best practices for load balancing, fault tolerance, and resource allocation to maintain service quality and efficiency at scale.
- Understanding of security best practices within cloud environments.
You’ll also need to bring a collaborative mindset, working seamlessly across teams to drive innovative solutions. And of course, your exceptional communication skills in English will allow you to clearly convey your ideas and recommendations.
As a key member of our technical team, you will be expected to maintain high availability and be ready to address critical incidents, ensuring the continuous performance of our systems. This includes being part of an on-call schedule to support 24/7 operations.
Benefits
- Full-time remote position with flexible hours.
- An inclusive and supportive work environment that values diversity.
- A chance to work on cutting-edge technology projects that make a difference.
- Opportunities for continuous learning and development.
Ready to Join Us? If you’re a gadget guru who thrives on optimizing infrastructure, automating all the things, and delivering sky-high availability and performance, we want to hear from you! Apply now and be part of a company that values your skills and fosters your growth.
At Paymentology we value making a difference to the lives of the people who work for us and who live in the communities where we operate. You can look forward to working with a diverse, global team where Paymentologists at all levels play an important part in our global mission to advance the world through payments and make a difference on a global scale.
Remote Site Reliability Engineer employer: Paymentology
Contact Detail:
Paymentology Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Remote Site Reliability Engineer role
✨Tip Number 1
Familiarize yourself with the specific tools and technologies mentioned in the job description, such as AWS, Terraform, and Docker. Having hands-on experience or projects that showcase your skills with these tools can set you apart from other candidates.
✨Tip Number 2
Highlight your experience with incident response and production issue resolution. Be prepared to discuss specific examples where you've successfully diagnosed and resolved issues quickly, minimizing downtime.
✨Tip Number 3
Demonstrate your understanding of observability and monitoring solutions. Share any relevant experiences where you've implemented or maintained monitoring tools like Datadog or Prometheus, as this is crucial for the role.
✨Tip Number 4
Showcase your collaborative mindset by discussing past experiences where you've worked closely with cross-functional teams. Emphasizing your ability to communicate effectively and advocate for reliability will resonate well with the hiring team.
Some tips for your application 🫡
Tailor Your CV: Make sure your CV highlights relevant experience in Site Reliability Engineering and software development. Emphasize your hands-on cloud experience, particularly with AWS, and any specific tools mentioned in the job description like Terraform or Docker.
Craft a Compelling Cover Letter: In your cover letter, express your passion for optimizing infrastructure and automating processes. Mention specific projects or experiences that demonstrate your ability to maintain high availability and performance, as well as your collaborative mindset.
Showcase Technical Skills: Clearly list your technical skills related to the job requirements, such as programming languages (Java, Kotlin, Scala), monitoring tools (Datadog, Prometheus), and CI/CD tools (Jenkins, GitHub Actions). Provide examples of how you've used these skills in past roles.
Highlight Problem-Solving Abilities: Discuss your experience with production issue resolution and your ability to lead post-incident reviews. Share specific examples of how you've diagnosed and resolved production issues quickly to minimize downtime.
How to prepare for a job interview at Paymentology
✨Showcase Your Technical Expertise
Be prepared to discuss your hands-on experience with cloud platforms, especially AWS. Highlight specific projects where you've implemented infrastructure-as-code using tools like Terraform or CloudFormation, and be ready to explain your approach to ensuring scalability and reliability.
✨Demonstrate Problem-Solving Skills
Prepare examples of how you've diagnosed and resolved production issues in the past. Discuss your role in incident response teams and how you minimized downtime, showcasing your ability to navigate complex situations effectively.
✨Emphasize Collaboration and Communication
Since cross-team collaboration is key, share experiences where you've worked closely with engineering, operations, and product teams. Highlight your communication skills and how you've advocated for reliability throughout the software development lifecycle.
✨Discuss Continuous Improvement Practices
Talk about your experience with observability and monitoring solutions. Explain how you've contributed to a culture of continuous improvement by identifying potential issues and implementing automation scripts to streamline operations.