At a Glance
- Tasks: Design and maintain scalable infrastructure while automating processes for a global payments platform.
- Company: Join Paydock, a leading payments orchestration platform with a remote-first culture.
- Benefits: Enjoy competitive salary, flexible remote work, and opportunities for professional growth.
- Why this job: Make a global impact by solving complex problems in a dynamic tech environment.
- Qualifications: 5+ years of cloud experience, IaC mastery, and strong coding skills required.
- Other info: Collaborative team culture focused on innovation and continuous learning.
The predicted salary is between £36,000 and £60,000 per year.
Paydock is a leading payments orchestration platform, empowering businesses to manage and scale their payment strategies seamlessly. We provide a single, elegant API to connect to a vast ecosystem of payment gateways and methods, simplifying complexity and unlocking new revenue opportunities for our merchants worldwide. As a geodistributed team, we thrive on asynchronous communication and a culture of ownership, trust, and innovation.
We are seeking an experienced and proactive Senior Site Reliability Engineer (SRE) to join our global infrastructure team. You will be a guardian of our production environment, responsible for its health, performance, and scalability. Your mission is to apply software engineering principles to solve operational problems, automate everything, and ensure our platform exceeds the reliability expectations of our customers.
You will work with a talented, distributed team of engineers across different time zones, making your mark on a platform that processes millions of transactions. This role requires a deep passion for eliminating toil, a proactive approach to system stability, and excellent communication skills to thrive in a remote-first environment.
What You’ll Do
- Architect & Automate: Design, build, and maintain our core infrastructure using Infrastructure as Code (IaC) principles. You’ll be instrumental in evolving our CI/CD pipelines to ensure safe, rapid, and reliable releases.
- Enhance Reliability & Scalability: Proactively identify and address performance bottlenecks, single points of failure, and scalability limits. You’ll define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to maintain and improve platform health.
- Champion Observability: Implement and manage comprehensive monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK Stack) to provide deep insights into system behaviour and ensure rapid incident detection.
- Lead Incident Management: Participate in our on-call rotation, acting as a key player in incident response and resolution. You’ll lead blameless post-mortems to identify root causes and implement preventative measures.
- Collaborate & Empower: Work closely with software engineering teams to foster a culture of reliability. You’ll provide guidance on building resilient services, implementing best practices for observability, and improving the developer experience.
- Secure the Foundation: Implement and maintain security best practices across our cloud infrastructure, ensuring our platform is robust and compliant.
What You’ll Bring
- Must-Haves:
- Extensive Cloud Experience: 5+ years of hands-on experience with a major cloud provider, preferably AWS (EC2, S3, RDS, VPC, IAM, etc.).
- Infrastructure as Code (IaC) Mastery: Deep proficiency with tools like Terraform or CloudFormation to manage infrastructure declaratively.
- Containerization Expertise: Strong experience with Docker and container orchestration systems like Kubernetes (EKS) or ECS.
- CI/CD Pipeline Development: Proven ability to build, optimize, and manage CI/CD pipelines using tools like GitLab CI, Jenkins, or CircleCI.
- Observability Skills: Hands-on experience with modern monitoring and logging tools (e.g., Prometheus, Grafana, Loki, Alertmanager, ELK Stack).
- Strong Scripting/Coding Ability: Proficiency in at least one programming language, such as Go, Python, or Bash, for automation and tooling.
- Remote Work Pro: Excellent written and verbal communication skills, with a proven ability to work effectively and asynchronously in a distributed team environment.
- Nice-to-Haves:
- Experience in the payments or FinTech industry.
- Familiarity with service mesh technologies like Istio or Linkerd.
- Experience with database administration (e.g., PostgreSQL, MySQL).
- Knowledge of networking, security principles, and compliance standards (e.g., PCI DSS).
Why Join Paydock?
- Work from Anywhere: Enjoy the flexibility and autonomy of a fully remote, geodistributed team.
- Make a Global Impact: Build and scale the infrastructure for a platform trusted by businesses worldwide.
- Culture of Growth: We encourage continuous learning and provide opportunities for professional development in a supportive and collaborative environment.
- Meaningful Work: Solve complex, interesting problems that have a direct and tangible impact on our product and customers.
- Competitive Compensation: We offer a competitive salary, benefits package, and the right tools to help you succeed.
Senior Site Reliability Engineer (SRE) in Warrington
Employer: Paydock
Contact: Paydock Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land the Senior Site Reliability Engineer (SRE) role in Warrington
✨Tip Number 1
Network like a pro! Reach out to current employees on LinkedIn or other platforms. Ask them about their experiences at Paydock and express your interest in the Senior SRE role. Personal connections can give you an edge!
✨Tip Number 2
Prepare for technical interviews by brushing up on your IaC skills and containerisation knowledge. Practice coding challenges and system design questions that relate to cloud infrastructure. We want to see your problem-solving skills in action!
✨Tip Number 3
Showcase your passion for reliability and automation during interviews. Share specific examples of how you've improved system performance or reduced downtime in previous roles. We love hearing about real-world impacts!
✨Tip Number 4
Don’t forget to apply through our website! It’s the best way to ensure your application gets seen by the right people. Plus, it shows you’re genuinely interested in joining our team at Paydock.
Some tips for your application 🫡
Tailor Your Application: Make sure to customise your CV and cover letter to highlight your experience with cloud technologies and Infrastructure as Code. We want to see how your skills align with our needs, so don’t hold back!
Showcase Your Projects: If you've worked on any relevant projects, especially those involving CI/CD pipelines or observability tools, be sure to mention them. We love seeing real-world applications of your skills!
Be Clear and Concise: When writing your application, keep it straightforward and to the point. We appreciate clarity, so avoid jargon unless it's necessary. Remember, we’re looking for effective communication skills!
Apply Through Our Website: We encourage you to submit your application through our website. It’s the best way for us to receive your details and ensures you’re considered for the role. Plus, it’s super easy!
How to prepare for a job interview at Paydock
✨Know Your Infrastructure Inside Out
Before the interview, make sure you’re well-versed in the core infrastructure technologies mentioned in the job description. Brush up on your knowledge of AWS services, Infrastructure as Code tools like Terraform, and container orchestration with Kubernetes. Being able to discuss these topics confidently will show that you’re serious about the role.
✨Demonstrate Your Problem-Solving Skills
Prepare to share specific examples of how you've tackled operational problems in the past. Think about times when you automated processes or improved system reliability. Use the STAR method (Situation, Task, Action, Result) to structure your answers, making it easy for the interviewer to follow your thought process.
✨Show Your Passion for Observability
Since the role emphasises observability, be ready to discuss your experience with monitoring and logging tools like Prometheus and Grafana. Share insights on how you’ve implemented these systems to enhance performance and incident response. This will highlight your proactive approach to system stability.
✨Communicate Effectively in a Remote Setting
Given the remote-first nature of the team, practice articulating your thoughts clearly and concisely. Prepare to discuss how you’ve successfully collaborated with distributed teams in the past. Highlight your communication strategies and tools you’ve used to stay connected, as this will demonstrate your fit for their culture.