Preferred Qualifications
- 12+ years of experience in IT operations, SRE, or DevOps roles.
- Proven track record of implementing observability and automation solutions in large-scale environments.
- Certifications in cloud platforms, observability tools & other SRE-related areas.
- Strong expertise in implementing Site Reliability Engineering (SRE) principles.
- Advanced knowledge of establishing observability using tools: Dynatrace & Datadog (primary skills).
- Proficiency in automation & scripting using Python & Ansible (primary skills).
- Strong experience with cloud platforms: AWS & Azure (primary skills).
- Solid understanding of containerization and orchestration tools like Docker and Kubernetes.
- Proficiency in cloud-native distributed systems & microservices architecture.
- Exposure to AI/ML techniques for predictive analytics and automated problem resolution.
- Familiarity with CI/CD pipelines and with enabling automated release & deployment engineering solutions.
- Nice to have: experience with chaos engineering tools like Gremlin or Chaos Monkey, and with implementing automation frameworks for resilience tracking.
- Ability to manage and prioritize multiple projects in a fast-paced environment.
- Strong interpersonal and communication skills to work effectively across teams.
- Excellent problem-solving, analytical thinking, and adaptability.
- Strategic mindset balancing engineering excellence with business priorities.