Responsibilities
- Our BI team runs a set of GCP-based APIs and data services that a lot of internal products depend on
- As we've grown, keeping things running has increasingly been a side responsibility for engineers who are primarily building features — and that's not sustainable
- We're looking for a Site Reliability Engineer (SRE) to own that space: the availability, performance, and security of the BI team’s GCP-hosted APIs and data infrastructure
- Day to day, that means proactive monitoring, incident response, and continuous reliability improvements across a cloud-native stack, working closely with backend and data engineers to keep services healthy
- The role also owns GCP cost visibility, so we're not blindly burning cloud budget: tracking and optimizing spend through structured monitoring and alerting
- Monitor and maintain the uptime of GCP-hosted APIs and services, keeping performance within agreed service-level targets
- Lead incident response for BI platform services: triage, resolve, and follow up with post-mortems that actually prevent recurrence
- Build and manage observability infrastructure: dashboards, alerts, and logging across GCP services
- Track GCP cloud spend and set up cost alerting to flag anomalies before they become problems
- Review and fix security gaps — IAP configs, service account permissions, API access controls
- Work with data and backend engineers to shore up reliability of data pipelines and BigQuery workflows
- Contribute to infrastructure-as-code and help keep deployments documented and reproducible
Benefits
- Dollars and Sense: 401(k) match
- Happy + Healthy: Comprehensive, affordable medical, dental and vision plans, plus 100%-paid life & disability insurance
- Break a Sweat: Free virtual fitness classes and the Better Yourself Wellness program
- Always Learning: Generous annual tuition reimbursement, ongoing team trainings
- Take a Load Off: Paid vacation, sick time, and company holidays (including a floating holiday)
- Good Ol’ Fun: Team-building events, happy hours, holiday celebrations, and more!
Qualifications
- Practical experience with GCP — Cloud Run, API Gateway, and BigQuery in particular
- Proficiency with Git-based version control in a team setting
- 2+ years in a Site Reliability, DevOps, or Cloud Infrastructure role in a production environment
- Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent hands-on experience
- Experience with monitoring and observability tooling (Cloud Monitoring, Datadog, or similar)
- Solid grasp of cloud security fundamentals — IAM, network controls, access management
- Experience with Terraform or other infrastructure-as-code tools
- Experience building CI/CD pipelines and deployment automation (GitHub Actions, Cloud Build, or similar)
- Proficiency in Python for scripting and automation
- Meaningful hands-on experience with MySQL, Spanner, or BigQuery
- Experience with dbt or Looker
- Experience with GCP cost management and spend optimization
- Comfortable working across CET/EST hours in a distributed team