Sr Site Reliability Engineer
Business Unit:
STChealth focuses on vaccine intelligence and immunization data management, connecting public and private healthcare sources to deliver real-time immunization information.
Its platform is used by thousands of locations, and the company emphasizes data integrity, real-time analytics, and better decision-making in public health.
Headquarters: Phoenix, Arizona (US).
Job Summary:
The Site Reliability Engineer (SRE) supports a U.S. public health SaaS platform processing protected health information (PHI) under HIPAA.
The role emphasizes automation, monitoring, and reliability engineering for regulated environments.
The SRE will partner closely with U.S.-based teams to enhance observability, CI/CD automation, and operational maturity in non-production and staging systems, while maintaining compliance with HIPAA, SOC2, and corporate data protection standards.
Core Responsibilities
- Automate infrastructure provisioning, configuration, and maintenance using Terraform, Ansible, and Python.
- Build, enhance, and maintain CI/CD pipelines using Jenkins, GitHub Actions, or AWS CodePipeline for continuous delivery and consistency across environments.
- Implement and optimize monitoring solutions using Datadog, Prometheus, Grafana, and ELK/EFK stacks to ensure high service reliability.
- Develop alerting strategies and escalation paths aligned to service-level objectives (SLOs) and key performance indicators (KPIs).
- Build custom scripts and automation for patching, validation, and system health checks.
- Partner with U.S. SREs and Engineering teams on environment management, change control, and incident response improvements.
- Analyze logs and performance metrics to identify stability issues, optimize cloud costs, and drive continuous improvement.
- Maintain detailed runbooks, SOPs, and documentation supporting operational readiness and knowledge transfer.
- Contribute to open-source or internal tooling that enhances automation, monitoring, or observability capabilities.
- Conduct periodic reliability reviews, performance tests, and failover simulations to validate readiness.
- Support adoption of infrastructure-as-code, immutable environments, and container orchestration (Docker/Kubernetes).
- Promote DevOps and SRE best practices across the engineering organization.
Tools & Technologies
AWS (EC2, S3, Lambda, CloudWatch, IAM, RDS, ECS/EKS), Terraform, Ansible, Python, Bash, Jenkins, GitHub Actions, Docker, Kubernetes, Prometheus, Grafana, ELK/EFK, Loki, Jira, Confluence.
Qualifications
- 5–7 years in SRE, DevOps, or Infrastructure Engineering.
- Bachelor’s degree in computer science or a related field preferred, or equivalent experience.
- Experience supporting U.S. healthcare or other regulated SaaS systems (HIPAA, SOC2, ISO27001).
- Strong scripting and automation skills (Ansible, Jenkins, Python, Bash, Terraform, CloudFormation).
- Understanding of CI/CD, networking, and secure cloud architecture.
- Prove...
- Rate: 92268
- Location: Mumbai, IN-MH
- Type: Permanent
- Industry: Engineering
- Recruiter: Bizmatics India Private Limited
- Contact: Not Specified
- Reference: R0032625
- Posted: 2026-02-21 08:43:22