Site Reliability Engineer III
As a Senior Site Reliability Engineer, you'll bridge the gap between software development and operations, applying software engineering principles to infrastructure and operations problems.
You'll help design, build, and maintain the systems that keep our services reliable and scalable while working closely with development teams to improve application performance and resilience.
Responsibilities
* Design, implement, and maintain infrastructure systems with a focus on security, scalability, reliability, and automation, using tools such as Terraform or CloudFormation
* Build and maintain scalable and resilient production systems with a focus on automation
* Develop and implement monitoring solutions to ensure system health, performance, and availability
* Lead incident response, perform root cause analysis, and implement preventative measures
* Track SLOs, SLAs, and error budgets to measure service reliability and drive reliability improvements
* Design and implement CI/CD pipelines to enable rapid and reliable software delivery
* Partner with development teams to improve application performance, resilience, and scalability
* Contribute to capacity planning and performance optimization initiatives
* Participate in an on-call rotation to support production systems
* Mentor junior engineers and contribute to the growth of the team
* Develop and evolve security monitoring, alerting, and incident response frameworks
Qualifications
* 10+ years of experience in SRE, DevOps, or similar roles
* Expertise in incident management, disaster recovery, and building resilience engineering frameworks
* Strong programming skills in at least one language, such as Java or Python
* Extensive experience with Linux/Unix systems administration
* Hands-on experience with serverless (e.g., AWS Lambda) and containerization (e.g., Docker) technologies
* Experience implementing and managing cloud infrastructure (AWS, Azure)
* Advanced understanding of networking concepts, load balancing, security best practices, and CDN technologies
* Strong experience with observability systems such as Dynatrace
* Knowledge of database technologies and their performance characteristics
* Demonstrated experience leading incident response and post-mortem analysis
* Bachelor's degree in computer science or equivalent practical experience
* Deep knowledge of infrastructure-as-code tools (Terraform, CloudFormation)
* Mastery of CI/CD pipeline design and implementation (Jenkins, GitLab CI, Azure DevOps)
* Experience building and maintaining comprehensive monitoring and alerting systems
* Experience managing high-traffic, mission-critical production environments
* Strong background in capacity planning and performance optimization
* Proven ability to mentor junior SREs and elevate team capabilities
* Experience driving cross-team initiatives to improve reliability practice...
- Rate: Not Specified
- Location: Jersey City, US-NJ
- Type: Permanent
- Industry: Finance
- Recruiter: ISO CLAIMS SERVICES INC
- Contact: Not Specified
- Reference: 2684
- Posted: 2025-11-12 07:27:36