

Senior Lead Site Reliability Engineer

As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Consumer and Community Banking team, you will set clear quality gates across requirements, design, secure coding, testing, releases, and post-production monitoring to ensure reliability, performance, security, and observability.

Job responsibilities


* Set clear quality gates across requirements, design, secure coding, testing, releases, and post-production monitoring to ensure reliability, performance, security, and observability.


* Turn business goals into clear, testable requirements, and hold teams to an objective "Definition of Done" before release.


* Define and manage SLIs/SLOs and error budgets, and ensure they're reflected in roadmaps and delivery plans.


* Lead operational readiness reviews, assess delivery risk, and drive fixes through root-cause analysis, corrective actions, and automation to prevent repeat issues.


* Improve logging, monitoring, and alerting so dashboards are actionable and alerts are tuned to reduce noise and speed response.


* Own CI/CD controls (security, reliability, testing, change management) and drive automation to reduce toil and increase release confidence.


* Lead and participate in major incident response (including outside business hours when needed), run post-incident reviews, and drive improvements against KPIs like availability, MTTR, and change failure rate.

Required qualifications, capabilities, and skills


* 10+ years supporting critical applications in large-scale environments, including experience leading and mentoring engineers/teams.


* Strong SDLC and secure development practices, with experience implementing objective quality gates and release readiness standards.


* Hands-on SRE experience, including SLIs/SLOs, error budgets, incident management, and post-incident reviews/root-cause analysis.


* Experience designing actionable monitoring/logging and dashboards (e.g., Splunk, AppDynamics, or equivalent), including alert tuning.


* Experience with CI/CD pipelines and automated testing (unit, integration, security), plus operational controls that reduce change risk.


* Calm, accountable incident leadership under pressure, with strong communication and stakeholder management.


* Comfortable collaborating with global teams and engaging during critical incidents outside standard business hours.

Preferred qualifications, capabilities, and skills


* Experience leading operational readiness reviews and maintaining "Definition of Done" checklists (SLO monitoring, runbooks, rollback validation, resilience/failover testing, vulnerability remediation, audit/control artifacts).


* Deep public cloud expertise (AWS or equivalent), including infrastructure automation (Terraform/Terraform Enterprise, CloudFormation), capacity planning, and resilience patterns for distributed systems.


* Track record of improving reliability outcomes (hig...



