

Senior Manager of Site Reliability Engineering - Securitized Products, Production Management - NA

Guide and shape the future of technology at a globally recognized firm, driven by pride in ownership.

As a Senior Manager of Site Reliability Engineering at JPMorgan Chase within the Corporate Investment Bank, Markets team, you are the non-functional requirement owner and champion for the applications in your remit.

You are a key influencer in your team's strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area.

You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.

Job responsibilities



* Manage day-to-day execution of SRE functions (workload prioritization, shift coverage, triage quality, escalations, runbooks, and handoffs) to ensure consistent and timely outcomes during market hours


* Drive reuse-first adoption of enterprise-authorized AI capabilities within the work environment to improve reliability operations and customer experience outcomes, with human-in-the-loop validation and appropriate handling of sensitive data


* Provide North America leadership for production management teams supporting trading desks across multiple Markets lines of business; ensure reliable day-to-day operations and sustained stability improvements


* Lead and coordinate L1/L2 investigations and incident response; ensure clear ownership, high-quality communications, and follow-through to root cause and prevention


* Act as a key technology partner to the trading desks: monitor operational signals, drive rapid engagement, translate business impact into technical action, and communicate clearly under pressure


* Drive adoption of SRE practices across delivery teams, ensuring best practices are implemented and demonstrated empirically via stability and reliability metrics (e.g., SLOs, error budgets, incident trends)


* Own and evolve observability (dashboards/alerts/SLOs, instrumentation, monitoring strategy) and use data to prioritize resiliency, performance, and scalability improvements


* Deliver automation and tooling that reduces operational toil and improves support effectiveness (faster diagnosis, safer remediation, repeatable fixes, and self-service workflows)


* Establish and enforce operational standards for delivery teams (operational readiness, testing discipline, release safety, rollback strategy, post-incident actions) and hold teams accountable for closing gaps


* Establish team standards for AI-assisted reliability workflows across automation and delivery practices, ensuring traceability/auditability, resiliency, and security controls

Required qualifications, capabilities, and skills



* Formal training or certification on site reliability engineering concepts and 5+ years applied experience. In addition, 2+ years of experience leading technologists to manage and solve complex technical items within your domain of expertise
...



