

Senior Lead Site Reliability Engineer

Guide and shape the future of technology at a globally recognized firm, driven by pride in ownership.

As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Infrastructure & Production Management sector of Consumer & Community Banking, you are the non-functional requirement owner and champion for the applications in your remit.

You are a key influencer in your team's strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area.

You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.

Job responsibilities


* Demonstrates expertise in site reliability principles and an understanding of the fine balance between features, efficiency, and stability


* Effectively negotiates with peers and executive partners to ensure optimal outcomes for all


* Drives the adoption of site reliability practices throughout the organization


* Ensures your teams apply site reliability best practices and can demonstrate this empirically through stability and reliability metrics


* Drives a culture of continual improvement and solicits real-time feedback to improve the customer's experience


* Ensures your team collaborates with other teams within your group's specialization and avoids duplication of work where possible


* Follows blameless, data-driven post-mortem strategies and conducts regular team debriefs to enable learning from both successes and mistakes


* Provides personalized coaching for entry- to mid-level team members


* Ensures your team documents and shares their knowledge and innovations via internal forums, communities of practice, guilds, and conferences

Required qualifications, capabilities, and skills


* Formal training or certification in software engineering concepts and 5+ years of applied experience, plus 2+ years leading technologists in managing and solving complex technical problems within your domain.


* Advanced proficiency in SRE culture and principles, with a track record of implementing SRE practices across application and platform teams while avoiding common pitfalls.


* Strong observability fundamentals: define and measure SLIs, set and manage SLOs and error budgets, build actionable alerting and dashboards; hands-on experience with Dynatrace and Splunk.


* Proven resiliency engineering: capacity planning, failure mode analysis, fault-tolerant design (circuit breakers, retries, bulkheads), disaster recovery strategies, and running game days.


* Proficiency in at least one programming language (e.g., Python, Java Spring Boot, .NET) to build production-grade automation and tooling; deeper coding skills are a plus but not a hard requirement.


* Proficiency in CI/CD and Infrastructure as Code (e.g., Jenkins, GitLab, Terraform), including pipeline design, environment promot...



