Lead Software Engineer - Resiliency

Be an integral part of an agile team that's constantly pushing the envelope to enhance, build, and deliver top-notch technology products.

As a Lead Site Reliability Engineer at JPMorgan Chase within the Employee Compute Branch Team, you will play a pivotal role in designing, implementing, and overseeing automation for observability and notification across a diverse set of systems in a global Microsoft Windows environment.

You will lead by example, bringing hands-on expertise in PowerShell and C#, and infusing best practices into a team of highly experienced system engineers.

Your work will directly impact the reliability, scalability, and efficiency of our platforms, with a strong focus on cloud (Azure and AWS) integration.

Job Responsibilities


* Champion site reliability engineering culture and practices, exerting technical influence across the team.


* Lead the design and hands-on implementation of automated observability and notification solutions using PowerShell and C#.


* Drive initiatives to improve reliability and stability of applications and platforms through data-driven analytics and automation.


* Collaborate with team members to define and implement service level indicators, objectives, and error budgets.


* Architect and implement monitoring, alerting, and telemetry solutions using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.


* Act as the primary technical lead during major incidents, quickly identifying and resolving issues to minimize impact.


* Mentor and upskill system engineers, fostering a programming mindset and best practices in automation and reliability.


* Facilitate cross-team and cross-region collaboration, ensuring alignment and knowledge sharing.


* Document and share technical solutions and best practices within internal forums and communities of practice.


* Engage with stakeholders to understand business needs and translate them into technical solutions, with increasing responsibility over time.


* Break down complex problems into actionable work for the team, ensuring clear direction and accountability.

Required qualifications, capabilities, and skills



* Formal training or certification in Site Reliability Engineering concepts and 5+ years of applied experience.


* Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, and toil reduction, with proven ability to implement these practices.


* Expert-level fluency in PowerShell and C# in a Microsoft Windows environment.


* Hands-on experience with cloud platforms, specifically Azure and AWS.


* Demonstrated experience in automated software testing (unit, integration, end-to-end).


* Deep knowledge of software applications and technical processes, with emerging depth in one or more technical disciplines.


* Proficiency and experience in observability, including white and black box monitoring, SLO alerting, and telemetry colle...



