Senior Lead Site Reliability Engineer

Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability.

As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Consumer and Community Banking - Infrastructure & Production Management team, you work with your fellow stakeholders to define observability non-functional requirements (NFRs) and availability targets for the services in your application and product lines.

You will ensure those NFRs are accounted for in your products' design and test phases, that your service level indicators are effectively measuring customer experience, and that service level objectives are defined with stakeholders and implemented in production.

Job responsibilities


* Lead the design, implementation, and maintenance of observability solutions across the organization.


* Develop and enforce best practices for observability, including logging, monitoring, tracing, and alerting.


* Write and maintain code in Java, Python, or similar languages, and in Angular or similar frameworks, to build and enhance observability tools and platforms.

* Automate repetitive tasks to improve system reliability and developer productivity.


* Implement and drive adoption of SRE principles to improve system reliability, availability, and performance.


* Design and implement monitoring and alerting strategies to proactively identify and resolve issues.


* Ensure that observability tools provide actionable insights and are aligned with business objectives.


* Work closely with cross-functional teams to integrate observability practices into the software development lifecycle.


* Mentor and guide junior engineers, fostering a culture of learning and continuous improvement.


* Lead projects related to observability initiatives, ensuring timely delivery and alignment with strategic goals.

* Communicate effectively with stakeholders to provide updates and gather requirements.


* Function effectively in an agile environment, managing or contributing to the backlog, tracking velocity, and reporting on project landings.

Required qualifications, capabilities, and skills


* Formal training or certification on Site Reliability and Software Engineering concepts and 5+ years applied experience


* Advanced knowledge and experience in observability, such as white-box and black-box monitoring, service level objectives, alerting, and telemetry collection using tools such as Dynatrace, Splunk, Grafana, Prometheus, Datadog, etc.


* Extensive experience in a similar SRE or observability role


* Proven track record of implementing and managing observability solutions in complex environments


* Excellent communication skills, both verbal and written, with the ability to convey complex technical concepts to non-technical stakeholders


* Collaborative mindset, with the ability to work effectively with diverse teams and stakeholders


* Stron...



