
   

SRE Tech Lead

This role has been designed as 'Hybrid' with an expectation that you will work on average 2-3 days per week from an HPE office.

Who We Are:

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work.

We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today's complex world.

Our culture thrives on finding new and better ways to accelerate what's next.

We know that people from diverse backgrounds are valued and succeed here.

We have the flexibility to manage our work and personal needs.

We make bold moves, together, and are a force for good.

If you are looking to stretch and grow your career, our culture will embrace you.

Open up opportunities with HPE.

Job Description:

About the Role

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join our team and drive our technical agenda.

You will play a key role in ensuring the reliability, scalability, and performance of our systems and those of our customers.

As a Senior SRE, you will be responsible for influencing the design, implementation, and maintenance of robust infrastructure, automating operational tasks, and enhancing system observability.

You will work closely with development, operations, and security teams to create resilient, high-performing systems that support business growth.

Key Responsibilities

System Reliability & Performance

Be an advocate for highly available, scalable, and resilient systems in cloud or hybrid environments.

Work with development and support teams to improve implementations, delivering a better customer experience and lower operating costs.

Define and manage Service Level Objectives (SLOs), Service Level Agreements (SLAs), and Service Level Indicators (SLIs) to ensure system reliability.

Proactively identify performance bottlenecks and implement optimizations to improve system efficiency.

Automation & Infrastructure as Code (IaC)

Automate repetitive tasks and operational workflows, and drive automation across teams, to reduce toil and improve system efficiency.

Incident Response

Audit, verify, and improve incident response procedures, including runbooks and post-incident reviews.

Security & Compliance

Collaborate with security teams to ensure compliance with best practices in cloud security, access control, and vulnerability management.

Collaboration & Leadership

Mentor junior SREs and software engineers, fostering a culture of reliability and operational excellence.

Work closely with development teams to build resilient applications with best-in-class reliability and performance.

Advocate for SRE best practices across the organization, promoting a culture of shared responsibility for system reliability.

Qualifications & Skills

Required:

12+ years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, Operation...



