Site Reliability Engineer

This role has been designed as ‘Onsite’, with an expectation that you will primarily work from an HPE partner/customer office.

Who We Are:

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work.

We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next. We know diverse backgrounds are valued and succeed here.

We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good.

If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.

Job Description:

HPE is seeking a System Administrator to design, test and administer systems in support of the Supercomputing as a Service (SCaaS) business.

This is an exciting opportunity to have a significant impact on a key business with considerable growth potential.

In this role, you will have a great deal of creative freedom to define and develop solutions that will support a scaling customer base.

This role will be performed onsite at the datacenter in Dallas, TX.

Primary Responsibilities

* Ensure continuous uptime of HPC systems at large scale
* Provide system administration for our groundbreaking Supercomputing-as-a-Service system
* Create scripts and infrastructure as code to automate the support of cloud infrastructure and HPC-as-a-Service clusters
* Apply technical thinking to break down complex data and to engineer new ideas and methods for solving problems and for prototyping, designing, and implementing cloud-based solutions
* Help design and implement the security aspects of the computing infrastructure
* Administer cloud-based HPC systems
* Collaborate with project managers and development partners to ensure effective and efficient delivery, deployment, operation, monitoring, and support of HPC engagements

Experience and Skills

* Experience in Linux systems administration, planning, and maintenance
* An understanding of high-speed networks
* An understanding of the security concerns in a cloud environment
* Hands-on experience with Linux administration at scale
* Good communication skills, including English
* Hands-on experience with the tools and infrastructure that support HPC systems at scale, including networking and storage
* Hands-on datacenter experience:
  - Cabling
  - GPU/CPU/memory/storage
  - Hardware testing, swaps, and replacements
  - Monitoring hardware alerts and alarms
* Proficient in the use and operation of Linux-based environments, including shells, system configuration, and administration
* Prio...