Site Reliability Engineer

DESCRIPTION:

Duties: Responsible for the availability, performance, change management, monitoring, and capacity management of US Private Bank services/products.

Create individualized Service Level Objectives and Service Level Indicators, dashboards, and observability solutions customized to the needs of the Product.

Drive adoption of self-healing and resiliency patterns.

Contribute to the product or software to automate manual operational work.

Troubleshoot priority incidents, facilitate blameless post-mortems, and support solutions through to closure.

Apply analytics to historical data, such as incidents and usage patterns, to predict issues and take proactive action.

Define and drive adoption of a best-in-class monitoring framework to accomplish end-to-end application or service monitoring and noiseless alerting.

Deploy sustainable software, system and product upgrades.

QUALIFICATIONS:

Minimum education and experience required: Master's degree in Management Science, Computer Science, any Engineering discipline, Mathematics and Sciences, Data Sciences, or related field of study plus 3 years of experience in the job offered or as Site Reliability Engineer III, Data Engineer, or related occupation.

The employer will alternatively accept a Bachelor's degree in Management Science, Computer Science, any Engineering discipline, Mathematics and Sciences, Data Sciences, or related field of study plus 5 years of experience in the job offered or as Site Reliability Engineer III, Data Engineer, or related occupation.

Skills Required: This position requires 1 year of experience with the following: using Ansible to execute event-driven automation activities.

This position requires 2 years of experience with the following: managing and troubleshooting applications and services deployed across various environments, comprising: Physical Environments, Virtual Environments (VMware or Red Hat OpenShift), On-Premises Containerized Deployments in Private Cloud using Cloud Foundry, and Public Cloud Environments (AWS or Azure); programming with Python; working with Cloud, API, Event-Driven, and Micro-services technologies for application environments deployed on infrastructure comprising at least 100 computer cores (CPUs) and SAN storage; developing and managing infrastructure as code (IaC) using Terraform.

This position requires any amount of experience with the following: designing and maintaining CD pipelines using Spinnaker and Harness; automating release workflows, managing canary and blue-green deployments, and ensuring zero-downtime rollouts across the environment using Jenkins Pipeline and Harness; configuring and optimizing Jenkins for CI/CD workflows; conducting chaos engineering experiments using Gremlin; monitoring and optimizing application performance using Dynatrace; setting up and configuring Grafana for data visualization and monitoring; using Splunk for log management and analysis; utilizing Sourcegraph for comprehensive code search and repository navigat...



