357 - Site Reliability Engineering - Site Reliability Engineer I

What we need…

What’s in it for you…

COCC offers a collaborative environment, career growth, and all the benefits you’d expect from an award-winning employer, including:


* Hybrid schedules and ample paid time off that give you work/life balance and flexibility


* Customized training and onboarding to support you in your first year at COCC


* Robust employee development programs aligned with career pathing objectives


* Cutting-edge training and educational resources from vendors like SANS, Pluralsight, and CBT Nuggets


* Generous PTO offerings, benefits and competitive compensation


* On-site fitness centers, wellness incentives, and lifestyle spending accounts


* Tuition Reimbursement


* One-on-one career coaching


* DEIB initiatives championing inclusion and encouraging you to bring your whole self to work


* Financial planning assistance with certified professionals


* Peer recognition programs

What you’ll do…

Manage and support Kubernetes clusters (on-premises and/or cloud) across production, staging, and development environments.
Ensure stability, scalability, and high availability of Kubernetes platforms.
Implement Kubernetes-native security controls (RBAC, NetworkPolicies, Pod Security Standards).
Diagnose and resolve complex issues related to Kubernetes, container runtimes, and workloads.
Manage cluster and infrastructure configurations using tools such as Terraform, Helm, and Ansible.
Build, maintain, and troubleshoot CI/CD pipelines for Kubernetes deployments (preferably with GitLab, GitHub Actions, or similar).
Implement and maintain Kubernetes monitoring and alerting systems (e.g., Prometheus, Grafana, Loki, ELK, OpenTelemetry).

What you’ll bring…

Bachelor's degree in Computer Science preferred, but will consider relevant work experience and/or certifications

Experience creating and troubleshooting CI/CD pipelines, preferably GitLab
Experience with Kubernetes, including cluster management and application deployment
Experience converting existing applications to containers
Experience using GitLab or GitHub for version control
Proficiency in Bash and/or Python for scripting and automation
Basic troubleshooting knowledge of Linux and Windows
Experience managing BOSH or TKGI, preferred
Knowledge of VMware vCenter, preferred
Experience with Prometheus/Grafana/Mimir, preferred
Experience with JFrog Artifactory, preferred
Experience with Ansible, preferred




