Senior Site Reliability Engineer
Essential Functions:
* Partner with software developers, platform engineers, and IT staff to improve system design, operability, deployment safety, and production support readiness.
* Define and maintain operational standards, runbooks, support procedures, escalation paths, and service-level objectives.
* Evaluate system architecture and changes to ensure they balance functional requirements, service quality, reliability, security, and compliance needs.
* Drive continuous improvement in platform stability, maintenance, and availability.
* Provide advanced technical support and troubleshooting for complex platform and service issues affecting internal users and stakeholders.
Experience and Skills Required:
* 8+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, Systems Engineering, or related infrastructure roles supporting production services.
* Strong experience with Linux systems administration and troubleshooting in enterprise environments.
* Strong experience operating and maintaining on-prem Kubernetes platforms and all related components including CRI, CNI, and CSI plugins.
* Experience deploying and maintaining applications on Kubernetes using Helm, Kustomize, and similar tooling.
* Experience supporting DevOps tooling such as GitLab, Artifactory, Jira, and Confluence.
* Experience with GitOps tools such as FluxCD or ArgoCD.
* Proficiency scripting with at least one of Python, Go, or Bash.
* Strong experience designing, maintaining, and maturing observability tooling including monitoring, dashboards, logging and tracing, and supporting SLOs.
* Strong understanding of reliability engineering concepts:
+ Service health indicators
+ High availability design, failure reduction, and testing
+ Operational readiness practices, including developing documentation, runbooks, and architectural descriptions
+ Incident response, root cause analysis, remediation/recovery
* Ability to obtain a security clearance, which includes U.S. citizenship.
Preferred:
* Experience with multiple Linux distributions including Ubuntu.
* Experience with at least one of the following: Tanzu Kubernetes, Nutanix Kubernetes Platform, Canonical Kubernetes.
* Experience with cloud platforms such as AWS and Azure.
* Experience with infrastructure automation and configuration management.
* Experience managing AI tooling on Kubernetes, including MCP Servers, LLM platforms (vLLM, Ollama), and Kubeflow.
* Experience with security and compliance considerations in regulated environments.
* DoD experience.
* Active or inactive Secret Security Clearance.
Education:
* Bachelor’s degree in CS, Software Engineering, or other IT-related field, or equivalent experience.
REMOTE WORK NOTICE: This position may be performed fully remote, hybrid, or onsite at an ARA office.
Preference will be given to c...
- Rate: Not Specified
- Location: Albuquerque, US-NM
- Type: Permanent
- Industry: Management
- Recruiter: Applied Research Associates, Inc
- Contact: Nina Uka
- Reference: SENIO009685-00001
- Posted: 2026-04-04 07:53:32