ALTS - Lead SRE

We are seeking an experienced Lead Site Reliability Engineer (SRE) to manage and guide our team.

The ideal candidate will have a strong foundation in SRE, DevOps, or infrastructure engineering, with leadership skills and the ability to drive team success in a fast-paced, dynamic environment.

This role involves overseeing the team's execution, risk management, and strategic initiatives while fostering a collaborative and innovative culture.

Key Responsibilities:

Team Leadership and Management:


* Lead, mentor, and develop a team of SREs, fostering a culture of collaboration and continuous improvement


* Set clear goals and expectations for the team, ensuring alignment with business objectives


* Facilitate regular team meetings and one-on-one sessions to support individual growth and team cohesion

Execution and Delivery:


* Oversee the delivery of major themes of work, ensuring high-quality execution and timely completion


* Guide the team in estimating delivery timelines and managing workloads effectively


* Provide expert guidance in debugging and systems design, encouraging innovative solutions and trade-off analysis

Risk Management:


* Assess the cross-impact of team deliverables and ensure proactive communication of potential risks


* Support the team in identifying technical limitations and suggesting remediation strategies

Strategic Vision and Forward Thinking:


* Develop and implement strategic plans for building robust systems with strong contracts, anticipating future changes


* Encourage the team to propose alternative requirements and solutions that better meet organizational needs


* Set and prioritize the team's strategic book of work in line with the goals of the business

Communication and Stakeholder Engagement:


* Communicate effectively with stakeholders, providing updates on progress and raising risks that will impact delivery


* Ensure the team is aligned with the business vision and understands the importance of their contributions to the product

Qualifications:


* Experience directly leading, or functioning as a lead of, technical teams, with a focus on SRE, DevOps, or infrastructure engineering


* Proficiency in programming languages (Python preferred) and distributed systems (Kubernetes, Kafka, Cassandra, etc.)


* Experience with setting up and using SLOs to track system health and performance


* Excellent problem-solving skills and creativity in debugging complex issues


* Deep understanding of cloud fundamentals and infrastructure management


* Exceptional communication skills, with the ability to articulate technical problems and solutions to diverse audiences



* A strategic mindset with a keen interest in automation and learning


* A thorough understanding of the full stack of the system

An example of a task or problem to be tackled is below.

Does leading a team that solves system-wide problems excite you?

Our system has ...



