Software Engineering Manager 1 - Streaming & Cloud Platform Reliability

This role has been designed as 'Onsite' with an expectation that you will primarily work from an HPE office.

Who We Are:

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work.

We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today's complex world.

Our culture thrives on finding new and better ways to accelerate what's next.

We know varied backgrounds are valued and succeed here.

We have the flexibility to manage our work and personal needs.

We make bold moves, together, and are a force for good.

If you are looking to stretch and grow your career, our culture will embrace you.

Open up opportunities with HPE.

Job Description:

We're looking for a hands‑on Software Engineering Manager to lead a small team (2-4 developers) focused on improving the reliability of Mist's cloud platform by driving concrete postmortem action items from our incident management process.

This team owns follow‑ups from production incidents, especially those involving our streaming data pipelines (Kafka / Flink / Storm) and core APIs.

You'll work closely with senior engineers to turn incident learnings into durable engineering improvements.

This is a hybrid role requiring on‑site collaboration multiple days per week in Cupertino, California.

Due to the requirements of this position, this role requires a US Citizen or Green Card holder.

What You'll Do


* Own and drive post‑incident follow‑ups from our Incident Management process, turning incident reports into design and implementation work.


* Lead, mentor, and grow a 2-4 person engineering team, while contributing hands‑on code in production services.


* Design, implement, and harden streaming topologies using Kafka, Storm, and/or Flink (e.g., stats, telemetry, alerts, pcaps).


* Improve reliability of core APIs (REST API, WebSocket, Webhooks, etc.), including auth, rate limiting, and DR‑sensitive flows.


* Enhance observability and runbooks: add metrics/alerts, define SLOs, and codify playbooks for recurring incident patterns.


* Collaborate with SRE, Platform, and Data teams on DR, multi‑region, and multi‑cloud behavior (AWS, GCP, DR regions).


* Ensure robust testing and deployment practices (unit/integration tests, regression tests for past incidents, safe rollout/rollback).

Experience Required for this Role


* 7+ years total professional software engineering experience.


* 2+ years in a team lead role (mentoring, performance feedback, prioritization), while remaining hands‑on technically.
...



