Lead - ML Ops Engineer
What will you do?
* Design, develop, and document Infrastructure as Code (Terraform) for ML/LLM platform components on AWS/Databricks; implement secure, scalable foundations for data, compute, networking, and secrets.
* Build and maintain GitHub-based pipelines (Actions/Workflows) for training, packaging, validation, and deployment of ML/LLM assets (models, evaluation suites, prompts, policies), using GitOps for environment promotion.
* Containerize models using Docker and deploy them primarily through managed endpoints (SageMaker/Azure ML); Kubernetes-based serving (KServe/Triton/Seldon) is a plus.
* Operate model registries and feature stores; enforce versioning, lineage, and artifact governance via MLflow/Databricks and cloud-native services.
* Implement logs/metrics/traces, performance profiling, and drift/quality monitors; define SLIs/SLOs and on-call runbooks; drive incident response and post-mortems with accountability (business-hours support rotation).
* Embed DevSecOps: secrets management, IAM/RBAC, vulnerability scanning, image signing, policy-as-code, least-privilege access, backup/DR/resiliency patterns; align with enterprise security standards.
* Operationalize GenAI: prompt/content safety filters, evaluation harnesses (human-in-the-loop), grounding/attribution logging, token cost and latency tracking, and red-teaming pipelines integrated into CI/CD.
* Monitor and optimize compute/storage/bandwidth and inference costs; implement right-sizing, autoscaling, and caching strategies.
* Partner with Data Scientists to productize models; co-design platform features with stakeholders; deliver documentation, templates, and knowledge transfers that accelerate safe reuse.
* Run operations (RUN): Troubleshoot escalations, improve monitoring, automate administration/IRP tasks, and continuously harden reliability, performance, and security across environments.
What skills and capabilities will make you successful?
* Technical Experience:
+ Understanding of DevOps concepts such as reference implementation enforcement, use of shared DevOps stacks, infrastructure optimization (performance, cost, HA, resiliency), release management (GitOps best practices), and QA automation frameworks.
+ Strong knowledge of AWS ecosystems and Databricks integration.
+ Proficiency in Terraform for developing, testing, and maintaining Infrastructure-as-Code to manage cloud services for ML engineering.
+ Hands-on experience with CI/CD using GitHub, GitHub Actions, and workflow automation to support continuous integration, delivery, and deployment of ML assets.
+ Strong experience with Docker; Kubernetes is a plus.
+ Experience with MLflow (tracking/registry), model registries, feature stores, experiment tracking, and lineage management, on Databricks and cloud-native equivalents.
+ Build pipelines for training, testing (unit/integration/e2e), evaluation...
- Rate: Not Specified
- Location: Bangalore, IN-KA
- Type: Permanent
- Industry: Finance
- Recruiter: Schneider Electric
- Contact: Not Specified
- Reference: 95790-en-us
- Posted: 2026-02-25 07:37:47