

AI/ML Lead Data Engineer - Automation/Image Processing

Join us as we embark on a journey of collaboration and innovation, where your unique skills and talents will be valued and celebrated.

Together we will create a brighter future and make a meaningful difference.

As a Lead Data Engineer at JPMorganChase within the Commercial & Investment Bank, you are an integral part of an agile team that works to enhance, build, and deliver data collection, storage, access, and analytics solutions in a secure, stable, and scalable way.

As a core technical contributor, you are responsible for maintaining critical data pipelines and architectures across multiple technical areas within various business functions in support of the firm's business objectives.

Job responsibilities


* Design, build, and maintain scalable, high-performance data pipelines and infrastructure to support ingestion, processing, and storage of large volumes of scanned document images across enterprise-wide workflows


* Architect end-to-end data solutions on AWS cloud services to enable seamless flow of scanned images from source systems through OCR processing, model inference, and downstream data extraction and categorization pipelines


* Develop robust image preprocessing and OCR integration pipelines that handle TIF/PNG format conversion, normalization, resolution enhancement, noise reduction, and batching to prepare scanned documents for downstream computer vision and OCR models


* Build and optimize data pipelines that integrate OCR engine outputs, extracting structured text and metadata from scanned images and routing them into databases and analytics platforms for further processing


* Design and manage data storage architectures and containerized deployments, using Oracle databases and AWS-native stores (S3, EFS) to efficiently catalog, index, and retrieve extracted text, classification labels, and metadata from processed document images


* Drive the adoption of containerized deployment strategies using AWS EKS (Elastic Kubernetes Service) to deploy and scale image processing microservices, OCR engines, and data pipeline components with high availability and fault tolerance


* Collaborate closely with data scientists and ML engineers to ensure that training datasets for the various models, including computer vision models, are properly curated, versioned, labeled, and accessible through well-structured data pipelines


* Evaluate and integrate emerging data technologies and tools to continuously improve pipeline throughput, reduce processing latency for high-volume document scanning workloads, and optimize cost efficiency


* Establish and enforce data quality, lineage, governance, and security frameworks to ensure traceability and integrity of extracted data from scanned documents throughout the entire processing lifecycle


* Partner with security and compliance teams to ensure that scanned document data, extracted PII/PHI, and sensitive content are handled in accordance with regulatory requirements, ...



