Distributed Training & Performance Engineer - Vice President
Are you looking for an exciting opportunity to join a dynamic and growing team in a fast-paced and challenging area? This is a unique opportunity for you to work with the Global Technology Applied Research (GTAR) center at JPMorganChase.
The goal of GTAR is to design and conduct research across multiple frontier technologies, in order to enable novel discoveries and inventions, and to inform and develop next-generation solutions for the firm's clients and businesses.
As a senior-level engineer within GTAR, you will design, optimize, and scale large-model pretraining workloads across hyperscale accelerator clusters.
This role sits at the intersection of distributed systems, kernel-level performance engineering, and large-scale model training.
The ideal candidate can take a fixed hardware budget (accelerator type, node topology, interconnect, and cluster size) and design an efficient, stable, and scalable training strategy spanning parallelism layout, memory strategy, kernel optimization, and end-to-end system performance (a minimal layout sketch follows below).
This is a hands-on role with direct impact on training throughput, efficiency, and cost at scale.
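For illustration only, here is a minimal sketch of the budget-to-layout reasoning described above. The helper function, its constraint checks, and the example numbers are hypothetical and not part of the role or any JPMorganChase system.

```python
# Hypothetical helper: split a fixed accelerator budget into parallelism degrees.
# The constraints (e.g. keeping tensor parallelism inside a node) are illustrative
# assumptions, not a prescription from this posting.
def plan_parallelism(world_size: int, gpus_per_node: int, tp_degree: int, pp_degree: int) -> dict:
    """Return a data/tensor/pipeline parallelism layout for a fixed cluster size."""
    if tp_degree > gpus_per_node:
        raise ValueError("keep tensor parallelism within a node to use fast intra-node links")
    if world_size % (tp_degree * pp_degree) != 0:
        raise ValueError("tp_degree * pp_degree must divide the total accelerator count")
    dp_degree = world_size // (tp_degree * pp_degree)
    return {"dp": dp_degree, "tp": tp_degree, "pp": pp_degree}

# Example: 512 GPUs, 8 per node, tensor parallel within a node, 4 pipeline stages.
print(plan_parallelism(world_size=512, gpus_per_node=8, tp_degree=8, pp_degree=4))
# -> {'dp': 16, 'tp': 8, 'pp': 4}
```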
Job responsibilities
* Design and optimize distributed training strategies for large-scale models, including data, tensor, pipeline, and context parallelism.
* Manage end-to-end training performance: from data input pipelines through model execution, communication, and checkpointing.
* Identify and eliminate performance bottlenecks using systematic profiling and performance modeling.
* Develop or optimize high-performance kernels using CUDA, Triton, or equivalent frameworks.
* Design and optimize distributed communication strategies to maximize overlap between computation and inter-node data movement (see the sketch after this list).
* Design memory-efficient training configurations (caching, optimizer sharding, checkpoint strategies).
* Evaluate and optimize training on multiple accelerator platforms, including GPUs and non-GPU accelerators.
* Contribute performance improvements back into internal training pipelines.
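As a minimal sketch of the computation/communication overlap called out above: a gradient all-reduce can be launched asynchronously and waited on only after independent compute has been issued. The function and its arguments (`grad_buckets`, `next_layer_compute`) are hypothetical placeholders, and the snippet assumes the launcher has already called `dist.init_process_group("nccl")`.

```python
import torch
import torch.distributed as dist

def overlapped_grad_reduce(grad_buckets, next_layer_compute):
    """Launch async all-reduces on gradient buckets and overlap them with compute."""
    handles = []
    for bucket in grad_buckets:
        # async_op=True returns immediately with a Work handle, so the NCCL
        # collective runs in the background while we keep computing.
        handles.append(dist.all_reduce(bucket, op=dist.ReduceOp.SUM, async_op=True))

    next_layer_compute()  # independent work overlaps with the in-flight reduces

    for handle, bucket in zip(handles, grad_buckets):
        handle.wait()                          # synchronize before touching the grads
        bucket.div_(dist.get_world_size())     # average the summed gradients
```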
Required qualifications, capabilities, and skills
* Master's degree with 3+ years of industry experience, or Ph.D. with 1+ years of industry experience, in computer science, physics, math, engineering, or related fields.
* Engineering experience at top AI labs, HPC centers, chip vendors, or hyperscale ML infra teams.
* Strong experience designing and operating large-scale distributed training jobs across multi-node accelerator clusters.
* Deep understanding of distributed parallelism strategies: data parallelism, tensor/model parallelism, pipeline parallelism, and memory/optimizer sharding.
* Proven ability to profile and optimize training performance using industry-standard tools such as Nsight, the PyTorch profiler, or equivalent (see the profiling sketch after this list).
* Hands-on experience with GPU programming and kernel optimization.
* Strong understanding of accel...
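As a minimal sketch of the profiling workflow referenced in the qualifications above, the PyTorch profiler can capture a few steady-state training steps for timeline and memory analysis. `loader` and `train_step` are hypothetical placeholders.

```python
import torch
from torch.profiler import profile, schedule, ProfilerActivity, tensorboard_trace_handler

def profile_training(loader, train_step, trace_dir="./traces"):
    """Capture CPU + CUDA activity for a handful of steady-state steps."""
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        schedule=schedule(wait=1, warmup=2, active=3),   # skip startup noise
        on_trace_ready=tensorboard_trace_handler(trace_dir),
        record_shapes=True,
        profile_memory=True,
    ) as prof:
        for step, batch in enumerate(loader):
            train_step(batch)
            prof.step()      # advance the profiler schedule
            if step >= 8:    # enough steps to cover wait + warmup + active
                break
```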
- Rate: Not Specified
- Location: New York, US-NY
- Type: Permanent
- Industry: Finance
- Recruiter: JPMorgan Chase Bank, N.A.
- Contact: Not Specified
- Reference: 210709171
- Posted: 2026-02-08 07:08:14