Research Engineer — Post-Training & Small Language Models (SLMs), Healthcare AI
Three hundred fifty million Americans rely on a healthcare system whose decision-making has become slow, costly, and adversarial - care delayed by prior authorization and paperwork, claims that misfire, clinical decisions made without the right information at the right moment, and patients who struggle to navigate or afford the care they need.
Deloitte has a new AI-first effort, backed by $1B in committed investment, building the reasoning models and agentic systems to rebuild how that system decides - across payers, providers, and life sciences, and for the patients they serve - so that care is faster, fairer, and far less wasteful.
This is not AI applied at the margins.
It is a ground-up rebuild of the decision-making machinery behind American healthcare, at national scale.
The effort is resourced to do real post-training at scale, with committed investment in GPU compute and training infrastructure rather than toy fine-tunes.
As a Research Engineer on our post-training team, you will design, train, evaluate, and align the models that reason about healthcare - working across the full post-training lifecycle to shape model behavior for clinical and operational decisioning across the industry.
Healthcare decisioning is one of the cleanest verifiable-reward domains outside math and code: the problems are hard, but their answers can be verified.
We ground that reward in real signals - clinical policy and criteria, adjudicated outcomes, and clinical-expert judgment - so correctness is checkable rather than asserted.
You will own the post-training stack for our clinical reasoning models end to end - from data and reward design through trained, evaluated models that ship.
This is not a prompt-engineering role.
We are looking for people who understand not just how to use LLMs, but how to improve and shape model behavior through advanced post-training.
You do not need a healthcare background.
We pair every engineer with clinical and domain experts and teach you the domain - you bring the modeling depth.
We hire on demonstrated depth, not years: the level you join at is determined through our interview process, based on the depth and judgment you demonstrate.
Work you'll do
Post-training & alignment
• Design and execute post-training pipelines: supervised fine-tuning (SFT), preference optimization, and reinforcement learning / alignment workflows.
• Build and optimize training using techniques such as SFT, RLHF, PPO, DPO, GRPO, RLAIF, and Constitutional AI, and understand how each affects reasoning quality, safety, latency, cost, and reliability.
• Train reasoning models for healthcare decisioning using verifiable-reward RL - designing reward signals and verifiers grounded in clinical guidelines, policy and criteria, and adjudicated outcomes.
Reward modeling & data
• Develop reward models and preference datasets to improve reasoning quality, factuality, safety, policy adherence, and task performance.
• Curate, clean, synthesize, and evaluate large-sca...
- Rate: Not Specified
- Location: Gilbert, US-AZ
- Type: Permanent
- Industry: Management
- Recruiter: Deloitte
- Contact: Not Specified
- Reference: 355692
- Posted: 2026-06-20 08:07:54