Research Engineer — Post-Training & Small Language Models (SLMs), Healthcare AI

Three hundred fifty million Americans rely on a healthcare system whose decision-making has become slow, costly, and adversarial - care delayed by prior authorization and paperwork, claims that misfire, clinical decisions made without the right information at the right moment, and patients who struggle to navigate or afford the care they need.

Deloitte has a new AI-first effort, backed by $1B in committed investment, building the reasoning models and agentic systems to rebuild how that system decides - across payers, providers, and life sciences, and for the patients they serve - so that care is faster, fairer, and far less wasteful.

This is not AI applied at the margins.

It is a ground-up rebuild of the decision-making machinery behind American healthcare, at national scale.

The effort is resourced for real post-training at scale: committed investment in GPU compute and training infrastructure, not toy fine-tunes.

As a Research Engineer on our post-training team, you will design, train, evaluate, and align the models that reason about healthcare - working across the full post-training lifecycle to shape model behavior for clinical and operational decisioning across the industry.

Healthcare decisioning is one of the cleanest verifiable-reward domains outside math and code: the problems are hard, but the answers can be checked.

We ground that reward in real signals - clinical policy and criteria, adjudicated outcomes, and clinical-expert judgment - so correctness is checkable rather than asserted.

You will own the post-training stack for our clinical reasoning models end to end - from data and reward design through trained, evaluated models that ship.

This is not a prompt-engineering role.

We are looking for people who understand not just how to use LLMs, but how to improve and shape model behavior through advanced post-training.

You do not need a healthcare background.

We pair every engineer with clinical and domain experts and teach you the domain - you bring the modeling depth.

We hire on demonstrated depth, not years: the level you join at is set through our interview process, based on the depth and judgment you show rather than your time in a title.

Work you'll do

Post-training & alignment

• Design and execute post-training pipelines: supervised fine-tuning (SFT), preference optimization, and reinforcement learning / alignment workflows.

• Build and optimize training using techniques such as SFT, RLHF, PPO, DPO, GRPO, RLAIF, and Constitutional AI, and understand how each affects reasoning quality, safety, latency, cost, and reliability.

• Train reasoning models for healthcare decisioning using verifiable-reward RL - designing reward signals and verifiers grounded in clinical guidelines, policy and criteria, and adjudicated outcomes.

Reward modeling & data

• Develop reward models and preference datasets to improve reasoning quality, factuality, safety, policy adherence, and task performance.

• Curate, clean, synthesize, and evaluate large-sca...

Rate: Not Specified
Location: Gilbert, US-AZ
Type: Permanent
Industry: Management
Recruiter: Deloitte
Contact: Not Specified
Reference: 355692
Posted: 2026-06-20 08:07:54