Deployed Improvement per Step: bf16=True vs bf16=False

Per-step model improvement. Higher is better.
